pflm.fpca.FunctionalDataGenerator
- class FunctionalDataGenerator(t: ndarray, mean_func: Callable[[ndarray], ndarray], var_func: Callable[[ndarray], ndarray], corr_func: Callable[[ndarray], ndarray] = scipy.special.j0, variation_prop_thresh: float = 0.999999, num_pcs: int | None = None, error_var: float = 1.0)
Bases: object
Generator for synthetic functional data on a fixed grid.
This class builds a covariance surface from a marginal variance function and a stationary correlation kernel, runs FPCA on the implied covariance, and samples low-rank functional signals with optional Gaussian measurement noise.
- Parameters:
- t : np.ndarray of shape (nt,)
Monotonic grid of time points.
- mean_func : Callable[[np.ndarray], np.ndarray]
Mean function evaluated on t; returns an array of shape (nt,).
- var_func : Callable[[np.ndarray], np.ndarray]
Marginal variance function evaluated on t; returns an array of shape (nt,).
- corr_func : Callable[[np.ndarray], np.ndarray], default=scipy.special.j0
Correlation kernel k(h) used to build the covariance surface, where h is the absolute time lag.
- variation_prop_thresh : float, default=0.999999
Fraction of variance explained (FVE) threshold used to choose the number of components when num_pcs is None. Must satisfy 0 < variation_prop_thresh < 1.
- num_pcs : int or None, default=None
Number of principal components to retain. If None, it is determined by the FVE threshold.
- error_var : float, default=1.0
Variance of the Gaussian noise added to generated curves.
- Attributes:
- t : np.ndarray of shape (nt,)
Copy of the input grid.
- mean_func : Callable[[np.ndarray], np.ndarray]
Mean function handle used during generation.
- var_func : Callable[[np.ndarray], np.ndarray]
Marginal variance function handle used during generation.
- corr_func : Callable[[np.ndarray], np.ndarray]
Correlation kernel used to build the covariance surface.
- variation_prop_thresh : float
FVE threshold used when num_pcs is not specified.
- error_var : float
Measurement noise variance for generation.
Notes
The covariance is constructed as \(C(t_i, t_j) = \sqrt{v(t_i)}\, k(|t_i - t_j|)\, \sqrt{v(t_j)}\), where \(v\) is var_func and \(k\) is corr_func.
FPCA components (eigenstructure) are computed lazily on first use.
Private caches:
- _num_pcs: Optional[int]
- _fpca_phi: Optional[np.ndarray of shape (nt, k)]
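The covariance construction described in these notes can be sketched with plain NumPy broadcasting. This is an illustration, not the class's internal code: `build_covariance` is a hypothetical helper, and the exponential kernel is a stand-in chosen so the sketch depends only on NumPy (the documented default kernel is scipy.special.j0).

```python
import numpy as np


def build_covariance(t, var_func, corr_func):
    """Assemble C[i, j] = sqrt(v(t_i)) * k(|t_i - t_j|) * sqrt(v(t_j)).

    Hypothetical helper for illustration only.
    """
    sd = np.sqrt(var_func(t))               # marginal standard deviations, shape (nt,)
    lags = np.abs(t[:, None] - t[None, :])  # absolute time lags, shape (nt, nt)
    return sd[:, None] * corr_func(lags) * sd[None, :]


t = np.linspace(0.0, 10.0, 51)
cov = build_covariance(
    t,
    var_func=lambda x: 1.0 + 0.2 * np.cos(x),
    corr_func=lambda h: np.exp(-h),  # assumed stand-in kernel with k(0) = 1
)
```

Because the kernel equals 1 at lag zero, the diagonal of `cov` reproduces the marginal variance function on the grid.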
Examples
>>> import numpy as np
>>> from pflm.fpca import FunctionalDataGenerator
>>> t = np.linspace(0.0, 10.0, 51)
>>> gen = FunctionalDataGenerator(
...     t, lambda x: np.sin(x) * 0.5, lambda x: 1.0 + 0.2 * np.cos(x),
... )
>>> y_list, t_list = gen.generate(n=20, seed=42)
>>> len(y_list)
20
>>> y_list[0].shape
(51,)
- generate(n: int, seed: int | None = None) -> ndarray
Generate functional data samples.
- Parameters:
- n : int
Number of functional samples to generate (typically n > 0).
- seed : int, optional
Random seed for reproducibility.
- Returns:
- y : List[np.ndarray]
List of length n; each element has shape (nt,) and represents one sample.
- t : List[np.ndarray]
List of length n; each element is the time grid of shape (nt,).
Notes
Scores are drawn with an identity covariance in score space and then rescaled by the FPCA basis and variance, which implicitly corresponds to drawing them from N(0, diag(lambda)).
Gaussian noise with variance error_var is added independently per point.
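One way to read these notes is as a truncated Karhunen-Loève draw: standard normal scores are scaled by the square roots of the eigenvalues, mapped through the basis, shifted by the mean, and perturbed by independent noise. The sketch below follows that reading. The helper `sample_curves`, the two-component sine/cosine basis, and the eigenvalues are all assumptions made for illustration; the library's actual rescaling may differ.

```python
import numpy as np


def sample_curves(mean, phi, eigvals, error_var, n, seed=None):
    """Draw n noisy curves from a rank-k model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    k = eigvals.shape[0]
    z = rng.standard_normal((n, k))   # identity covariance in score space
    scores = z * np.sqrt(eigvals)     # column j now has variance eigvals[j]
    signal = mean + scores @ phi.T    # low-rank signal, shape (n, nt)
    noise = rng.normal(0.0, np.sqrt(error_var), size=signal.shape)
    return signal + noise             # independent noise at every grid point


# Assumed toy inputs: a mean curve and two basis functions on [0, 1].
t = np.linspace(0.0, 1.0, 101)
mean = 0.5 * np.sin(2.0 * np.pi * t)
phi = np.column_stack([
    np.sqrt(2.0) * np.sin(np.pi * t),
    np.sqrt(2.0) * np.cos(np.pi * t),
])
eigvals = np.array([2.0, 0.5])

y = sample_curves(mean, phi, eigvals, error_var=0.1, n=20, seed=42)
```

Setting `error_var=0.0` in this sketch returns the noise-free signal, whose centered rows span at most k dimensions.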
- get_fpca_phi() -> ndarray
Return the FPCA basis functions evaluated on t.
- Returns:
- fpca_phi : np.ndarray of shape (nt, k)
The functional principal component basis functions.
Notes
The FPCA basis is computed lazily on the first call and cached.
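For orientation, a basis of this kind can be obtained by eigen-decomposing the covariance matrix sampled on the grid. The sketch below assumes a uniform grid and a Riemann-sum normalization so that each basis function has unit L2 norm; `fpca_basis` is a hypothetical helper, and the normalization the library actually applies is not stated here.

```python
import numpy as np


def fpca_basis(cov, t):
    """Eigen-decompose a covariance matrix sampled on a uniform grid.

    Hypothetical helper; assumes equal spacing in t.
    """
    dt = t[1] - t[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    eigvals = eigvals[::-1]                 # reorder to descending
    eigvecs = eigvecs[:, ::-1]
    lam = eigvals * dt                      # Riemann-sum scaling of eigenvalues
    phi = eigvecs / np.sqrt(dt)             # so that sum(phi[:, j]**2) * dt == 1
    return lam, phi


t = np.linspace(0.0, 10.0, 51)
lags = np.abs(t[:, None] - t[None, :])
cov = np.exp(-lags)                         # assumed example covariance
lam, phi = fpca_basis(cov, t)
```

Keeping only the first k columns of `phi` gives an array of shape (nt, k), matching the documented return shape.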
- get_num_pcs() -> int
Return the number of retained principal components.
- Returns:
- num_pcs : int
Effective number of retained FPCA components.
Notes
If not set explicitly, the value is determined by the FVE threshold on first access and then cached.
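A common FVE rule picks the smallest number of components whose cumulative share of the eigenvalue sum reaches the threshold. The sketch below shows that rule on a toy spectrum; `num_pcs_from_fve` is a hypothetical helper, and whether the library uses exactly this tie-breaking is an assumption.

```python
import numpy as np


def num_pcs_from_fve(eigvals, thresh):
    """Smallest k whose cumulative FVE is at least thresh (hypothetical helper)."""
    lam = np.clip(np.sort(eigvals)[::-1], 0.0, None)  # descending, negatives zeroed
    fve = np.cumsum(lam) / lam.sum()                  # cumulative fraction of variance
    k = int(np.searchsorted(fve, thresh)) + 1         # first index with fve >= thresh
    return min(k, lam.size)                           # guard against rounding at the tail


# Toy spectrum: cumulative FVE is 0.4, 0.7, 0.9, 1.0.
k = num_pcs_from_fve(np.array([4.0, 3.0, 2.0, 1.0]), thresh=0.85)
```

With this spectrum, three components are needed to pass a threshold of 0.85, since two components explain only 70% of the variance.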
- static make_missing(y: list[ndarray], t: list[ndarray], missing_number: int, seed: int | None = None) -> tuple[list[ndarray], list[ndarray]]
Introduce missing values into each functional sample.
- Parameters:
- y : List[np.ndarray]
Functional samples; each array has shape (nt_i,).
- t : List[np.ndarray]
Time grids corresponding to y; each array has shape (nt_i,).
- missing_number : int
Number of indices to drop per sample. Must satisfy 1 <= missing_number < nt_i.
- seed : int, optional
Random seed for reproducibility.
- Returns:
- new_y : List[np.ndarray]
Samples with missing entries removed.
- new_t : List[np.ndarray]
Corresponding time points with the same indices removed.
- Raises:
- ValueError
If missing_number is not in [1, len(y[0]) - 1] or if input y already contains NaN values.
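The behaviour documented above (drop a fixed number of randomly chosen indices from each sample and from its matching time grid) can be sketched as follows. `drop_points` is a hypothetical stand-in written for illustration; it is not the library's implementation, and its random index selection is an assumption.

```python
import numpy as np


def drop_points(y, t, missing_number, seed=None):
    """Remove missing_number random observations per sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    new_y, new_t = [], []
    for yi, ti in zip(y, t):
        nt_i = yi.shape[0]
        if not 1 <= missing_number < nt_i:
            raise ValueError("missing_number must satisfy 1 <= missing_number < nt_i")
        if np.isnan(yi).any():
            raise ValueError("input samples must not already contain NaN")
        drop = rng.choice(nt_i, size=missing_number, replace=False)
        keep = np.setdiff1d(np.arange(nt_i), drop)  # sorted indices that survive
        new_y.append(yi[keep])
        new_t.append(ti[keep])
    return new_y, new_t


y = [np.arange(10.0) for _ in range(3)]
t = [np.linspace(0.0, 1.0, 10) for _ in range(3)]
new_y, new_t = drop_points(y, t, missing_number=3, seed=0)
```

Each output array is shorter than its input by `missing_number`, and the retained time points stay in increasing order because the surviving indices are sorted.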