pflm.fpca.FunctionalDataGenerator

class FunctionalDataGenerator(t: numpy.ndarray, mean_func: collections.abc.Callable[[numpy.ndarray], numpy.ndarray], var_func: collections.abc.Callable[[numpy.ndarray], numpy.ndarray], corr_func: collections.abc.Callable[[numpy.ndarray], numpy.ndarray] = scipy.special.j0, variation_prop_thresh: float = 0.999999, num_pcs: int | None = None, error_var: float = 1.0)

Bases: object

Generator for synthetic functional data on a fixed grid.

This class builds a covariance surface from a marginal variance function and a stationary correlation kernel, runs FPCA on the implied covariance, and samples low-rank functional signals with optional Gaussian measurement noise.

Parameters:

t (np.ndarray of shape (nt,)): Monotonic grid of time points.
mean_func (Callable[[np.ndarray], np.ndarray]): Mean function evaluated on t; returns shape (nt,).
var_func (Callable[[np.ndarray], np.ndarray]): Marginal variance function evaluated on t; returns shape (nt,).
corr_func (Callable[[np.ndarray], np.ndarray], default=scipy.special.j0): Correlation kernel k(h) used to build the covariance surface, where h is the absolute time lag.
variation_prop_thresh (float, default=0.999999): Fraction-of-variance-explained (FVE) threshold used to choose the number of components when num_pcs is None. Must satisfy 0 < variation_prop_thresh < 1.
num_pcs (int or None, default=None): Number of principal components to retain. If None, it is determined by the FVE threshold.
error_var (float, default=1.0): Variance of the Gaussian noise added to generated curves.

Attributes:

t (np.ndarray of shape (nt,)): Copy of the input grid.
mean_func (Callable[[np.ndarray], np.ndarray]): Mean function handle used during generation.
var_func (Callable[[np.ndarray], np.ndarray]): Marginal variance function handle used during generation.
corr_func (Callable[[np.ndarray], np.ndarray]): Correlation kernel used to build the covariance surface.
variation_prop_thresh (float): FVE threshold used when num_pcs is not specified.
error_var (float): Measurement noise variance for generation.

Notes

The covariance is constructed as \(C(t_i, t_j) = \sqrt{v(t_i)} \, \rho(|t_i - t_j|) \, \sqrt{v(t_j)}\), where \(v\) is var_func and \(\rho\) is corr_func.
FPCA components (eigenstructure) are computed lazily on first use.
Private caches:
- _num_pcs: Optional[int]
- _fpca_phi: Optional[np.ndarray of shape (nt, k)]
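The covariance construction described in these notes can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper name build_covariance is hypothetical, and a Gaussian kernel stands in for the scipy.special.j0 default so that the sketch depends on NumPy only.

```python
import numpy as np


def build_covariance(t, var_func, corr_func):
    """Return C[i, j] = sd(t_i) * corr(|t_i - t_j|) * sd(t_j) on the grid t."""
    sd = np.sqrt(var_func(t))              # marginal standard deviations, shape (nt,)
    lag = np.abs(t[:, None] - t[None, :])  # absolute time lags, shape (nt, nt)
    return sd[:, None] * corr_func(lag) * sd[None, :]


t = np.linspace(0.0, 10.0, 51)
var_func = lambda x: 1.0 + 0.2 * np.cos(x)
corr_func = lambda h: np.exp(-h**2)        # assumed stand-in kernel, not the j0 default
cov = build_covariance(t, var_func, corr_func)
```

Because the kernel equals 1 at lag 0, the diagonal of cov reproduces var_func(t), and the matrix is symmetric by construction.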

Examples

>>> import numpy as np
>>> from pflm.fpca import FunctionalDataGenerator
>>> t = np.linspace(0.0, 10.0, 51)
>>> gen = FunctionalDataGenerator(
...     t, lambda x: np.sin(x) * 0.5, lambda x: 1.0 + 0.2 * np.cos(x),
... )
>>> y_list, t_list = gen.generate(n=20, seed=42)
>>> len(y_list)
20
>>> y_list[0].shape
(51,)

generate(n: int, seed: int | None = None) → ndarray

Generate functional data samples.

Parameters:

n (int): Number of functional samples to generate (typically n > 0).
seed (int, optional): Random seed for reproducibility.

Returns:

y (List[np.ndarray]): List of length n; each element has shape (nt,) and represents one sample.
t (List[np.ndarray]): List of length n; each element is the time grid of shape (nt,).

Notes

Scores are sampled with identity covariance in score space and then rescaled by the FPCA basis and variance, so they are implicitly distributed as N(0, diag(lambda)), where lambda holds the FPCA eigenvalues.
Gaussian noise with variance error_var is added independently at each grid point.
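The sampling scheme in these notes can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: sample_curves is a hypothetical helper, the covariance is an assumed Gaussian-kernel matrix, and the eigenvectors are used as plain matrix eigenvectors without any grid-spacing normalization.

```python
import numpy as np


def sample_curves(t, mean_func, phi, lam, error_var, n, seed=None):
    """Draw n noisy curves from a rank-k model with basis phi and variances lam."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, lam.size))   # identity covariance in score space
    scores = z * np.sqrt(lam)                # rescaled so that Var(score_k) = lam[k]
    signal = mean_func(t) + scores @ phi.T   # low-rank signal, shape (n, nt)
    noise = rng.normal(0.0, np.sqrt(error_var), size=signal.shape)
    y = signal + noise
    return list(y), [t.copy() for _ in range(n)]


t = np.linspace(0.0, 10.0, 51)
lag = np.abs(t[:, None] - t[None, :])
cov = np.exp(-lag**2)                        # assumed covariance with unit variance
vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
lam, phi = vals[::-1][:3], vecs[:, ::-1][:, :3]  # keep the 3 leading components
y_list, t_list = sample_curves(t, np.sin, phi, lam, error_var=1.0, n=20, seed=42)
```

Passing the same seed twice yields identical curves, which mirrors the reproducibility promised by the seed parameter.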

get_fpca_phi() → ndarray

Return the FPCA basis functions evaluated on t.

Returns:

fpca_phi (np.ndarray of shape (nt, k)): The functional principal component basis functions.

Notes

The FPCA basis is computed lazily on the first call and cached.
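The compute-once-then-cache behaviour can be illustrated with a small stand-alone class. Everything here is hypothetical (the class LazyEigenbasis, its attributes, and the Gaussian-kernel covariance); it only shows the general pattern of filling a private cache on first access.

```python
import numpy as np


class LazyEigenbasis:
    """Hypothetical helper: decompose a covariance matrix once, on demand."""

    def __init__(self, cov, num_pcs):
        self._cov = np.asarray(cov, dtype=float)
        self._num_pcs = num_pcs
        self._phi = None            # cache, filled on first access
        self.n_decompositions = 0   # counter kept for demonstration only

    def get_phi(self):
        if self._phi is None:
            _, vecs = np.linalg.eigh(self._cov)            # ascending eigenvalues
            self._phi = vecs[:, ::-1][:, : self._num_pcs]  # leading components first
            self.n_decompositions += 1
        return self._phi


t = np.linspace(0.0, 10.0, 51)
cov = np.exp(-np.abs(t[:, None] - t[None, :]) ** 2)  # assumed covariance
basis = LazyEigenbasis(cov, num_pcs=3)
phi_first = basis.get_phi()    # triggers the eigendecomposition
phi_second = basis.get_phi()   # served from the cache
```

The second call returns the same array object, so repeated access does not repeat the eigendecomposition.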

get_num_pcs() → int

Return the number of retained principal components.

Returns:

num_pcs (int): Effective number of retained FPCA components.

Notes

If not set explicitly, the value is determined by the FVE threshold on first access and then cached.
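One common way to turn an FVE threshold into a component count is to take the smallest k whose cumulative share of the eigenvalue sum reaches the threshold. The sketch below assumes that rule; the helper choose_num_pcs is hypothetical, and the library may treat ties or negative eigenvalues differently.

```python
import numpy as np


def choose_num_pcs(eigenvalues, thresh):
    """Smallest k whose cumulative fraction of variance explained reaches thresh."""
    lam = np.sort(np.clip(eigenvalues, 0.0, None))[::-1]  # nonnegative, descending
    fve = np.cumsum(lam) / lam.sum()                      # cumulative FVE for k = 1, 2, ...
    k = int(np.searchsorted(fve, thresh)) + 1             # first index with fve >= thresh
    return min(k, lam.size)                               # guard against rounding at the tail


eigenvalues = np.array([6.0, 3.0, 1.0])   # cumulative FVE: 0.6, 0.9, 1.0
k_85 = choose_num_pcs(eigenvalues, 0.85)  # two components reach 0.9 >= 0.85
k_95 = choose_num_pcs(eigenvalues, 0.95)  # all three components are needed
```

With the default threshold of 0.999999, a rule like this keeps nearly every component that carries non-negligible variance.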

static make_missing(y: list[ndarray], t: list[ndarray], missing_number: int, seed: int | None = None) → tuple[list[ndarray], list[ndarray]]

Introduce missing values into each functional sample.

Parameters:

y (List[np.ndarray]): Functional samples; each array has shape (nt_i,).
t (List[np.ndarray]): Time grids corresponding to y; each array has shape (nt_i,).
missing_number (int): Number of indices to drop per sample. Must satisfy 1 <= missing_number < nt_i.
seed (int, optional): Random seed for reproducibility.

Returns:

new_y (List[np.ndarray]): Samples with missing entries removed.
new_t (List[np.ndarray]): Corresponding time points with the same indices removed.

Raises:

ValueError: If missing_number is not in [1, len(y[0]) - 1] or if input y already contains NaN values.
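The behaviour documented for make_missing can be approximated with a short NumPy routine. This is a sketch under assumptions, not the library's implementation: drop_points is a hypothetical name, indices are chosen uniformly without replacement, and the range check is applied to every sample rather than only to y[0].

```python
import numpy as np


def drop_points(y, t, missing_number, seed=None):
    """Remove missing_number randomly chosen points from every (y_i, t_i) pair."""
    rng = np.random.default_rng(seed)
    new_y, new_t = [], []
    for y_i, t_i in zip(y, t):
        if not 1 <= missing_number < y_i.size:
            raise ValueError("missing_number must satisfy 1 <= missing_number < nt_i")
        if np.isnan(y_i).any():
            raise ValueError("input already contains NaN values")
        n_keep = y_i.size - missing_number
        keep = np.sort(rng.choice(y_i.size, size=n_keep, replace=False))
        new_y.append(y_i[keep])   # sorted indices preserve the time order
        new_t.append(t_i[keep])
    return new_y, new_t


t = np.linspace(0.0, 10.0, 51)
y = [np.sin(t), np.cos(t)]
new_y, new_t = drop_points(y, [t, t], missing_number=5, seed=0)
```

Each output pair stays aligned because the same kept indices are applied to both the values and the time points.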