pflm.fpca.FunctionalDataGenerator

class FunctionalDataGenerator(t: numpy.ndarray, mean_func: collections.abc.Callable[[numpy.ndarray], numpy.ndarray], var_func: collections.abc.Callable[[numpy.ndarray], numpy.ndarray], corr_func: collections.abc.Callable[[numpy.ndarray], numpy.ndarray] = <ufunc 'j0'>, variation_prop_thresh: float = 0.999999, num_pcs: int | None = None, error_var: float = 1.0)

Bases: object

Generator for synthetic functional data on a fixed grid.

This class builds a covariance surface from a marginal variance function and a stationary correlation kernel, performs an FPCA on the implied covariance, and samples low-rank functional signals with optional Gaussian measurement noise.

Parameters:
t : np.ndarray of shape (nt,)

Monotonic grid of time points.

mean_func : Callable[[np.ndarray], np.ndarray]

Mean function evaluated on t, returns shape (nt,).

var_func : Callable[[np.ndarray], np.ndarray]

Marginal variance function evaluated on t, returns shape (nt,).

corr_func : Callable[[np.ndarray], np.ndarray], default=scipy.special.j0

Correlation kernel k(h) used to build the covariance surface, where h is the absolute time lag.

variation_prop_thresh : float, default=0.999999

Fraction of variance explained (FVE) threshold used to choose the number of components when num_pcs is None. Must satisfy 0 < variation_prop_thresh < 1.

num_pcs : int or None, default=None

Number of principal components to retain. If None, it is determined by the FVE threshold.

error_var : float, default=1.0

Gaussian noise variance added to generated curves.

Attributes:
t : np.ndarray of shape (nt,)

Copy of the input grid.

mean_func : Callable[[np.ndarray], np.ndarray]

Mean function handle used during generation.

var_func : Callable[[np.ndarray], np.ndarray]

Marginal variance function handle used during generation.

corr_func : Callable[[np.ndarray], np.ndarray]

Correlation kernel used to build the covariance surface.

variation_prop_thresh : float

FVE threshold used when num_pcs is not specified.

error_var : float

Measurement noise variance for generation.

Notes

  • The covariance is constructed as \(C(t_i, t_j) = \sqrt{v(t_i)}\, \rho(|t_i - t_j|)\, \sqrt{v(t_j)}\), where \(v\) is var_func and \(\rho\) is corr_func.

  • FPCA components (eigenstructure) are computed lazily on first use.

  • Private caches:
      - _num_pcs: Optional[int]
      - _fpca_phi: Optional[np.ndarray of shape (nt, k)]
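The covariance construction described in the notes above can be sketched in a few lines of NumPy. This is only an illustration, not the library's code: `build_covariance` is a hypothetical helper, and the exponential kernel is a stand-in chosen to keep the sketch free of SciPy (the class default is `scipy.special.j0`).

```python
import numpy as np


def build_covariance(t, var_func, corr_func):
    """Hypothetical helper: C[i, j] = sd(t_i) * corr(|t_i - t_j|) * sd(t_j)."""
    t = np.asarray(t, dtype=float)
    sd = np.sqrt(var_func(t))                # marginal standard deviations, shape (nt,)
    lag = np.abs(t[:, None] - t[None, :])    # absolute time lags, shape (nt, nt)
    return sd[:, None] * corr_func(lag) * sd[None, :]


t = np.linspace(0.0, 10.0, 51)
var_func = lambda x: 1.0 + 0.2 * np.cos(x)
# Assumption: exponential kernel as a stand-in; the documented default is scipy.special.j0.
corr_func = lambda h: np.exp(-h)

cov = build_covariance(t, var_func, corr_func)
```

Because the kernel equals 1 at lag 0, the diagonal of `cov` reproduces `var_func(t)`, and the surface is symmetric by construction.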

Examples

>>> import numpy as np
>>> from pflm.fpca import FunctionalDataGenerator
>>> t = np.linspace(0.0, 10.0, 51)
>>> gen = FunctionalDataGenerator(
...     t, lambda x: np.sin(x) * 0.5, lambda x: 1.0 + 0.2 * np.cos(x),
... )
>>> y_list, t_list = gen.generate(n=20, seed=42)
>>> len(y_list)
20
>>> y_list[0].shape
(51,)
generate(n: int, seed: int | None = None) → ndarray

Generate functional data samples.

Parameters:
n : int

Number of functional samples to generate (typically n > 0).

seed : int, optional

Random seed for reproducibility.

Returns:
y : List[np.ndarray]

List of length n; each element has shape (nt,) and represents one sample.

t : List[np.ndarray]

List of length n; each element is the time grid of shape (nt,).

Notes

  • Scores are drawn from N(0, diag(lambda)) implicitly: they are sampled with identity covariance in score space and then rescaled by the FPCA basis and variance.

  • Gaussian noise with variance error_var is added independently per point.
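The two notes above can be illustrated with a small sampling sketch. It is a guess at the general scheme rather than the library's implementation: the eigenvalues `lam` and the orthonormal basis `phi` are made-up stand-ins for the FPCA output, and any grid-spacing normalization the library may apply is ignored.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 51)
nt, k, n = t.size, 3, 20

# Hypothetical eigenpairs standing in for the FPCA output (assumption).
lam = np.array([2.0, 1.0, 0.5])
phi, _ = np.linalg.qr(rng.standard_normal((nt, k)))  # orthonormal columns, shape (nt, k)

mu = 0.5 * np.sin(t)   # mean function evaluated on the grid
error_var = 1.0

z = rng.standard_normal((n, k))     # identity covariance in score space
scores = z * np.sqrt(lam)           # rescaled so that scores ~ N(0, diag(lam))
signal = mu + scores @ phi.T        # low-rank signal, shape (n, nt)

# Independent Gaussian noise with variance error_var at every grid point.
noise = rng.normal(0.0, np.sqrt(error_var), size=(n, nt))
y = signal + noise

# Mirror the documented return layout: one array per sample, plus its grid.
y_list = [row for row in y]
t_list = [t.copy() for _ in range(n)]
```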

get_fpca_phi() → ndarray

Return the FPCA basis functions evaluated on t.

Returns:
fpca_phi : np.ndarray of shape (nt, k)

The functional principal component basis functions.

Notes

The FPCA basis is computed lazily on the first call and cached.
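A compute-once-then-cache pattern of this kind might look like the sketch below. `LazyBasis` is a hypothetical class written for illustration; it uses a plain eigendecomposition without truncation or grid-spacing normalization, and the counter exists only to make the caching visible.

```python
import numpy as np


class LazyBasis:
    """Hypothetical illustration of a lazily computed, cached eigenbasis."""

    def __init__(self, cov):
        self._cov = np.asarray(cov, dtype=float)
        self._fpca_phi = None          # private cache, filled on first use
        self.n_decompositions = 0      # bookkeeping for this sketch only

    def get_fpca_phi(self):
        if self._fpca_phi is None:
            # eigh returns eigenvalues in ascending order; reverse the columns
            # so the leading component comes first.
            _, eigvecs = np.linalg.eigh(self._cov)
            self._fpca_phi = eigvecs[:, ::-1]
            self.n_decompositions += 1
        return self._fpca_phi


basis = LazyBasis(np.diag([3.0, 1.0, 2.0]))
first = basis.get_fpca_phi()
second = basis.get_fpca_phi()   # served from the cache, no second decomposition
```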

get_num_pcs() → int

Return the number of retained principal components.

Returns:
num_pcs : int

Effective number of retained FPCA components.

Notes

If not set explicitly, the value is determined by the FVE threshold on first access and then cached.
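One common way to turn an FVE threshold into a component count is to take the smallest k whose cumulative eigenvalue share reaches the threshold. The sketch below assumes that rule (including the "greater than or equal" comparison); `num_pcs_from_fve` is a hypothetical helper, and the library may differ in details.

```python
import numpy as np


def num_pcs_from_fve(eigenvalues, thresh):
    """Hypothetical helper: smallest k with cumulative FVE >= thresh."""
    if not 0.0 < thresh < 1.0:
        raise ValueError("thresh must satisfy 0 < thresh < 1")
    # Clip tiny negative eigenvalues from round-off and sort in descending order.
    lam = np.sort(np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None))[::-1]
    fve = np.cumsum(lam) / lam.sum()
    # searchsorted gives the first index whose cumulative FVE is >= thresh.
    k = int(np.searchsorted(fve, thresh)) + 1
    return min(k, lam.size)


# Cumulative FVE here is [0.4, 0.7, 0.9, 1.0], so 0.85 needs three components.
k = num_pcs_from_fve([4.0, 3.0, 2.0, 1.0], 0.85)
```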

static make_missing(y: list[ndarray], t: list[ndarray], missing_number: int, seed: int | None = None) → tuple[list[ndarray], list[ndarray]]

Introduce missing values into each functional sample.

Parameters:
y : List[np.ndarray]

Functional samples; each array has shape (nt_i,).

t : List[np.ndarray]

Time grids corresponding to y; each array has shape (nt_i,).

missing_number : int

Number of indices to drop per sample. Must satisfy 1 <= missing_number < nt_i.

seed : int, optional

Random seed for reproducibility.

Returns:
new_y : List[np.ndarray]

Samples with missing entries removed.

new_t : List[np.ndarray]

Corresponding time points with the same indices removed.

Raises:
ValueError

If missing_number is not in [1, len(y[0]) - 1] or if input y already contains NaN values.
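The documented behavior can be approximated with the sketch below. `drop_random_points` is a hypothetical stand-in, not the library's `make_missing`: it validates each sample against its own length and draws the dropped indices without replacement, which are assumptions about details the documentation does not spell out.

```python
import numpy as np


def drop_random_points(y, t, missing_number, seed=None):
    """Hypothetical stand-in: remove `missing_number` random points per sample."""
    rng = np.random.default_rng(seed)
    new_y, new_t = [], []
    for yi, ti in zip(y, t):
        if not 1 <= missing_number < len(yi):
            raise ValueError("missing_number must satisfy 1 <= missing_number < nt_i")
        if np.isnan(yi).any():
            raise ValueError("input y already contains NaN values")
        # Choose distinct indices to drop, then keep everything else in order.
        drop = rng.choice(len(yi), size=missing_number, replace=False)
        keep = np.ones(len(yi), dtype=bool)
        keep[drop] = False
        new_y.append(yi[keep])
        new_t.append(ti[keep])
    return new_y, new_t


grid = np.arange(10.0)
y = [grid.copy() for _ in range(3)]   # toy samples whose values equal their time points
t = [grid.copy() for _ in range(3)]
new_y, new_t = drop_random_points(y, t, missing_number=3, seed=0)
```

Each returned sample keeps its remaining points in the original order, and the same indices are removed from the matching time grid.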