ddl.mixture.FirstFixedGaussianMixtureDensity

class ddl.mixture.FirstFixedGaussianMixtureDensity(fixed_weight=0.5, n_components=1, covariance_type='full')[source]

Bases: ddl.mixture.GaussianMixtureDensity

Mixture density where one component is fixed as the standard normal.

This is useful for creating a regularized Gaussian mixture destructor. In particular, if this is paired with an inverse Gaussian cdf (i.e. IndependentInverseCdf), then as the weight of the fixed standard normal component approaches 1, the composite destructor approaches the identity. Thus, the fixed_weight parameter can be used to control the amount of regularization.

Note that this is implemented by overriding the private _m_step method of sklearn.mixture.GaussianMixture, so it may not be compatible with future releases of scikit-learn.

More specifically, the first n_components - 1 Gaussian components are fit using the standard Gaussian mixture estimator. A fixed standard normal component with the desired fixed weight is then added manually. Finally, the model is refit with an overridden M step so that the fixed-weight component does not change.
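The resulting one-dimensional density can be sketched directly with numpy (this is an illustration of the idea, not the ddl API; `normal_pdf` and `first_fixed_mixture_pdf` are hypothetical helper names):

```python
import numpy as np

def normal_pdf(x, mean=0.0, std=1.0):
    """Univariate normal density, implemented with numpy only."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def first_fixed_mixture_pdf(x, fixed_weight, learned_mean, learned_std):
    """Two-component mixture whose first component is pinned to the
    standard normal with mixing weight fixed_weight."""
    fixed_part = fixed_weight * normal_pdf(x)
    learned_part = (1.0 - fixed_weight) * normal_pdf(x, learned_mean, learned_std)
    return fixed_part + learned_part

x = np.linspace(-3.0, 3.0, 7)
# fixed_weight=1: full regularization, the mixture IS the standard normal.
assert np.allclose(first_fixed_mixture_pdf(x, 1.0, 2.0, 0.5), normal_pdf(x))
# fixed_weight=0: no regularization, only the learned component remains.
assert np.allclose(first_fixed_mixture_pdf(x, 0.0, 2.0, 0.5), normal_pdf(x, 2.0, 0.5))
```

Interpolating fixed_weight between these two extremes trades off how much the fitted density is pulled toward the standard normal.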

Parameters:
fixed_weight : float, default=0.5

The fixed weight between 0 and 1 that is given to the first Gaussian component. As this weight approaches 1, the model is fully regularized and learns nothing from the data; as it approaches 0, there is no regularization and the fit is determined entirely by the data.

n_components : int, default=1

The number of mixture components to fit.

covariance_type : {‘full’, ‘tied’, ‘diag’, ‘spherical’}, default=’full’

String describing the type of covariance parameters to use. Must be one of:

'full' (each component has its own general covariance matrix),
'tied' (all components share the same general covariance matrix),
'diag' (each component has its own diagonal covariance matrix),
'spherical' (each component has its own single variance).
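Since this behavior is inherited from sklearn.mixture.GaussianMixture, the shape of the stored covariances_ attribute under each setting can be checked directly on the base class (shown here with sklearn rather than the ddl subclass):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = rng.randn(200, 3)  # 200 samples, 3 features

# covariances_ has a different shape for each covariance_type.
for cov_type, expected_shape in [
    ("full", (2, 3, 3)),   # one general matrix per component
    ("tied", (3, 3)),      # one shared general matrix
    ("diag", (2, 3)),      # one diagonal per component
    ("spherical", (2,)),   # one variance per component
]:
    gm = GaussianMixture(n_components=2, covariance_type=cov_type,
                         random_state=0).fit(X)
    assert gm.covariances_.shape == expected_shape
```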

Methods

aic(self, X) Akaike information criterion for the current model on the input X.
bic(self, X) Bayesian information criterion for the current model on the input X.
conditional_densities(self, X, cond_idx, …) Compute conditional densities.
fit(self, X[, y]) Fit estimator to X.
fit_predict(self, X[, y]) Estimate model parameters using X and predict the labels for X.
get_params(self[, deep]) Get parameters for this estimator.
marginal_cdf(self, x, target_idx) Return the marginal cdf of x at feature target_idx.
marginal_inverse_cdf(self, x, target_idx) Return the marginal inverse cdf of x at feature target_idx.
predict(self, X) Predict the labels for the data samples in X using trained model.
predict_proba(self, X) Predict posterior probability of each component given the data.
sample(self[, n_samples, random_state]) Sample from GaussianMixture and return only X instead of (X, y).
score(self, X[, y]) Compute the per-sample average log-likelihood of the given data X.
score_samples(self, X) Compute the weighted log probabilities for each sample.
set_params(self, **params) Set the parameters of this estimator.
create_fitted  
sample_joint  
__init__(self, fixed_weight=0.5, n_components=1, covariance_type='full')[source]

Initialize self. See help(type(self)) for accurate signature.

aic(self, X)

Akaike information criterion for the current model on the input X.

Parameters:
X : array of shape (n_samples, n_dimensions)
Returns:
aic : float

The lower the better.

bic(self, X)

Bayesian information criterion for the current model on the input X.

Parameters:
X : array of shape (n_samples, n_dimensions)
Returns:
bic : float

The lower the better.
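Both criteria penalize model complexity, so they are commonly used to choose n_components. A sketch of that workflow on the sklearn base class (the synthetic data and variable names here are illustrative only):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Two well-separated 1-D clusters, so two components should win.
X = np.concatenate([rng.randn(200, 1) - 5.0, rng.randn(200, 1) + 5.0])

# Lower AIC/BIC is better; scan candidate component counts.
scores = {}
for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    scores[k] = (gm.aic(X), gm.bic(X))

best_k = min(scores, key=lambda k: scores[k][1])  # select by BIC
```

Here a single component fits the bimodal data poorly, while a third component buys little extra likelihood and pays the complexity penalty, so BIC selects two components.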

conditional_densities(self, X, cond_idx, not_cond_idx)

Compute conditional densities.

Parameters:
X : array-like, shape (n_samples, n_features)

Data to condition on based on cond_idx.

cond_idx : array-like of int

Indices to condition on.

not_cond_idx : array-like of int

Indices not to condition on.

Returns:
conditional_densities : array-like of estimators

Either a single density, if all conditional densities are identical, or a list of Gaussian densities with conditional means and variances.
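For a single Gaussian, conditioning uses the standard formulas mu_{1|2} = mu_1 + S_12 S_22^{-1} (x_2 - mu_2) and S_{1|2} = S_11 - S_12 S_22^{-1} S_21. A minimal numpy sketch of that computation (the helper name is hypothetical, not part of the ddl API):

```python
import numpy as np

def gaussian_conditional(mean, cov, cond_idx, not_cond_idx, x_cond):
    """Mean and covariance of x[not_cond_idx] given x[cond_idx] = x_cond
    for a jointly Gaussian vector (standard conditioning formula)."""
    mu1 = mean[not_cond_idx]
    mu2 = mean[cond_idx]
    S11 = cov[np.ix_(not_cond_idx, not_cond_idx)]
    S12 = cov[np.ix_(not_cond_idx, cond_idx)]
    S22 = cov[np.ix_(cond_idx, cond_idx)]
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x_cond - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mean, cond_cov

# Bivariate standard normal with correlation rho = 0.8:
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
m, C = gaussian_conditional(mean, cov, cond_idx=[1], not_cond_idx=[0],
                            x_cond=np.array([1.0]))
# With unit variances the conditional mean is rho * x_2 and the
# conditional variance is 1 - rho**2.
assert np.allclose(m, 0.8) and np.allclose(C, 0.36)
```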

classmethod create_fitted(*args, **kwargs)
fit(self, X, y=None)[source]

Fit estimator to X.

Parameters:
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : None, default=None

Not used in the fitting process but kept for compatibility.

Returns:
self : estimator

Returns the instance itself.

fit_predict(self, X, y=None)

Estimate model parameters using X and predict the labels for X.

The method fits the model n_init times and keeps the parameters with which the model has the largest likelihood or lower bound. Within each trial, the method alternates between the E-step and M-step for at most max_iter iterations until the change in the likelihood or lower bound is less than tol; otherwise a ConvergenceWarning is raised. After fitting, it predicts the most probable label for the input data points.

New in version 0.20.

Parameters:
X : array-like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
labels : array, shape (n_samples,)

Component labels.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

marginal_cdf(self, x, target_idx)

Return the marginal cdf of x at feature target_idx.

marginal_inverse_cdf(self, x, target_idx)

Return the marginal inverse cdf of x at feature target_idx.
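These two methods are inverses of each other along each feature: marginal_cdf maps data values to [0, 1] and marginal_inverse_cdf maps them back. For the fixed standard normal component the round trip looks like this (illustrated with scipy.stats.norm rather than the ddl API):

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.5, 0.0, 2.3])
u = norm.cdf(x)        # marginal cdf: data space -> [0, 1]
x_back = norm.ppf(u)   # marginal inverse cdf: [0, 1] -> data space
assert np.all((u >= 0) & (u <= 1))
assert np.allclose(x_back, x)
```

This is the same cdf/inverse-cdf pairing that IndependentInverseCdf exploits when composing this density into a destructor.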

predict(self, X)

Predict the labels for the data samples in X using trained model.

Parameters:
X : array-like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
labels : array, shape (n_samples,)

Component labels.

predict_proba(self, X)

Predict posterior probability of each component given the data.

Parameters:
X : array-like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
resp : array, shape (n_samples, n_components)

Returns the posterior probability of each Gaussian component in the model given each sample.

sample(self, n_samples=1, random_state=None)

Sample from GaussianMixture and return only X instead of (X, y).

sample_joint(self, n_samples=1, random_state=None)
score(self, X, y=None)

Compute the per-sample average log-likelihood of the given data X.

Parameters:
X : array-like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
log_likelihood : float

Per-sample average log-likelihood of the Gaussian mixture given X.

score_samples(self, X)

Compute the weighted log probabilities for each sample.

Parameters:
X : array-like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:
log_prob : array, shape (n_samples,)

Log probabilities of each data point in X.
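score_samples returns one log-density per sample, and score is simply their mean, which is a quick consistency check on the sklearn base class:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = rng.randn(150, 2)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_prob = gm.score_samples(X)   # one weighted log-density per sample
avg_log_lik = gm.score(X)        # per-sample average log-likelihood

assert log_prob.shape == (150,)
assert np.isclose(avg_log_lik, log_prob.mean())
```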

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
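The nested `<component>__<parameter>` form is most useful when the estimator sits inside a pipeline; a brief sketch using sklearn components (the step names here are arbitrary):

```python
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("gmm", GaussianMixture())])

# Double-underscore syntax reaches parameters of a nested estimator.
pipe.set_params(gmm__n_components=3, gmm__covariance_type="diag")

assert pipe.get_params()["gmm__n_components"] == 3
assert pipe.get_params()["gmm__covariance_type"] == "diag"
```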