ddl.tree.TreeDensity

class ddl.tree.TreeDensity(tree_estimator=None, get_tree=None, node_destructor=None, uniform_weight=1e-06)[source]

Bases: sklearn.base.BaseEstimator, ddl.base.ScoreMixin

Tree density estimator defined on the unit hypercube.

This density estimator first estimates the tree structure via the tree_estimator parameter. The estimator then constructs a density tree by counting the number of training data points that fall into each leaf. The empirical counts are regularized by uniform_weight, which pulls the estimate towards the uniform density; the result is essentially a mixture of the empirical tree density and the uniform density. Optionally, a node destructor can be specified to be estimated and applied at each leaf node.
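For orientation, a minimal usage sketch (assuming the ddl package is installed and the data already lies on the unit hypercube):

import numpy as np

from ddl.tree import TreeDensity

# Toy data on the unit hypercube [0, 1]^2, the assumed support.
rng = np.random.RandomState(0)
X = rng.rand(100, 2)

# Default tree estimator with a small uniform mixture weight.
density = TreeDensity(uniform_weight=1e-6)
density.fit(X)

# Mean log-likelihood of the training data under the fitted density.
print(density.score(X))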

Parameters:
tree_estimator : estimator, defaults to RandomTreeEstimator

Tree estimator that determines the tree structure. Defaults to RandomTreeEstimator, but other estimators could be used or developed, such as the mlpack density estimation tree (DET) estimator ddl.externals.mlpack.MlpackDensityTreeEstimator.

get_tree : func

Function that extracts the tree structure from the fitted tree estimator: tree = get_tree(fitted_tree_estimator). The default extracts an sklearn.tree array-based tree from the fitted estimator, such as one from sklearn.tree.ExtraTreeRegressor.

node_destructor : estimator, optional

Optional destructor that can be fitted and applied at each leaf node of the tree. For example, this could be an independent histogram density (via IndependentDestructor with HistogramUnivariateDensity densities). With a node destructor, the tree density is no longer piecewise uniform but rather a more general piecewise density.

uniform_weight : float, between 0 and 1

The mixture weight of a uniform density used to regularize the empirical tree density. For example, if uniform_weight=1, the density estimate trivially reduces to the uniform density. On the other hand, if uniform_weight=0, no regularization is performed on the empirical density estimate. Values strictly between 0 and 1 regularize the density partially; the sketch after this parameter list makes the two extremes concrete.
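A short sketch of the two regularization extremes described above (toy data for illustration):

import numpy as np

from ddl.tree import TreeDensity

rng = np.random.RandomState(0)
X = rng.rand(200, 2)  # data on the unit hypercube

# uniform_weight=1: the estimate reduces to the uniform density, so the
# density is 1 everywhere on [0, 1]^2 and every log-density is 0.
uniform = TreeDensity(uniform_weight=1.0).fit(X)
print(uniform.score_samples(X))  # expect an array of zeros

# uniform_weight=0: the unregularized empirical tree density.
empirical = TreeDensity(uniform_weight=0.0).fit(X)
print(empirical.score_samples(X))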

Methods

fit(self, X[, y, fitted_tree_estimator]) Fit estimator to X.
get_params(self[, deep]) Get parameters for this estimator.
get_support(self) Get the support of this density (i.e. the positive density region).
sample(self[, n_samples, random_state, shuffle]) Generate random samples from the fitted density.
score(self, X[, y]) Return the mean log likelihood (or log(det(Jacobian))).
score_samples(self, X[, y]) Compute log-likelihood (or log(det(Jacobian))) for each sample.
set_params(self, **params) Set the parameters of this estimator.
create_fitted(tree, n_features, **kwargs) Create a fitted density from an existing tree structure.
__init__(self, tree_estimator=None, get_tree=None, node_destructor=None, uniform_weight=1e-06)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod create_fitted(tree, n_features, **kwargs)[source]

Create a fitted density directly from an existing tree structure and the number of features, without running the usual fit procedure.
fit(self, X, y=None, fitted_tree_estimator=None)[source]

Fit estimator to X.

Parameters:
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : None, default=None

Not used in the fitting process but kept for compatibility.

fitted_tree_estimator : estimator, optional

An already-fitted tree estimator. If provided, this estimator is used for the tree structure instead of fitting tree_estimator to X (see the sketch following this method).

Returns:
self : estimator

Returns the instance itself.
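A hedged sketch of the pre-fitted path; the random regression targets below are just one assumed way to obtain a partition without labels, and the default get_tree expects an sklearn array-based tree such as one from sklearn.tree.ExtraTreeRegressor (per the class parameters above):

import numpy as np

from sklearn.tree import ExtraTreeRegressor

from ddl.tree import TreeDensity

rng = np.random.RandomState(0)
X = rng.rand(200, 2)

# Fit a tree separately; random targets yield a randomized partition
# of the input space (an assumption, not prescribed by the ddl docs).
tree_est = ExtraTreeRegressor(max_depth=4, random_state=0)
tree_est.fit(X, rng.rand(len(X)))

# Reuse the pre-fitted tree when fitting the density.
density = TreeDensity().fit(X, fitted_tree_estimator=tree_est)
print(density.score(X))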

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(self)[source]

Get the support of this density (i.e. the positive density region).

Returns:
support : array-like, shape (2,) or shape (n_features, 2)

If shape is (2,), then support[0] is the minimum and support[1] is the maximum for all features. If shape is (n_features, 2), then each row gives the (minimum, maximum) support of the corresponding feature, which can differ across features.
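Since this density is defined on the unit hypercube, the support should be [0, 1] in every feature; a quick check:

import numpy as np

from ddl.tree import TreeDensity

X = np.random.RandomState(0).rand(50, 2)
density = TreeDensity().fit(X)

# Expect the unit interval per feature: either shape (2,) with [0, 1]
# shared by all features, or shape (n_features, 2) rows of [0, 1].
print(density.get_support())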

sample(self, n_samples=1, random_state=None, shuffle=True)[source]

Generate random samples from the fitted density.

Parameters:
n_samples : int, default=1

Number of samples to generate.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

shuffle : bool, default=True

Whether to shuffle the samples before returning them.

Returns:
X : array, shape (n_samples, n_features)

Randomly generated samples.
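A short sampling sketch; draws should land inside the unit hypercube (returning the samples as an array follows the usual sklearn convention, an assumption here):

import numpy as np

from ddl.tree import TreeDensity

X = np.random.RandomState(0).rand(200, 2)
density = TreeDensity().fit(X)

samples = density.sample(n_samples=10, random_state=0)
print(samples.shape)  # expected: (10, 2)
# All draws should stay inside the unit hypercube.
print(samples.min() >= 0 and samples.max() <= 1)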
score(self, X, y=None)

Return the mean log likelihood (or log(det(Jacobian))).

Parameters:
X : array-like, shape (n_samples, n_features)

New data, where n_samples is the number of samples and n_features is the number of features.

y : None, default=None

Not used but kept for compatibility.

Returns:
log_likelihood : float

Mean log likelihood of the data points in X.

score_samples(self, X, y=None)[source]

Compute log-likelihood (or log(det(Jacobian))) for each sample.

Parameters:
X : array-like, shape (n_samples, n_features)

New data, where n_samples is the number of samples and n_features is the number of features.

y : None, default=None

Not used but kept for compatibility.

Returns:
log_likelihood : array, shape (n_samples,)

Log likelihood of each data point in X.
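Since score is documented as the mean log likelihood, it should match the mean of score_samples:

import numpy as np

from ddl.tree import TreeDensity

X = np.random.RandomState(0).rand(200, 2)
density = TreeDensity().fit(X)

per_sample = density.score_samples(X)  # shape (200,)
# score is the mean of the per-sample log-likelihoods.
print(np.allclose(per_sample.mean(), density.score(X)))  # expected: True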

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
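A small sketch of both the plain and the nested <component>__<parameter> forms (max_depth is an ExtraTreeRegressor parameter, used here only to illustrate the routing):

from sklearn.tree import ExtraTreeRegressor

from ddl.tree import TreeDensity

density = TreeDensity(tree_estimator=ExtraTreeRegressor())

# Plain parameter update on the estimator itself.
density.set_params(uniform_weight=1e-3)

# Nested form: route max_depth to the contained tree estimator.
density.set_params(tree_estimator__max_depth=4)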