mlearner version: 0.2.0
CategoricalEncoder
CategoricalEncoder(encoding='onehot', categories='auto', dtype=np.float64, handle_unknown='error')
Encode categorical features as a numeric array.
The input to this transformer should be a matrix of integers or strings,
denoting the values taken on by categorical (discrete) features.
The features can be encoded using a one-hot aka one-of-K scheme
(encoding='onehot', the default) or converted to ordinal integers
(encoding='ordinal').
This encoding is needed for feeding categorical data to many scikit-learn
estimators, notably linear models and SVMs with the standard kernels.
Read more in the scikit-learn User Guide section on encoding categorical features.
Parameters
- encoding : str, 'onehot', 'onehot-dense' or 'ordinal'
  The type of encoding to use (default is 'onehot'):
  - 'onehot': encode the features using a one-hot aka one-of-K scheme (also called 'dummy' encoding). This creates a binary column for each category and returns a sparse matrix.
  - 'onehot-dense': the same as 'onehot' but returns a dense array instead of a sparse matrix.
  - 'ordinal': encode the features as ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.
- categories : 'auto' or a list of lists/arrays of values
  Categories (unique values) per feature:
  - 'auto': Determine categories automatically from the training data.
  - list: categories[i] holds the categories expected in the ith column. The passed categories are sorted before encoding the data (used categories can be found in the categories_ attribute).
- dtype : number type, default np.float64
  Desired dtype of output.
- handle_unknown : 'error' (default) or 'ignore'
  Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. Ignoring unknown categories is not supported for encoding='ordinal'.
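The 'ignore' behavior can be sketched in plain NumPy (an illustrative toy, not the library's implementation; the function name and data are invented for the example):

```python
import numpy as np

def onehot_column(col, cats, handle_unknown="error"):
    """One-hot encode a single feature against known categories `cats`.
    Unknown values either raise or map to an all-zero row ('ignore')."""
    cats = np.asarray(cats)
    out = np.zeros((len(col), len(cats)))
    for i, v in enumerate(col):
        hit = np.flatnonzero(cats == v)
        if hit.size:
            out[i, hit[0]] = 1.0
        elif handle_unknown == "error":
            raise ValueError(f"unknown category: {v!r}")
        # handle_unknown='ignore': leave the row all zeros
    return out

# Category 4 was never seen during fit, so its row comes out all zeros:
print(onehot_column([0, 1, 4], cats=[0, 1, 2, 3], handle_unknown="ignore"))
```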
Attributes
- categories_ : list of arrays
  The categories of each feature determined during fitting. When categories were specified manually, this holds the sorted categories (in order corresponding with the output of transform).
Examples
Given a dataset with three features and two samples, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.
>>> from sklearn.preprocessing import CategoricalEncoder
>>> enc = CategoricalEncoder(handle_unknown='ignore')
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
... # doctest: +ELLIPSIS
CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
          encoding='onehot', handle_unknown='ignore')
>>> enc.transform([[0, 1, 1], [1, 0, 4]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.]])
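The 'ordinal' scheme can be sketched the same way with NumPy (illustrative only; the sorted unique values of each column become its integer codes):

```python
import numpy as np

def ordinal_encode(X):
    """Encode each column as integers 0..n_categories-1 using the
    sorted unique values of that column (mirrors encoding='ordinal')."""
    X = np.asarray(X)
    out = np.empty(X.shape, dtype=np.int64)
    for j in range(X.shape[1]):
        cats = np.unique(X[:, j])              # sorted categories
        out[:, j] = np.searchsorted(cats, X[:, j])
    return out

print(ordinal_encode([["a", "x"], ["b", "y"], ["a", "z"]]))
```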
See also
- sklearn.preprocessing.OneHotEncoder : performs a one-hot encoding of integer ordinal features. The OneHotEncoder assumes that input features take on values in the range [0, max(feature)] instead of using the unique values.
- sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of dictionary items (also handles string-valued features).
- sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot encoding of dictionary items or strings.
Methods
fit(X, y=None)
Fit the CategoricalEncoder to X.
Parameters
- X : array-like, shape [n_samples, n_features]
  The data to determine the categories of each feature.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
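The get_params/set_params contract can be illustrated with a minimal stand-in class (a sketch of the scikit-learn convention; ToyEstimator and its parameters are invented for the example):

```python
class ToyEstimator:
    """Minimal illustration of the scikit-learn parameter protocol."""
    def __init__(self, alpha=1.0, fit_intercept=True):
        # Every settable parameter is an explicit __init__ keyword argument.
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self, deep=True):
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}

    def set_params(self, **params):
        for key, value in params.items():
            if key not in self.get_params():
                raise ValueError(f"Invalid parameter {key!r}")
            setattr(self, key, value)
        return self  # returning self allows chained calls

est = ToyEstimator().set_params(alpha=0.1)
print(est.get_params()["alpha"])
```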
transform(X)
Transform X using one-hot encoding.
Parameters
- X : array-like, shape [n_samples, n_features]
  The data to encode.
Returns
- X_out : sparse matrix or a 2-d array
  Transformed input.
ClassTransformer_value
ClassTransformer_value(columns, name='A/AH_cat', value=100)
Base class for all estimators in scikit-learn
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
None
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
None
CopyFeatures
CopyFeatures(columns=None, prefix='')
Base class for all estimators in scikit-learn
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
None
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
None
DataAnalyst
DataAnalyst(data)
Class wrapping a preprocessed dataset for data analysis.
Attributes
data: pd.DataFrame of Dataset
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DataAnalyst/
Methods
Xy_dataset(target=None)
Split the dataset into features and target (X, y).
boxplot(features=None, target=None, display=False, save_image=False, path='/', width=2)
Draws a box plot of the dispersion of each feature with respect to the target groups.
Inputs: - data: full dataset. - features: features to analyze.
categorical_vs_numerical()
None
corr_matrix(features=None, display=True, save_image=False, path='/')
Correlation matrix:
A positive value of r indicates a positive association; a negative value of r indicates a negative association.
The closer r is to 1, the closer the data points fall to a straight line and the stronger the linear association; the closer r is to 0, the weaker the linear association.
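The interpretation above matches the Pearson coefficient that pandas computes; for example (toy data, not the DataAnalyst API):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.1, 3.9, 6.2, 7.8],   # nearly linear in x -> r close to 1
    "z": [4.0, 3.0, 2.0, 1.0],   # exactly decreasing in x -> r == -1
})
corr = df.corr()                  # Pearson correlation matrix
print(corr.round(2))
```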
dispersion_categoria(features=None, target=None, density=True, display=False, width=2, save_image=False, path='/')
Plots the dispersion of each feature with respect to the target groups.
Inputs: - data: full dataset. - features: features to analyze.
distribution_targets(target=None, display=True, save_image=False, path='/', palette='Set2')
None
dtypes(X=None)
Returns the data type of each column.
isNull()
None
load_data(filename, name='dataset', sep=';', decimal=',', **params)
Loading a dataset from a csv file.
Parameters
filename: str, path object or file-like object
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
  If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
sep: str
  Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator automatically detected by Python's builtin sniffer tool, csv.Sniffer.
delimiter: str, default None
  Alias for sep.
Attributes
n: length of dataset. start: start iterator. end: end iterator. num: current iterator.
Returns
data: Pandas DataFrame, [n_samples, n_classes]
  Dataframe from the dataset.
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/load/DataLoad/
load_dataframe(data)
None
missing_values(X=None)
Number of missing values in the dataframe.
not_type_object()
Detects features that are not of type "object".
reset()
None
sns_jointplot(feature1, feature2, target=None, categoria1=None, categoria2=None, display=True, save_image=False, path='/')
None
sns_pairplot(features=None, target=None, display=True, save_image=False, path='/', palette='husl')
None
type_object()
Detects features of type "object".
view_features()
Show the features of the dataframe.
DataCleaner
DataCleaner(data)
Class wrapping a preprocessed dataset for data cleaning.
Attributes
data: pd.DataFrame of Dataset
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DataCleaner/
Methods
categorical_vs_numerical()
None
dtypes()
Returns the data type of each column.
isNull()
None
load_data(filename, sep=';', decimal=',', **params)
Loading a dataset from a csv file.
Parameters
filename: str, path object or file-like object
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
  If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
sep: str
  Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator automatically detected by Python's builtin sniffer tool, csv.Sniffer.
delimiter: str, default None
  Alias for sep.
Attributes
n: length of dataset. start: start iterator. end: end iterator. num: current iterator.
Returns
data: Pandas DataFrame, [n_samples, n_classes]
  Dataframe from the dataset.
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/load/DataLoad/
load_dataframe(data)
None
missing_values()
Number of missing values in the dataframe.
not_type_object()
Detects features that are not of type "object".
reset()
None
type_object()
Detects features of type "object".
view_features()
Show the features of the dataframe.
DataExploratory
DataExploratory(data)
Class wrapping a preprocessed dataset for data exploration.
Attributes
data: pd.DataFrame of Dataset
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DataCleaner/
Methods
categorical_vs_numerical()
None
dtypes(X=None)
Returns the data type of each column.
isNull()
None
load_data(filename, name='dataset', sep=';', decimal=',', **params)
Loading a dataset from a csv file.
Parameters
filename: str, path object or file-like object
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
  If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
sep: str
  Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator automatically detected by Python's builtin sniffer tool, csv.Sniffer.
delimiter: str, default None
  Alias for sep.
Attributes
n: length of dataset. start: start iterator. end: end iterator. num: current iterator.
Returns
data: Pandas DataFrame, [n_samples, n_classes]
  Dataframe from the dataset.
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/load/DataLoad/
load_dataframe(data)
None
missing_values(X=None)
Number of missing values in the dataframe.
not_type_object()
Detects features that are not of type "object".
reset()
None
type_object()
Detects features of type "object".
view_features()
Show the features of the dataframe.
DataFrameSelector
DataFrameSelector(attribute_names)
Base class for all estimators in scikit-learn
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
None
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
None
DropFeatures
DropFeatures(columns_drop=None, random_state=99)
This transformer drops features.
Attributes
columns_drop: list of columns to drop [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DropFeatures/
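The effect is equivalent to dropping columns from a pandas DataFrame (a sketch with invented column names, not the transformer itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
columns_drop = ["b", "c"]                 # columns configured to drop
dropped = df.drop(columns=columns_drop)   # returns a new DataFrame
print(list(dropped.columns))
```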
Methods
fit(X, y=None, **fit_params)
Gets the columns to drop.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Drops the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the columns dropped.
DropOutliers
DropOutliers(features=[], display=False)
Drop outliers from a dataframe.
Attributes
features: list or tuple
  List of features from which to drop outliers [n_columns]
display: boolean
  Show a histogram of the changes made.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DropOutliers/
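The documentation does not state the outlier rule; a common choice is the 1.5·IQR criterion, sketched below (an assumption for illustration, the library's actual rule may differ):

```python
import pandas as pd

def drop_outliers_iqr(df, features, k=1.5):
    """Drop rows whose value in any listed feature lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Illustrative only."""
    mask = pd.Series(True, index=df.index)
    for col in features:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

df = pd.DataFrame({"v": [10, 11, 12, 11, 10, 500]})  # 500 is an outlier
print(drop_outliers_iqr(df, ["v"]))
```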
Methods
fit(X, y=None, **fit_params)
Gets the columns that are not dropped.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X, **fit_params)
Drops outliers from the selected features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the outliers dropped.
ExtractCategories
ExtractCategories(categories=None, target=None)
This transformer filters the selected dataset categories.
Attributes
categories: list of categories that you want to keep.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/ReplaceTransformer/
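Filtering to a set of target categories is equivalent to a pandas isin mask (a sketch with invented data; the parameter names follow the constructor above):

```python
import pandas as pd

df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0],
                   "label": ["A", "B", "A", "C"]})
categories, target = ["A", "B"], "label"   # categories to keep
kept = df[df[target].isin(categories)]     # rows whose label is A or B
print(kept["label"].tolist())
```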
Methods
fit(X, y=None, **fit_params)
Gets the columns used to filter the selected dataset categories.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Filters the selected dataset categories.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe filtered to the selected categories.
FeatureDropper
FeatureDropper(drop=[])
Drops columns according to the selected features.
Attributes
drop: list of features to drop [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FeatureDropper/
Methods
fit(X, y=None, **fit_params)
Gets the columns that are not dropped.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X, **fit_params)
Drops the selected features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the columns dropped.
FeatureSelector
FeatureSelector(columns=None, random_state=99)
This transformer selects features.
Attributes
columns: list of columns to select [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FeatureSelector/
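Selecting a subset of columns corresponds to plain DataFrame indexing (illustrative data, not the transformer itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
columns = ["a", "c"]           # columns configured in the selector
selected = df[columns].copy()  # copy, so the original frame stays untouched
print(list(selected.columns))
```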
Methods
fit(X, y=None, **fit_params)
Gets the columns to select.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Selects the configured columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe restricted to the selected columns.
FillNaTransformer_all
FillNaTransformer_all()
This transformer deletes rows in which all values are NaN.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_all/
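The described behavior matches pandas dropna with how='all' (and how='any' for the companion FillNaTransformer_any); a toy example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [2.0, np.nan, 3.0]})
all_nan_dropped = df.dropna(how="all")   # drops only row 1 (all NaN)
any_nan_dropped = df.dropna(how="any")   # drops rows 1 and 2
print(len(all_nan_dropped), len(any_nan_dropped))
```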
Methods
fit(X, y=None, **fit_params)
Not implemented.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Deletes rows in which all values are NaN.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with those rows removed.
FillNaTransformer_any
FillNaTransformer_any()
This transformer deletes rows that contain any NaN.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_any/
Methods
fit(X, y=None, **fit_params)
Not implemented.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Deletes rows that contain any NaN.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with those rows removed.
FillNaTransformer_backward
FillNaTransformer_backward(columns=None)
This transformer fills missing values with the closest value backward.
Attributes
columns: list of columns to transformer [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_backward/
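"Closest value backward" corresponds to a pandas backward fill, with forward fill for the companion FillNaTransformer_forward (toy data, not the transformer itself):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
back = df["a"].bfill()   # each NaN takes the next valid value below it
fwd = df["a"].ffill()    # each NaN takes the last valid value above it
print(back.tolist(), fwd.tolist())
```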
Methods
fit(X, y=None, **fit_params)
Gets the columns on which to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Fills missing values with the closest value backward.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the missing values replaced.
FillNaTransformer_forward
FillNaTransformer_forward(columns=None)
This transformer fills missing values with the closest value forward.
Attributes
columns: list of columns to transformer [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_forward/
Methods
fit(X, y=None, fit_params)
Gets the columns to make a replace missing values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X
: {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) -
y
: ndarray of shape (n_samples,), default=NoneTarget values.
-
**fit_params
: dictAdditional fit parameters.
Returns
-
X_new
: ndarray array of shape (n_samples, n_features_new)Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=TrueIf True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: mapping of string to anyParameter names mapped to their values.
set_params(params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
-
**params
: dictEstimator parameters.
Returns
-
self
: objectEstimator instance.
transform(X)
this transformer handles missing values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
-
X
: {Dataframe}, shape = [n_samples, n_features]A copy of the input Dataframe with the columns replaced.
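The forward-fill strategy above can be sketched with plain pandas (an illustrative stand-in for what the transformer does to its selected columns, not the mlearner implementation itself; the column names are made up):

```python
import pandas as pd

# Toy frame with gaps.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 2.0, None]})

# Forward fill: each NaN is replaced by the last valid value above it.
filled = df.ffill()
# "a" becomes [1.0, 1.0, 3.0]; in "b" the leading NaN has no predecessor
# and stays NaN, while the trailing NaN becomes 2.0.
```

Note that a leading NaN survives a forward fill, so this transformer alone does not guarantee a complete Dataframe.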
FillNaTransformer_idmax
FillNaTransformer_idmax(columns=None)
This transformer fills missing values using the idmax strategy.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_idmax/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
FillNaTransformer_mean
FillNaTransformer_mean(columns=None)
This transformer fills missing values with each column's mean.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_mean/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
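Mean imputation can be sketched with plain pandas (an illustration of the strategy under assumed toy data, not the mlearner code path):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 6.0, None]})

# df.mean() skips NaNs, so each column's mean is computed from the
# observed values only; fillna then substitutes it per column.
filled = df.fillna(df.mean())
# "a": mean of [1, 3] is 2.0 -> [1.0, 2.0, 3.0]
# "b": mean of [4, 6] is 5.0 -> [4.0, 6.0, 5.0]
```

In a proper fit/transform split, the means would be computed on the training frame in fit() and reused in transform().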
FillNaTransformer_median
FillNaTransformer_median(columns=None)
This transformer fills missing values with each column's median.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_median/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
FillNaTransformer_value
FillNaTransformer_value(columns=None)
This transformer fills missing values with a user-supplied value.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_value/
Methods
fit(X, y=None, value=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
- value : the value used to fill the missing entries (default None).
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
FixSkewness
FixSkewness(columns=None, drop=True)
This transformer applies a log transform to skewed features.
Attributes
- columns : pandas columns to transform [n_columns]
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FixSkewness/
Methods
fit(X, y=None, **fit_params)
Selects the skewed columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies a log transform to the skewed features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the skewed columns transformed.
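The fit/transform split above (select skewed columns, then log-transform them) can be sketched as follows. This is a minimal sketch under assumptions: the 0.75 skewness cut-off and the use of log1p are illustrative choices, not necessarily the library's exact criterion.

```python
import numpy as np
import pandas as pd

# A strongly right-skewed column next to a symmetric one (toy data).
df = pd.DataFrame({"skewed": [1.0, 1.0, 2.0, 2.0, 100.0],
                   "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0]})

threshold = 0.75  # assumed cut-off for "skewed"
skewed_cols = df.columns[df.skew().abs() > threshold]  # the fit() step

out = df.copy()
# log1p keeps zero-valued entries finite (log1p(0) == 0).
out[skewed_cols] = np.log1p(df[skewed_cols])           # the transform() step
```

Only the columns that exceed the skewness threshold are touched; symmetric columns pass through unchanged.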
Keep
Keep()
Keeps the selected columns.
Methods
fit(X, y=None)
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
LDA_add
LDA_add(columns=None, LDA_name=None, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
Selects the LDA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies LDA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the LDA columns added.
LDA_selector
LDA_selector(columns=None, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y)
Selects the LDA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies LDA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns transformed by LDA.
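What an LDA-based transformer computes can be sketched directly with scikit-learn. This is an illustrative sketch only: the toy data, the LDA_0 column name, and the way the component is attached are assumptions, not mlearner's exact behavior.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two well-separated classes in two features (toy data).
X = pd.DataFrame({"f1": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],
                  "f2": [0.2, 0.1, 0.3, 2.0, 2.1, 1.9]})
y = [0, 0, 0, 1, 1, 1]

# With 2 classes, LDA yields at most 1 discriminant component.
lda = LinearDiscriminantAnalysis(n_components=1)
components = lda.fit_transform(X, y)

# LDA_add would append the component as a new column; LDA_selector would
# use it in place of the original features.
X_new = X.assign(LDA_0=components[:, 0])
```

Unlike PCA, LDA is supervised, which is why fit requires the target y.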
LabelEncoder
LabelEncoder()
Encode target labels with value between 0 and n_classes-1.
This transformer should be used to encode target values, i.e. y, and not the input X.
Read more in the :ref:`User Guide <preprocessing_targets>`.
.. versionadded:: 0.12
Attributes
- classes_ : array of shape (n_classes,). Holds the label for each class.
Examples
LabelEncoder can be used to normalize labels.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are
hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
See also
- sklearn.preprocessing.OrdinalEncoder : Encode categorical features using an ordinal encoding scheme.
- sklearn.preprocessing.OneHotEncoder : Encode categorical features as a one-hot numeric array.
Methods
fit(y)
Fit the label encoder.
Parameters
- y : array-like of shape (n_samples,). Target values.
Returns
- self : returns an instance of self.
fit_transform(y)
Fit the label encoder and return encoded labels.
Parameters
- y : array-like of shape [n_samples]. Target values.
Returns
- y : array-like of shape [n_samples]
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
inverse_transform(y)
Transform labels back to original encoding.
Parameters
- y : numpy array of shape [n_samples]. Target values.
Returns
- y : numpy array of shape [n_samples]
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(y)
Transform labels to normalized encoding.
Parameters
- y : array-like of shape [n_samples]. Target values.
Returns
- y : array-like of shape [n_samples]
MFD_OrientationClassTransformer
MFD_OrientationClassTransformer(columns, name='MFDOCT', a=120, b=60, c=30, d=150)
Transformer for MFD orientation classes.
Methods
fit(X, y=None)
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
MeanCenterer
MeanCenterer(columns=None)
Column centering of a pandas Dataframe.
Attributes
- col_means : numpy.ndarray [n_columns] or pandas [n_columns]. Mean values for centering, available after fitting the MeanCenterer object.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/MeanCenterer/
Adapted from https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/mean_centering.py
Author: Sebastian Raschka
Methods
fit(X, y=None)
Gets the column means for mean centering.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Centers the columns of a pandas Dataframe.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns centered.
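The fit/transform behavior described above reduces to subtracting per-column means; a minimal sketch with plain pandas (toy data, not the mlearner code itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

col_means = df.mean()      # what fit() stores in the col_means attribute
centered = df - col_means  # what transform() returns
# "a" becomes [-1.0, 0.0, 1.0]; "b" becomes [-10.0, 0.0, 10.0]
```

Because the means are stored at fit time, new data can be centered with the training means rather than its own.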
OneHotEncoder
OneHotEncoder(columns=None, numerical=[], Drop=True)
This transformer applies one-hot encoding to features.
Attributes
- numerical : pandas [n_columns]. Numerical columns to be treated as categorical.
- columns : pandas [n_columns]. Columns to use (if None, then all categorical variables are included).
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/OneHotEncoder/
Methods
fit(X, y=None, **fit_params)
Selects the columns to one-hot encode from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies one-hot encoding to the selected features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns encoded.
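One-hot encoding on a Dataframe can be sketched with pandas' get_dummies (an illustration of the transformation; the color_* column naming is pandas' convention, not necessarily mlearner's):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# One binary column per category; the original categorical column is
# dropped from the result (mirroring Drop=True).
encoded = pd.get_dummies(df, columns=["color"])
# Columns: size, color_blue, color_red
```

Each row has exactly one 1 among the color_* columns, marking its category.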
OrientationClassTransformer
OrientationClassTransformer(columns, name='OCT', a=135, b=45)
Transformer for orientation classes.
Methods
fit(X, y=None)
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
PCA_add
PCA_add(columns=None, n_components=2, PCA_name=None, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
Selects the PCA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies PCA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the PCA columns added.
PCA_selector
PCA_selector(columns=None, n_components=2, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
Selects the PCA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies PCA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns transformed by PCA.
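The PCA computation these transformers wrap can be sketched with scikit-learn directly. A minimal sketch under assumptions: the toy data and the PCA_0/PCA_1 column names are illustrative, and the exact way mlearner attaches or selects columns may differ.

```python
import pandas as pd
from sklearn.decomposition import PCA

X = pd.DataFrame({"f1": [2.5, 0.5, 2.2, 1.9, 3.1],
                  "f2": [2.4, 0.7, 2.9, 2.2, 3.0]})

# n_components=2 and random_state=99 mirror the documented defaults.
pca = PCA(n_components=2, random_state=99)
components = pca.fit_transform(X)

# PCA_add would append the components as extra columns; PCA_selector
# would use them in place of the selected columns.
X_new = X.assign(PCA_0=components[:, 0], PCA_1=components[:, 1])
```

PCA is unsupervised, so unlike the LDA transformers no target y is needed.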
ReplaceMulticlass
ReplaceMulticlass(columns=None)
This transformer replaces some categorical values with others.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/ReplaceMulticlass/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces the categorical values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
ReplaceTransformer
ReplaceTransformer(columns=None, mapping=None)
This transformer replace some values with others.
Attributes
columns: list
of columns to transformer [n_columns]
mapping: dict`, for example:
mapping = {"yes": 1, "no": 0}
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/ReplaceTransformer/
Methods
fit(X, y=None, **fit_params)
Gets the columns in which to replace values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X
: {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) -
y
: ndarray of shape (n_samples,), default=NoneTarget values.
-
**fit_params
: dictAdditional fit parameters.
Returns
-
X_new
: ndarray array of shape (n_samples, n_features_new)Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=TrueIf True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: mapping of string to anyParameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
-
**params
: dictEstimator parameters.
Returns
-
self
: objectEstimator instance.
transform(X)
Gets the columns in which to replace values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
-
X_transform
: {Dataframe}, shape = [n_samples, n_features]A copy of the input Dataframe with the columns replaced.
StandardScaler
StandardScaler(*, copy=True, with_mean=True, with_std=True)
Standardize features by removing the mean and scaling to unit variance
The standard score of a sample x
is calculated as:
z = (x - u) / s
where u
is the mean of the training samples or zero if with_mean=False
,
and s
is the standard deviation of the training samples or one if
with_std=False
.
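The formula can be checked directly with NumPy. Note the population standard deviation (`ddof=0`), which matches the biased estimator scikit-learn uses; the data values are illustrative:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 1.0])
u = x.mean()       # mean of the training samples
s = x.std(ddof=0)  # biased standard deviation, as in StandardScaler
z = (x - u) / s

print(z)  # [-1. -1.  1.  1.]
```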
Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. Mean and
standard deviation are then stored to be used on later data using
:meth:transform
.
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
This scaler can also be applied to sparse CSR or CSC matrices by passing
with_mean=False
to avoid breaking the sparsity structure of the data.
Read more in the :ref:User Guide <preprocessing_scaler>
.
Parameters
-
copy
: boolean, optional, default TrueIf False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
-
with_mean
: boolean, True by defaultIf True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
-
with_std
: boolean, True by defaultIf True, scale the data to unit variance (or equivalently, unit standard deviation).
Attributes
-
scale_
: ndarray or None, shape (n_features,)Per feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.
.. versionadded:: 0.17 scale_
-
mean_
: ndarray or None, shape (n_features,)The mean value for each feature in the training set. Equal to
None
whenwith_mean=False
. -
var_
: ndarray or None, shape (n_features,)The variance for each feature in the training set. Used to compute
scale_
. Equal toNone
whenwith_std=False
. -
n_samples_seen_
: int or array, shape (n_features,)The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer, otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
See also
scale: Equivalent function without the estimator API.
:class:`sklearn.decomposition.PCA`
Further removes the linear correlation across features with 'whiten=True'.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
We use a biased estimator for the standard deviation, equivalent to
`numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to
affect model performance.
For a comparison of the different scalers, transformers, and normalizers,
see :ref:`examples/preprocessing/plot_all_scaling.py
<sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
Methods
fit(X, y=None)
Compute the mean and std to be used for later scaling.
Parameters
-
X
: {array-like, sparse matrix}, shape [n_samples, n_features]The data used to compute the mean and standard deviation used for later scaling along the features axis.
-
y
: NoneIgnored.
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X
: {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) -
y
: ndarray of shape (n_samples,), default=NoneTarget values.
-
**fit_params
: dictAdditional fit parameters.
Returns
-
X_new
: ndarray array of shape (n_samples, n_features_new)Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=TrueIf True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: mapping of string to anyParameter names mapped to their values.
inverse_transform(X, copy=None)
Scale back the data to the original representation.
Parameters
-
X
: array-like, shape [n_samples, n_features]The data used to scale along the features axis.
-
copy
: bool, optional (default: None)Copy the input X or not.
Returns
-
X_tr
: array-like, shape [n_samples, n_features]Transformed array.
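The inverse mapping is simply X * scale_ + mean_, undoing the standardization; a minimal NumPy sketch (the statistics and data are illustrative, taken from a fit on [[0, 0], [0, 0], [1, 1], [1, 1]]):

```python
import numpy as np

mean_ = np.array([0.5, 0.5])    # per-feature training means
scale_ = np.array([0.5, 0.5])   # per-feature training std (ddof=0)

X_scaled = np.array([[-1.0, -1.0], [1.0, 1.0]])
X_orig = X_scaled * scale_ + mean_  # undo z = (x - mean_) / scale_

print(X_orig)  # [[0. 0.] [1. 1.]]
```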
partial_fit(X, y=None)
Online computation of mean and std on X for later scaling.
All of X is processed as a single batch. This is intended for cases
when :meth:fit
is not feasible due to a very large number of
n_samples
or because X is read from a continuous stream.
The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. "Algorithms for computing the sample variance: Analysis and recommendations." The American Statistician 37.3 (1983): 242-247:
Parameters
-
X
: {array-like, sparse matrix}, shape [n_samples, n_features]The data used to compute the mean and standard deviation used for later scaling along the features axis.
-
y
: NoneIgnored.
Returns
-
self
: objectTransformer instance.
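The pairwise merge from Chan et al. can be sketched in pure NumPy. This is an illustrative implementation of Equation 1.5a,b for combining two batches' statistics, not mlearner's actual code:

```python
import numpy as np

def combine(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two batches' count, mean, and sum of squared
    deviations (M2) per Chan et al., Equation 1.5a,b."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

def stats(x):
    """Count, mean, and M2 of a single batch."""
    x = np.asarray(x, dtype=float)
    return len(x), x.mean(), ((x - x.mean()) ** 2).sum()

# Two batches whose union is [0, 0, 1, 1].
a, b = [0.0, 0.0], [1.0, 1.0]
n, mean, m2 = combine(*stats(a), *stats(b))
var = m2 / n  # biased variance, as StandardScaler computes it

print(mean, var)  # 0.5 0.25
```

The merged mean and variance match a single pass over the concatenated data, which is what lets partial_fit process a stream batch by batch.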
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
-
**params
: dictEstimator parameters.
Returns
-
self
: objectEstimator instance.
transform(X, copy=None)
Perform standardization by centering and scaling.
Parameters
-
X
: array-like, shape [n_samples, n_features]The data used to scale along the features axis.
-
copy
: bool, optional (default: None)Copy the input X or not.
Returns
-
X_tr
: array-like, shape [n_samples, n_features]Transformed array.
minmax_scaling
minmax_scaling(X, columns, min_val=0, max_val=1)
Min-max scaling of pandas DataFrames.
Parameters
-
X
: pandas DataFrame, shape = [n_rows, n_columns]. -
columns
: array-like, shape = [n_columns]Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...]
-
min_val
:int
orfloat
, optional (default=0
)Minimum value after rescaling.
-
max_val
:int
orfloat
, optional (default=1
)Maximum value after rescaling.
Returns
-
df_new
: pandas DataFrame object.Copy of the array or DataFrame with rescaled columns.
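The rescaling itself is just (x - min) / (max - min) mapped onto [min_val, max_val]; a pandas sketch of the same idea (column names and data are illustrative, not mlearner's implementation):

```python
import pandas as pd

df = pd.DataFrame({"s1": [1, 2, 3], "s2": [10, 20, 30]})

def minmax(col, min_val=0, max_val=1):
    """Rescale a column linearly onto [min_val, max_val]."""
    rng = col.max() - col.min()
    return (col - col.min()) / rng * (max_val - min_val) + min_val

# Rescale selected columns on a copy, leaving the input untouched.
df_new = df.copy()
for c in ["s1", "s2"]:
    df_new[c] = minmax(df[c])

print(df_new["s1"].tolist())  # [0.0, 0.5, 1.0]
```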
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/minmax_scaling/
Adapted from
https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/scaling.py
Author: Sebastian Raschka <sebastianraschka.com>
License: BSD 3 clause