mlearner version: 0.2.0
CategoricalEncoder
CategoricalEncoder(encoding='onehot', categories='auto', dtype=np.float64, handle_unknown='error')
Encode categorical features as a numeric array.
The input to this transformer should be a matrix of integers or strings,
denoting the values taken on by categorical (discrete) features.
The features can be encoded using a one-hot aka one-of-K scheme
(encoding='onehot', the default) or converted to ordinal integers
(encoding='ordinal').
This encoding is needed for feeding categorical data to many scikit-learn
estimators, notably linear models and SVMs with the standard kernels.
Read more in the scikit-learn User Guide section on encoding categorical features.
Parameters
- encoding : str, 'onehot', 'onehot-dense' or 'ordinal'
  The type of encoding to use (default is 'onehot'):
  - 'onehot': encode the features using a one-hot aka one-of-K scheme (also called 'dummy' encoding). This creates a binary column for each category and returns a sparse matrix.
  - 'onehot-dense': the same as 'onehot' but returns a dense array instead of a sparse matrix.
  - 'ordinal': encode the features as ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.
- categories : 'auto' or a list of lists/arrays of values
  Categories (unique values) per feature:
  - 'auto': Determine categories automatically from the training data.
  - list: categories[i] holds the categories expected in the ith column. The passed categories are sorted before encoding the data (used categories can be found in the categories_ attribute).
- dtype : number type, default np.float64
  Desired dtype of output.
- handle_unknown : 'error' (default) or 'ignore'
  Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. Ignoring unknown categories is not supported for encoding='ordinal'.
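The 'ignore' behavior can be sketched in plain NumPy (an illustrative toy, not the library's implementation; the function name and data are invented for the example):

```python
import numpy as np

def onehot_column(col, cats, handle_unknown="error"):
    """One-hot encode a single feature against known categories `cats`.
    Unknown values either raise or map to an all-zero row ('ignore')."""
    cats = np.asarray(cats)
    out = np.zeros((len(col), len(cats)))
    for i, v in enumerate(col):
        hit = np.flatnonzero(cats == v)
        if hit.size:
            out[i, hit[0]] = 1.0
        elif handle_unknown == "error":
            raise ValueError(f"unknown category: {v!r}")
        # handle_unknown='ignore': leave the row all zeros
    return out

# Category 4 was never seen during fit, so its row comes out all zeros:
print(onehot_column([0, 1, 4], cats=[0, 1, 2, 3], handle_unknown="ignore"))
```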
Attributes
- categories_ : list of arrays
  The categories of each feature determined during fitting. When categories were specified manually, this holds the sorted categories (in order corresponding with the output of transform).
Examples
Given a dataset with three features and two samples, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.
>>> from sklearn.preprocessing import CategoricalEncoder
>>> enc = CategoricalEncoder(handle_unknown='ignore')
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
... # doctest: +ELLIPSIS
CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
          encoding='onehot', handle_unknown='ignore')
>>> enc.transform([[0, 1, 1], [1, 0, 4]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.]])
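The 'ordinal' scheme can be sketched the same way with NumPy (illustrative only; the sorted unique values of each column become its integer codes):

```python
import numpy as np

def ordinal_encode(X):
    """Encode each column as integers 0..n_categories-1 using the
    sorted unique values of that column (mirrors encoding='ordinal')."""
    X = np.asarray(X)
    out = np.empty(X.shape, dtype=np.int64)
    for j in range(X.shape[1]):
        cats = np.unique(X[:, j])              # sorted categories
        out[:, j] = np.searchsorted(cats, X[:, j])
    return out

print(ordinal_encode([["a", "x"], ["b", "y"], ["a", "z"]]))
```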
See also
- sklearn.preprocessing.OneHotEncoder : performs a one-hot encoding of integer ordinal features. The OneHotEncoder assumes that input features take on values in the range [0, max(feature)] instead of using the unique values.
- sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of dictionary items (also handles string-valued features).
- sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot encoding of dictionary items or strings.
Methods
fit(X, y=None)
Fit the CategoricalEncoder to X.
Parameters
- X : array-like, shape [n_samples, n_features]
  The data to determine the categories of each feature.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
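The get_params/set_params contract can be illustrated with a minimal stand-in class (a sketch of the scikit-learn convention; ToyEstimator and its parameters are invented for the example):

```python
class ToyEstimator:
    """Minimal illustration of the scikit-learn parameter protocol."""
    def __init__(self, alpha=1.0, fit_intercept=True):
        # Every settable parameter is an explicit __init__ keyword argument.
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self, deep=True):
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}

    def set_params(self, **params):
        for key, value in params.items():
            if key not in self.get_params():
                raise ValueError(f"Invalid parameter {key!r}")
            setattr(self, key, value)
        return self  # returning self allows chained calls

est = ToyEstimator().set_params(alpha=0.1)
print(est.get_params()["alpha"])
```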
transform(X)
Transform X using one-hot encoding.
Parameters
- X : array-like, shape [n_samples, n_features]
  The data to encode.
Returns
- X_out : sparse matrix or a 2-d array
  Transformed input.
ClassTransformer_value
ClassTransformer_value(columns, name='A/AH_cat', value=100)
Base class for all estimators in scikit-learn
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
None
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
None
CopyFeatures
CopyFeatures(columns=None, prefix='')
Base class for all estimators in scikit-learn
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
None
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
None
DataAnalyst
DataAnalyst(data)
Class wrapping a preprocessed dataset for data analysis.
Attributes
data: pd.DataFrame of Dataset
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DataAnalyst/
Methods
Xy_dataset(target=None)
Split the dataset into features and target (X, y).
boxplot(features=None, target=None, display=False, save_image=False, path='/', width=2)
Draws a box plot of the dispersion of each feature with respect to the target groups.
Inputs: - data: full dataset. - features: features to analyze.
categorical_vs_numerical()
None
corr_matrix(features=None, display=True, save_image=False, path='/')
Correlation matrix:
A positive value of r indicates a positive association; a negative value of r indicates a negative association.
The closer r is to 1, the closer the data points fall to a straight line and the stronger the linear association; the closer r is to 0, the weaker the linear association.
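The interpretation above matches the Pearson coefficient that pandas computes; for example (toy data, not the DataAnalyst API):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.1, 3.9, 6.2, 7.8],   # nearly linear in x -> r close to 1
    "z": [4.0, 3.0, 2.0, 1.0],   # exactly decreasing in x -> r == -1
})
corr = df.corr()                  # Pearson correlation matrix
print(corr.round(2))
```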
dispersion_categoria(features=None, target=None, density=True, display=False, width=2, save_image=False, path='/')
Plots the dispersion of each feature with respect to the target groups.
Inputs: - data: full dataset. - features: features to analyze.
distribution_targets(target=None, display=True, save_image=False, path='/', palette='Set2')
None
dtypes(X=None)
Returns the data type of each column.
isNull()
None
load_data(filename, name='dataset', sep=';', decimal=',', **params)
Loading a dataset from a csv file.
Parameters
filename: str, path object or file-like object
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
  If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
sep: str
  Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator automatically detected by Python's builtin sniffer tool, csv.Sniffer.
delimiter: str, default None
  Alias for sep.
Attributes
n: length of dataset. start: start iterator. end: end iterator. num: current iterator.
Returns
data: Pandas DataFrame, [n_samples, n_classes]
  Dataframe from the dataset.
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/load/DataLoad/
load_dataframe(data)
None
missing_values(X=None)
Number of missing values in the dataframe.
not_type_object()
Detects features that are not of type "object".
reset()
None
sns_jointplot(feature1, feature2, target=None, categoria1=None, categoria2=None, display=True, save_image=False, path='/')
None
sns_pairplot(features=None, target=None, display=True, save_image=False, path='/', palette='husl')
None
type_object()
Detects features of type "object".
view_features()
Show the features of the dataframe.
DataCleaner
DataCleaner(data)
Class wrapping a preprocessed dataset for data cleaning.
Attributes
data: pd.DataFrame of Dataset
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DataCleaner/
Methods
categorical_vs_numerical()
None
dtypes()
Returns the data type of each column.
isNull()
None
load_data(filename, sep=';', decimal=',', **params)
Loading a dataset from a csv file.
Parameters
filename: str, path object or file-like object
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
  If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
sep: str
  Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator automatically detected by Python's builtin sniffer tool, csv.Sniffer.
delimiter: str, default None
  Alias for sep.
Attributes
n: length of dataset. start: start iterator. end: end iterator. num: current iterator.
Returns
data: Pandas DataFrame, [n_samples, n_classes]
  Dataframe from the dataset.
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/load/DataLoad/
load_dataframe(data)
None
missing_values()
Number of missing values in the dataframe.
not_type_object()
Detects features that are not of type "object".
reset()
None
type_object()
Detects features of type "object".
view_features()
Show the features of the dataframe.
DataExploratory
DataExploratory(data)
Class wrapping a preprocessed dataset for data exploration.
Attributes
data: pd.DataFrame of Dataset
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DataCleaner/
Methods
categorical_vs_numerical()
None
dtypes(X=None)
Returns the data type of each column.
isNull()
None
load_data(filename, name='dataset', sep=';', decimal=',', **params)
Loading a dataset from a csv file.
Parameters
filename: str, path object or file-like object
  Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
  If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
sep: str
  Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and the separator automatically detected by Python's builtin sniffer tool, csv.Sniffer.
delimiter: str, default None
  Alias for sep.
Attributes
n: length of dataset. start: start iterator. end: end iterator. num: current iterator.
Returns
data: Pandas DataFrame, [n_samples, n_classes]
  Dataframe from the dataset.
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/load/DataLoad/
load_dataframe(data)
None
missing_values(X=None)
Number of missing values in the dataframe.
not_type_object()
Detects features that are not of type "object".
reset()
None
type_object()
Detects features of type "object".
view_features()
Show the features of the dataframe.
DataFrameSelector
DataFrameSelector(attribute_names)
Base class for all estimators in scikit-learn
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
None
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
None
DropFeatures
DropFeatures(columns_drop=None, random_state=99)
This transformer drops features.
Attributes
columns_drop: list of columns to drop [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DropFeatures/
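The effect is equivalent to dropping columns from a pandas DataFrame (a sketch with invented column names, not the transformer itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
columns_drop = ["b", "c"]                 # columns configured to drop
dropped = df.drop(columns=columns_drop)   # returns a new DataFrame
print(list(dropped.columns))
```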
Methods
fit(X, y=None, **fit_params)
Gets the columns to drop.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Drops the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the columns dropped.
DropOutliers
DropOutliers(features=[], display=False)
Drop outliers from a dataframe.
Attributes
features: list or tuple
  List of features from which to drop outliers [n_columns]
display: boolean
  Show a histogram of the changes made.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/DropOutliers/
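The documentation does not state the outlier rule; a common choice is the 1.5·IQR criterion, sketched below (an assumption for illustration, the library's actual rule may differ):

```python
import pandas as pd

def drop_outliers_iqr(df, features, k=1.5):
    """Drop rows whose value in any listed feature lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Illustrative only."""
    mask = pd.Series(True, index=df.index)
    for col in features:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

df = pd.DataFrame({"v": [10, 11, 12, 11, 10, 500]})  # 500 is an outlier
print(drop_outliers_iqr(df, ["v"]))
```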
Methods
fit(X, y=None, **fit_params)
Gets the columns that are not dropped.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X, **fit_params)
Drops outliers from the selected features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the outliers dropped.
ExtractCategories
ExtractCategories(categories=None, target=None)
This transformer filters the selected dataset categories.
Attributes
categories: list of categories that you want to keep.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/ReplaceTransformer/
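Filtering to a set of target categories is equivalent to a pandas isin mask (a sketch with invented data; the parameter names follow the constructor above):

```python
import pandas as pd

df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0],
                   "label": ["A", "B", "A", "C"]})
categories, target = ["A", "B"], "label"   # categories to keep
kept = df[df[target].isin(categories)]     # rows whose label is A or B
print(kept["label"].tolist())
```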
Methods
fit(X, y=None, **fit_params)
Gets the columns used to filter the selected dataset categories.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Filters the selected dataset categories.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe filtered to the selected categories.
FeatureDropper
FeatureDropper(drop=[])
Drops columns according to the selected features.
Attributes
drop: list of features to drop [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FeatureDropper/
Methods
fit(X, y=None, **fit_params)
Gets the columns that are not dropped.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X, **fit_params)
Drops the selected features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the columns dropped.
FeatureSelector
FeatureSelector(columns=None, random_state=99)
This transformer selects features.
Attributes
columns: list of columns to select [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FeatureSelector/
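Selecting a subset of columns corresponds to plain DataFrame indexing (illustrative data, not the transformer itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
columns = ["a", "c"]           # columns configured in the selector
selected = df[columns].copy()  # copy, so the original frame stays untouched
print(list(selected.columns))
```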
Methods
fit(X, y=None, **fit_params)
Gets the columns to select.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Selects the configured columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe restricted to the selected columns.
FillNaTransformer_all
FillNaTransformer_all()
This transformer deletes rows in which all values are NaN.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_all/
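The described behavior matches pandas dropna with how='all' (and how='any' for the companion FillNaTransformer_any); a toy example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [2.0, np.nan, 3.0]})
all_nan_dropped = df.dropna(how="all")   # drops only row 1 (all NaN)
any_nan_dropped = df.dropna(how="any")   # drops rows 1 and 2
print(len(all_nan_dropped), len(any_nan_dropped))
```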
Methods
fit(X, y=None, **fit_params)
Not implemented.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Deletes rows in which all values are NaN.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with those rows removed.
FillNaTransformer_any
FillNaTransformer_any()
This transformer deletes rows that contain any NaN.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_any/
Methods
fit(X, y=None, **fit_params)
Not implemented.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Deletes rows that contain any NaN.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with those rows removed.
FillNaTransformer_backward
FillNaTransformer_backward(columns=None)
This transformer fills missing values with the closest value backward.
Attributes
columns: list of columns to transformer [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_backward/
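"Closest value backward" corresponds to a pandas backward fill, with forward fill for the companion FillNaTransformer_forward (toy data, not the transformer itself):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
back = df["a"].bfill()   # each NaN takes the next valid value below it
fwd = df["a"].ffill()    # each NaN takes the last valid value above it
print(back.tolist(), fwd.tolist())
```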
Methods
fit(X, y=None, **fit_params)
Gets the columns on which to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
  Estimator parameters.
Returns
- self : object
  Estimator instance.
transform(X)
Fills missing values with the closest value backward.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]
  Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]
  A copy of the input Dataframe with the missing values replaced.
FillNaTransformer_forward
FillNaTransformer_forward(columns=None)
This transformer fills missing values with the closest value forward.
Attributes
columns: list of columns to transformer [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_forward/
Methods
fit(X, y=None, fit_params)
Gets the columns to make a replace missing values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X
: {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) -
y
: ndarray of shape (n_samples,), default=NoneTarget values.
-
**fit_params
: dictAdditional fit parameters.
Returns
-
X_new
: ndarray array of shape (n_samples, n_features_new)Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=TrueIf True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: mapping of string to anyParameter names mapped to their values.
set_params(params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
-
**params
: dictEstimator parameters.
Returns
-
self
: objectEstimator instance.
transform(X)
this transformer handles missing values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
-
X
: {Dataframe}, shape = [n_samples, n_features]A copy of the input Dataframe with the columns replaced.
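The forward-fill strategy above can be sketched with plain pandas (an illustrative stand-in for what the transformer does to its selected columns, not the mlearner implementation itself; the column names are made up):

```python
import pandas as pd

# Toy frame with gaps.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 2.0, None]})

# Forward fill: each NaN is replaced by the last valid value above it.
filled = df.ffill()
# "a" becomes [1.0, 1.0, 3.0]; in "b" the leading NaN has no predecessor
# and stays NaN, while the trailing NaN becomes 2.0.
```

Note that a leading NaN survives a forward fill, so this transformer alone does not guarantee a complete Dataframe.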
FillNaTransformer_idmax
FillNaTransformer_idmax(columns=None)
This transformer fills missing values using the idmax strategy.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_idmax/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
FillNaTransformer_mean
FillNaTransformer_mean(columns=None)
This transformer fills missing values with each column's mean.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_mean/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
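Mean imputation can be sketched with plain pandas (an illustration of the strategy under assumed toy data, not the mlearner code path):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 6.0, None]})

# df.mean() skips NaNs, so each column's mean is computed from the
# observed values only; fillna then substitutes it per column.
filled = df.fillna(df.mean())
# "a": mean of [1, 3] is 2.0 -> [1.0, 2.0, 3.0]
# "b": mean of [4, 6] is 5.0 -> [4.0, 6.0, 5.0]
```

In a proper fit/transform split, the means would be computed on the training frame in fit() and reused in transform().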
FillNaTransformer_median
FillNaTransformer_median(columns=None)
This transformer fills missing values with each column's median.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_median/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
FillNaTransformer_value
FillNaTransformer_value(columns=None)
This transformer fills missing values with a user-supplied value.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FillNaTransformer_value/
Methods
fit(X, y=None, value=None, **fit_params)
Gets the columns used to replace missing values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
- value : the value used to fill the missing entries (default None).
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces missing values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
FixSkewness
FixSkewness(columns=None, drop=True)
This transformer applies a log transform to skewed features.
Attributes
- columns : pandas columns to transform [n_columns]
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/FixSkewness/
Methods
fit(X, y=None, **fit_params)
Selects the skewed columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies a log transform to the skewed features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the skewed columns transformed.
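The fit/transform split above (select skewed columns, then log-transform them) can be sketched as follows. This is a minimal sketch under assumptions: the 0.75 skewness cut-off and the use of log1p are illustrative choices, not necessarily the library's exact criterion.

```python
import numpy as np
import pandas as pd

# A strongly right-skewed column next to a symmetric one (toy data).
df = pd.DataFrame({"skewed": [1.0, 1.0, 2.0, 2.0, 100.0],
                   "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0]})

threshold = 0.75  # assumed cut-off for "skewed"
skewed_cols = df.columns[df.skew().abs() > threshold]  # the fit() step

out = df.copy()
# log1p keeps zero-valued entries finite (log1p(0) == 0).
out[skewed_cols] = np.log1p(df[skewed_cols])           # the transform() step
```

Only the columns that exceed the skewness threshold are touched; symmetric columns pass through unchanged.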
Keep
Keep()
Keeps the selected columns.
Methods
fit(X, y=None)
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
LDA_add
LDA_add(columns=None, LDA_name=None, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
Selects the LDA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies LDA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the LDA columns added.
LDA_selector
LDA_selector(columns=None, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y)
Selects the LDA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies LDA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns transformed by LDA.
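What an LDA-based transformer computes can be sketched directly with scikit-learn. This is an illustrative sketch only: the toy data, the LDA_0 column name, and the way the component is attached are assumptions, not mlearner's exact behavior.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two well-separated classes in two features (toy data).
X = pd.DataFrame({"f1": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],
                  "f2": [0.2, 0.1, 0.3, 2.0, 2.1, 1.9]})
y = [0, 0, 0, 1, 1, 1]

# With 2 classes, LDA yields at most 1 discriminant component.
lda = LinearDiscriminantAnalysis(n_components=1)
components = lda.fit_transform(X, y)

# LDA_add would append the component as a new column; LDA_selector would
# use it in place of the original features.
X_new = X.assign(LDA_0=components[:, 0])
```

Unlike PCA, LDA is supervised, which is why fit requires the target y.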
LabelEncoder
LabelEncoder()
Encode target labels with value between 0 and n_classes-1.
This transformer should be used to encode target values, i.e. y, and not the input X.
Read more in the :ref:`User Guide <preprocessing_targets>`.
.. versionadded:: 0.12
Attributes
- classes_ : array of shape (n_classes,). Holds the label for each class.
Examples
LabelEncoder can be used to normalize labels.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are
hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
See also
- sklearn.preprocessing.OrdinalEncoder : Encode categorical features using an ordinal encoding scheme.
- sklearn.preprocessing.OneHotEncoder : Encode categorical features as a one-hot numeric array.
Methods
fit(y)
Fit the label encoder.
Parameters
- y : array-like of shape (n_samples,). Target values.
Returns
- self : returns an instance of self.
fit_transform(y)
Fit the label encoder and return encoded labels.
Parameters
- y : array-like of shape [n_samples]. Target values.
Returns
- y : array-like of shape [n_samples]
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
inverse_transform(y)
Transform labels back to original encoding.
Parameters
- y : numpy array of shape [n_samples]. Target values.
Returns
- y : numpy array of shape [n_samples]
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(y)
Transform labels to normalized encoding.
Parameters
- y : array-like of shape [n_samples]. Target values.
Returns
- y : array-like of shape [n_samples]
MFD_OrientationClassTransformer
MFD_OrientationClassTransformer(columns, name='MFDOCT', a=120, b=60, c=30, d=150)
Transformer for MFD orientation classes.
Methods
fit(X, y=None)
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
MeanCenterer
MeanCenterer(columns=None)
Column centering of a pandas Dataframe.
Attributes
- col_means : numpy.ndarray [n_columns] or pandas [n_columns]. Mean values for centering, available after fitting the MeanCenterer object.
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/MeanCenterer/
Adapted from https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/mean_centering.py
Author: Sebastian Raschka
Methods
fit(X, y=None)
Gets the column means for mean centering.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Centers the columns of a pandas Dataframe.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns centered.
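The fit/transform behavior described above reduces to subtracting per-column means; a minimal sketch with plain pandas (toy data, not the mlearner code itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

col_means = df.mean()      # what fit() stores in the col_means attribute
centered = df - col_means  # what transform() returns
# "a" becomes [-1.0, 0.0, 1.0]; "b" becomes [-10.0, 0.0, 10.0]
```

Because the means are stored at fit time, new data can be centered with the training means rather than its own.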
OneHotEncoder
OneHotEncoder(columns=None, numerical=[], Drop=True)
This transformer applies one-hot encoding to features.
Attributes
- numerical : pandas [n_columns]. Numerical columns to be treated as categorical.
- columns : pandas [n_columns]. Columns to use (if None, then all categorical variables are included).
Examples
For usage examples, please see: https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/OneHotEncoder/
Methods
fit(X, y=None, **fit_params)
Selects the columns to one-hot encode from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies one-hot encoding to the selected features.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns encoded.
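One-hot encoding on a Dataframe can be sketched with pandas' get_dummies (an illustration of the transformation; the color_* column naming is pandas' convention, not necessarily mlearner's):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# One binary column per category; the original categorical column is
# dropped from the result (mirroring Drop=True).
encoded = pd.get_dummies(df, columns=["color"])
# Columns: size, color_blue, color_red
```

Each row has exactly one 1 among the color_* columns, marking its category.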
OrientationClassTransformer
OrientationClassTransformer(columns, name='OCT', a=135, b=45)
Transformer for orientation classes.
Methods
fit(X, y=None)
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
PCA_add
PCA_add(columns=None, n_components=2, PCA_name=None, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
Selects the PCA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies PCA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the PCA columns added.
PCA_selector
PCA_selector(columns=None, n_components=2, random_state=99)
Base class for all estimators in scikit-learn.
Notes
All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods
fit(X, y=None)
Selects the PCA columns from the dataset.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Applies PCA.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns transformed by PCA.
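The PCA computation these transformers wrap can be sketched with scikit-learn directly. A minimal sketch under assumptions: the toy data and the PCA_0/PCA_1 column names are illustrative, and the exact way mlearner attaches or selects columns may differ.

```python
import pandas as pd
from sklearn.decomposition import PCA

X = pd.DataFrame({"f1": [2.5, 0.5, 2.2, 1.9, 3.1],
                  "f2": [2.4, 0.7, 2.9, 2.2, 3.0]})

# n_components=2 and random_state=99 mirror the documented defaults.
pca = PCA(n_components=2, random_state=99)
components = pca.fit_transform(X)

# PCA_add would append the components as extra columns; PCA_selector
# would use them in place of the selected columns.
X_new = X.assign(PCA_0=components[:, 0], PCA_1=components[:, 1])
```

PCA is unsupervised, so unlike the LDA transformers no target y is needed.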
ReplaceMulticlass
ReplaceMulticlass(columns=None)
This transformer replaces some categorical values with others.
Attributes
- columns : list of columns to transform [n_columns]
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/ReplaceMulticlass/
Methods
fit(X, y=None, **fit_params)
Gets the columns used to replace values.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with the optional parameters **fit_params and returns a transformed version of X.
Parameters
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None. Target values.
- **fit_params : dict. Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any. Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict. Estimator parameters.
Returns
- self : object. Estimator instance.
transform(X)
Replaces the categorical values in the selected columns.
Parameters
- X : {Dataframe}, shape = [n_samples, n_features]. Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
- X_transform : {Dataframe}, shape = [n_samples, n_features]. A copy of the input Dataframe with the columns replaced.
ReplaceTransformer
ReplaceTransformer(columns=None, mapping=None)
This transformer replace some values with others.
Attributes
columns: list
of columns to transformer [n_columns]
mapping: dict`, for example:
mapping = {"yes": 1, "no": 0}
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/ReplaceTransformer/
Methods
fit(X, y=None, **fit_params)
Gets the columns in which to replace values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X
: {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) -
y
: ndarray of shape (n_samples,), default=NoneTarget values.
-
**fit_params
: dictAdditional fit parameters.
Returns
-
X_new
: ndarray array of shape (n_samples, n_features_new)Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=TrueIf True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: mapping of string to anyParameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
-
**params
: dictEstimator parameters.
Returns
-
self
: objectEstimator instance.
transform(X)
Gets the columns in which to replace values.
Parameters
-
X
: {Dataframe}, shape = [n_samples, n_features]Dataframe of samples, where n_samples is the number of samples and n_features is the number of features.
Returns
-
X_transform
: {Dataframe}, shape = [n_samples, n_features]A copy of the input Dataframe with the columns replaced.
StandardScaler
StandardScaler(*, copy=True, with_mean=True, with_std=True)
Standardize features by removing the mean and scaling to unit variance
The standard score of a sample x
is calculated as:
z = (x - u) / s
where u
is the mean of the training samples or zero if with_mean=False
,
and s
is the standard deviation of the training samples or one if
with_std=False
.
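The formula can be checked directly with NumPy. Note the population standard deviation (`ddof=0`), which matches the biased estimator scikit-learn uses; the data values are illustrative:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 1.0])
u = x.mean()       # mean of the training samples
s = x.std(ddof=0)  # biased standard deviation, as in StandardScaler
z = (x - u) / s

print(z)  # [-1. -1.  1.  1.]
```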
Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. Mean and
standard deviation are then stored to be used on later data using
:meth:transform
.
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
This scaler can also be applied to sparse CSR or CSC matrices by passing
with_mean=False
to avoid breaking the sparsity structure of the data.
Read more in the :ref:User Guide <preprocessing_scaler>
.
Parameters
-
copy
: boolean, optional, default TrueIf False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
-
with_mean
: boolean, True by defaultIf True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
-
with_std
: boolean, True by defaultIf True, scale the data to unit variance (or equivalently, unit standard deviation).
Attributes
-
scale_
: ndarray or None, shape (n_features,)Per feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.
.. versionadded:: 0.17 scale_
-
mean_
: ndarray or None, shape (n_features,)The mean value for each feature in the training set. Equal to
None
whenwith_mean=False
. -
var_
: ndarray or None, shape (n_features,)The variance for each feature in the training set. Used to compute
scale_
. Equal toNone
whenwith_std=False
. -
n_samples_seen_
: int or array, shape (n_features,)The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer, otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
See also
scale: Equivalent function without the estimator API.
:class:`sklearn.decomposition.PCA`
Further removes the linear correlation across features with 'whiten=True'.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
We use a biased estimator for the standard deviation, equivalent to
`numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to
affect model performance.
For a comparison of the different scalers, transformers, and normalizers,
see :ref:`examples/preprocessing/plot_all_scaling.py
<sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
Methods
fit(X, y=None)
Compute the mean and std to be used for later scaling.
Parameters
-
X
: {array-like, sparse matrix}, shape [n_samples, n_features]The data used to compute the mean and standard deviation used for later scaling along the features axis.
-
y
: NoneIgnored.
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
-
X
: {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) -
y
: ndarray of shape (n_samples,), default=NoneTarget values.
-
**fit_params
: dictAdditional fit parameters.
Returns
-
X_new
: ndarray array of shape (n_samples, n_features_new)Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=TrueIf True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: mapping of string to anyParameter names mapped to their values.
inverse_transform(X, copy=None)
Scale back the data to the original representation.
Parameters
-
X
: array-like, shape [n_samples, n_features]The data used to scale along the features axis.
-
copy
: bool, optional (default: None)Copy the input X or not.
Returns
-
X_tr
: array-like, shape [n_samples, n_features]Transformed array.
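The inverse mapping is simply X * scale_ + mean_, undoing the standardization; a minimal NumPy sketch (the statistics and data are illustrative, taken from a fit on [[0, 0], [0, 0], [1, 1], [1, 1]]):

```python
import numpy as np

mean_ = np.array([0.5, 0.5])    # per-feature training means
scale_ = np.array([0.5, 0.5])   # per-feature training std (ddof=0)

X_scaled = np.array([[-1.0, -1.0], [1.0, 1.0]])
X_orig = X_scaled * scale_ + mean_  # undo z = (x - mean_) / scale_

print(X_orig)  # [[0. 0.] [1. 1.]]
```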
partial_fit(X, y=None)
Online computation of mean and std on X for later scaling.
All of X is processed as a single batch. This is intended for cases
when :meth:fit
is not feasible due to a very large number of
n_samples
or because X is read from a continuous stream.
The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. "Algorithms for computing the sample variance: Analysis and recommendations." The American Statistician 37.3 (1983): 242-247:
Parameters
-
X
: {array-like, sparse matrix}, shape [n_samples, n_features]The data used to compute the mean and standard deviation used for later scaling along the features axis.
-
y
: NoneIgnored.
Returns
-
self
: objectTransformer instance.
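The pairwise merge from Chan et al. can be sketched in pure NumPy. This is an illustrative implementation of Equation 1.5a,b for combining two batches' statistics, not mlearner's actual code:

```python
import numpy as np

def combine(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two batches' count, mean, and sum of squared
    deviations (M2) per Chan et al., Equation 1.5a,b."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

def stats(x):
    """Count, mean, and M2 of a single batch."""
    x = np.asarray(x, dtype=float)
    return len(x), x.mean(), ((x - x.mean()) ** 2).sum()

# Two batches whose union is [0, 0, 1, 1].
a, b = [0.0, 0.0], [1.0, 1.0]
n, mean, m2 = combine(*stats(a), *stats(b))
var = m2 / n  # biased variance, as StandardScaler computes it

print(mean, var)  # 0.5 0.25
```

The merged mean and variance match a single pass over the concatenated data, which is what lets partial_fit process a stream batch by batch.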
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each
component of a nested object.
Parameters
-
**params
: dictEstimator parameters.
Returns
-
self
: objectEstimator instance.
transform(X, copy=None)
Perform standardization by centering and scaling.
Parameters
-
X
: array-like, shape [n_samples, n_features]The data used to scale along the features axis.
-
copy
: bool, optional (default: None)Copy the input X or not.
Returns
-
X_tr
: array-like, shape [n_samples, n_features]Transformed array.
minmax_scaling
minmax_scaling(X, columns, min_val=0, max_val=1)
Min-max scaling of pandas DataFrames.
Parameters
-
X
: pandas DataFrame, shape = [n_rows, n_columns]. -
columns
: array-like, shape = [n_columns]Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...]
-
min_val
:int
orfloat
, optional (default=0
)Minimum value after rescaling.
-
max_val
:int
orfloat
, optional (default=1
)Maximum value after rescaling.
Returns
-
df_new
: pandas DataFrame object.Copy of the array or DataFrame with rescaled columns.
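The rescaling itself is just (x - min) / (max - min) mapped onto [min_val, max_val]; a pandas sketch of the same idea (column names and data are illustrative, not mlearner's implementation):

```python
import pandas as pd

df = pd.DataFrame({"s1": [1, 2, 3], "s2": [10, 20, 30]})

def minmax(col, min_val=0, max_val=1):
    """Rescale a column linearly onto [min_val, max_val]."""
    rng = col.max() - col.min()
    return (col - col.min()) / rng * (max_val - min_val) + min_val

# Rescale selected columns on a copy, leaving the input untouched.
df_new = df.copy()
for c in ["s1", "s2"]:
    df_new[c] = minmax(df[c])

print(df_new["s1"].tolist())  # [0.0, 0.5, 1.0]
```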
Examples
For usage examples, please see https://jaisenbe58r.github.io/MLearner/user_guide/preprocessing/minmax_scaling/
Adapted from
https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/scaling.py
Author: Sebastian Raschka <sebastianraschka.com>
License: BSD 3 clause