
sklearn pipeline with PCA on feature subset using FunctionTransformer

Asked: 2018-12-01T00:42:16   Author: kanimbla


Consider the task of chaining PCA and logistic regression, where PCA performs dimensionality reduction and logistic regression makes the predictions.

Example taken from the sklearn documentation:

import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model, decomposition, datasets
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

logistic = linear_model.LogisticRegression()

pca = decomposition.PCA()
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

n_components = [5, 10]
Cs = np.logspace(-4, 4, 3)

param_grid = dict(pca__n_components=n_components, logistic__C=Cs)
estimator = GridSearchCV(pipe,param_grid)
estimator.fit(X_digits, y_digits)

How can I perform dimensionality reduction only on a subset of my feature set using FunctionTransformer (for example, restrict PCA to the last ten columns of X_digits)?

Author: kanimbla. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/53561598/sklearn-pipeline-with-pca-on-feature-subset-using-functiontransformer
A Kruger:

You can first create a function (called last_ten_columns below) that returns the last 10 columns of the input X_digits. Create a FunctionTransformer that wraps the function and use it as the first step of the pipeline. Note that this step passes only those ten columns on: the other 54 columns of X_digits are dropped, so the logistic regression sees only the PCA components of the last ten columns.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model, decomposition, datasets
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import FunctionTransformer

logistic = linear_model.LogisticRegression()

pca = decomposition.PCA()

def last_ten_columns(X):
    # Keep only the last ten feature columns; all other columns are dropped.
    return X[:, -10:]

func_trans = FunctionTransformer(last_ten_columns)

# The selector runs first, so PCA is fitted on those ten columns only.
pipe = Pipeline(steps=[('func_trans', func_trans), ('pca', pca), ('logistic', logistic)])

digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

n_components = [5, 10]
Cs = np.logspace(-4, 4, 3)

param_grid = dict(pca__n_components=n_components, logistic__C=Cs)
estimator = GridSearchCV(pipe, param_grid)
estimator.fit(X_digits, y_digits)
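To check what the first step actually hands to PCA, here is a quick sketch that reuses the answer's last_ten_columns on the digits data (64 features per sample). It compares shapes before and after selection and then after a five-component PCA; the variable names X_subset and X_reduced are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def last_ten_columns(X):
    # Same selector as in the answer: keep only the last ten feature columns.
    return X[:, -10:]


X_digits = datasets.load_digits().data

selector = FunctionTransformer(last_ten_columns)
X_subset = selector.fit_transform(X_digits)

# The selected block is exactly the trailing ten columns of the input.
print(X_digits.shape, X_subset.shape)  # (1797, 64) (1797, 10)
print(np.array_equal(X_subset, X_digits[:, -10:]))  # True

# Chaining the selector with PCA reduces those ten columns to five components.
reducer = Pipeline(steps=[('func_trans', selector), ('pca', PCA(n_components=5))])
X_reduced = reducer.fit_transform(X_digits)
print(X_reduced.shape)  # (1797, 5)
```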
2018-11-30T17:15:29
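If the goal is instead to keep the other 54 columns as raw features next to the PCA components, one option is a minimal sketch like the following. It swaps the single FunctionTransformer step for a FeatureUnion of two branches, each starting with its own FunctionTransformer. The helper other_columns and the step names pca_part and rest are hypothetical choices for illustration, not part of the original answer.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def last_ten_columns(X):
    # Columns that PCA should reduce.
    return X[:, -10:]


def other_columns(X):
    # Hypothetical helper: all remaining columns, passed through unchanged.
    return X[:, :-10]


# Branch 1: select the last ten columns, then reduce them with PCA.
pca_branch = Pipeline(steps=[
    ('select', FunctionTransformer(last_ten_columns)),
    ('pca', PCA()),
])

# FeatureUnion concatenates the branch outputs column-wise, in list order:
# PCA components first, then the untouched columns.
features = FeatureUnion(transformer_list=[
    ('pca_part', pca_branch),
    ('rest', FunctionTransformer(other_columns)),
])

pipe = Pipeline(steps=[
    ('features', features),
    ('logistic', LogisticRegression()),
])

digits = datasets.load_digits()
X_digits, y_digits = digits.data, digits.target

# Nested parameter names follow the step names: features -> pca_part -> pca.
param_grid = {
    'features__pca_part__pca__n_components': [5, 10],
    'logistic__C': np.logspace(-4, 4, 3),
}
estimator = GridSearchCV(pipe, param_grid)
estimator.fit(X_digits, y_digits)
```

With n_components=5, for example, the classifier receives 5 + 54 = 59 features. In scikit-learn 0.20 and later, ColumnTransformer with remainder='passthrough' expresses the same split using column indices instead of selector functions.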