What is it?

scikit-learn, also called sk-learn or sklearn, is an open-source Python library for machine learning. It is built on top of NumPy and SciPy and is currently maintained by a community of volunteers.

It provides a wide range of classification, regression, and clustering algorithms, including support vector machines (SVMs), decision trees, gradient boosting, and many others.
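Because these algorithms share the same fit/predict interface, swapping one model for another usually takes a single line. The sketch below assumes the built-in iris dataset and synthetic data from make_blobs as example inputs (neither appears in the original text) and uses default or illustrative hyperparameters, not tuned ones:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris, make_blobs
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Built-in classification dataset: 150 iris flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each classifier exposes the same fit/predict/score methods
classifiers = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")

# Clustering is unsupervised: KMeans is fitted on the features only, no labels
X_blobs, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_blobs)
print("Cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```

The exact accuracy values depend on the train/test split and the installed scikit-learn version, so treat them as illustrative.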


Using scikit-learn

To get started with scikit-learn, first make sure that Python is installed. Then run the following command to install the library from PyPI:

pip install scikit-learn
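One way to confirm that the installation worked is to import the library from Python. Note that the package is installed as scikit-learn but imported as sklearn. A minimal check:

```python
# The PyPI package is "scikit-learn", but the import name is "sklearn"
import sklearn

# Version string of the installed library
print(sklearn.__version__)

# Print versions of scikit-learn and its dependencies (NumPy, SciPy, etc.)
sklearn.show_versions()
```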

In scikit-learn, any object that learns from data through its fit() method is called an estimator. Estimators that return predicted values through predict() are also called predictors, while estimators that process data through transform() are called transformers. scikit-learn also provides tools for building end-to-end models, such as pipelines, as well as tools for model evaluation and hyperparameter optimization.
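The sketch below illustrates these roles on the built-in iris dataset, which is chosen here only for illustration: StandardScaler acts as a transformer, LogisticRegression as a predictor, and Pipeline, cross_val_score, and GridSearchCV cover pipelines, model evaluation, and hyperparameter optimization. The grid of C values is an arbitrary assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transformer: fit() learns per-feature mean and std, transform() applies them
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Means after scaling:", np.round(X_scaled.mean(axis=0), 6))

# Predictor: fit() learns model parameters, predict() returns predicted labels
clf = LogisticRegression(max_iter=1000)
clf.fit(X_scaled, y)
print("First five predictions:", clf.predict(X_scaled[:5]))

# Pipeline: chains the transformer and the predictor into a single estimator
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Model evaluation: 5-fold cross-validated accuracy of the whole pipeline
cv_scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")

# Hyperparameter optimization: grid search over the regularization strength C.
# Parameters of pipeline steps are addressed as <step name>__<parameter>.
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print("Best C:", grid.best_params_["clf__C"])
print(f"Best CV accuracy: {grid.best_score_:.2f}")
```

Putting the scaler inside the pipeline matters during cross-validation: the scaler is refitted on each training fold, so statistics from the validation fold never leak into training.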

Here is a simple linear regression example using scikit-learn:

import matplotlib.pyplot as plt  
from sklearn.pipeline import make_pipeline  
from sklearn.linear_model import LinearRegression  
from sklearn.preprocessing import StandardScaler  
from sklearn.datasets import make_regression  
  
# Generate dummy data  
X, y = make_regression(n_features=1, noise=25, random_state=42)  
  
# Create a pipeline and fit the model  
model = make_pipeline(StandardScaler(), LinearRegression())  
model.fit(X, y)  
  
# Predicted values based on our data points  
predicted = model.predict(X)  
  
# Plot the data points and the fitted line  
plt.scatter(X[:, 0], y)  # Plot data points  
plt.plot(X[:, 0], predicted)  # Plot fitted line  
  
# Label the axes and add a title  
plt.xlabel("Feature 1")  
plt.ylabel("Target Variable")  
plt.title("Linear regression example")  
  
# Show the plot  
plt.show()
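The example above fits and plots the model on the same points it was trained on. A sketch of a more honest check holds out part of the data with train_test_split and reports R² through the pipeline's score() method; the 80/20 split and the random_state values are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same dummy data as in the example above
X, y = make_regression(n_features=1, noise=25, random_state=42)

# Hold out 20% of the samples to measure performance on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# make_pipeline names each step after its class name in lowercase
reg = model.named_steps["linearregression"]
print("Slope:", reg.coef_[0])
print("Intercept:", reg.intercept_)

# score() returns the coefficient of determination (R^2) on the given data
r2 = model.score(X_test, y_test)
print(f"Test R^2: {r2:.2f}")
```

Because StandardScaler runs before LinearRegression, the reported slope is the change in the target per standard deviation of the feature, not per raw unit of the feature.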


Official documentation

Working with scikit-learn's API is much easier with its official documentation as a guide. The documentation, which includes a user guide, an API reference, and worked examples, is available at https://scikit-learn.org.