scikit-learn

Name: scikit-learn
Availability: Local install (pip or conda); runs offline
Author: scikit-learn

An open-source Python library providing simple and efficient tools for predictive data analysis, including classification, regression, clustering, and more.

At a Glance

Pricing

Open Source

Completely free and open-source under the BSD 3-Clause license. Free to use, modify, and distribute.

Available On

API

SDK (Python library)

scikit-learn · Inria, France · Est. 2007

Listed May 2026

About scikit-learn

scikit-learn is a free, open-source machine learning library for Python, released under the BSD 3-Clause license. Built on top of NumPy, SciPy, and matplotlib, it provides a consistent, accessible API for a wide range of supervised and unsupervised learning tasks. The project is hosted on GitHub, where it has over 66,000 stars, and is actively maintained by a community of contributors with financial support from organizations including Probabl, Inria, Microsoft, and NVIDIA.

What It Is

scikit-learn is a Python machine learning toolkit that covers the full predictive modeling workflow, from data preprocessing and feature extraction through model training, evaluation, and selection. It is designed to be accessible to practitioners at all levels while remaining flexible enough for advanced research and production use. Its BSD 3-Clause license permits use, modification, and redistribution in both commercial and non-commercial contexts.
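The workflow described above (preprocessing, training, evaluation) can be sketched in a few lines. This is a minimal illustration, not a recommended recipe: it assumes the Iris dataset bundled with scikit-learn, a 75/25 train/test split, and LogisticRegression as an arbitrary choice of model; all variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Bundled example data: 150 samples, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for evaluation; stratify to keep class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing: learn scaling statistics on the training split only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_scaled, y_train)

# Evaluation on data the model has not seen
y_pred = clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Fitting the scaler on the training split alone keeps information from the test split out of preprocessing, so the reported accuracy is a fairer estimate of performance on new data.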

Core Capabilities

scikit-learn organizes its functionality into six major areas:

Classification — identifying which category an object belongs to, with algorithms including gradient boosting, nearest neighbors, random forest, and logistic regression; applications include spam detection and image recognition.
Regression — predicting continuous-valued attributes using gradient boosting, nearest neighbors, random forest, ridge regression, and more; applications include drug response modeling and stock price prediction.
Clustering — automatic grouping of similar objects using k-Means, HDBSCAN, hierarchical clustering, and others; applications include customer segmentation.
Dimensionality reduction — reducing the number of variables via PCA, feature selection, and non-negative matrix factorization.
Model selection — comparing, validating, and tuning models through grid search, cross-validation, and evaluation metrics.
Preprocessing — feature extraction and normalization for transforming raw input data (including text) into formats suitable for ML algorithms.
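As a rough sketch of the classification and model selection areas working together, the snippet below tunes a random forest with grid search and 5-fold cross-validation. The dataset (the bundled Wine data), the parameter grid, and the fold count are illustrative assumptions, not defaults or recommendations.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Illustrative candidate hyperparameters; each combination is scored with 5-fold CV
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
# Runs the cross-validated search, then refits the best combination on all of X_train
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Mean CV accuracy: {search.best_score_:.3f}")
print(f"Held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Because GridSearchCV refits the winning configuration by default (refit=True), the fitted search object can be used directly for predict or score, as in the last line.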

Architecture and Dependencies

scikit-learn is built directly on the scientific Python stack: NumPy for array operations and SciPy for numerical routines, with matplotlib used by the optional plotting utilities. This tight integration means it works naturally within the broader Python data science ecosystem, including pandas for data manipulation and Jupyter notebooks for interactive analysis. The library exposes a consistent estimator API (fit, predict, transform) that makes it straightforward to compose pipelines and swap one algorithm for another.
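The consistent estimator API is what makes swapping algorithms cheap: any estimator with fit and predict can sit at the end of a Pipeline behind the same preprocessing. The sketch below assumes the bundled breast cancer dataset and compares two arbitrarily chosen classifiers; build_pipeline is a hypothetical helper defined here, not part of scikit-learn.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)


def build_pipeline(model):
    # Hypothetical helper: intermediate steps must implement fit/transform;
    # the final step only needs fit/predict.
    return Pipeline([
        ("scale", StandardScaler()),
        ("model", model),
    ])


results = {}
for name, model in [
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    # The whole pipeline is cloned and refit on each training fold
    scores = cross_val_score(build_pipeline(model), X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean 5-fold accuracy = {results[name]:.3f}")
```

Because the scaler lives inside the pipeline, cross_val_score refits it on each training fold, so statistics from the validation fold never leak into preprocessing.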

Update: Release 1.8.0

The current stable release is 1.8.0, published in December 2025. The project follows a regular release cadence, with a feature release roughly every six months and bug-fix releases in between: version 1.7.0 shipped in June 2025, followed by 1.7.1 in July 2025, 1.7.2 in September 2025, and 1.8.0 in December 2025. Development on version 1.9 is ongoing, and a release candidate (1.9.0rc1) is already available. A changelog and release highlights are published alongside each release on the official documentation site.

Community and Governance

scikit-learn operates as a community-driven open-source project with a published governance model and roadmap. The project maintains active channels on Discord, GitHub Discussions, Stack Overflow, a mailing list, and social platforms including Bluesky, Mastodon, LinkedIn, YouTube, Facebook, Instagram, and TikTok. The homepage features testimonials from organizations such as Inria and Spotify; these are endorsements curated by the project itself. Development and maintenance are financially supported by the sponsors listed on the project's About page.

Pricing

Open Source

All classification, regression, clustering, and dimensionality reduction algorithms
Model selection and evaluation tools
Preprocessing and feature extraction
Full source code access
BSD 3-Clause license for commercial use

Capabilities

Key Features

Classification algorithms (gradient boosting, random forest, logistic regression, nearest neighbors)
Regression algorithms (ridge, gradient boosting, random forest)
Clustering (k-Means, HDBSCAN, hierarchical clustering)
Dimensionality reduction (PCA, feature selection, NMF)
Model selection (grid search, cross-validation, evaluation metrics)
Preprocessing and feature extraction
Consistent estimator API (fit/predict/transform)
Pipeline composition
BSD 3-Clause open-source license
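Clustering and dimensionality reduction compose the same way. A minimal sketch, assuming the bundled Iris data and three clusters (chosen only because the dataset has three species): standardize, project onto two principal components with PCA, then group the samples with k-Means. The true labels are used only afterwards, to score agreement with the adjusted Rand index.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Dimensionality reduction: scale the 4 features, then keep 2 principal components
reducer = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=0))
X_2d = reducer.fit_transform(X)

# Clustering: k-Means with 3 clusters; n_init set explicitly for reproducible behavior
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

# Labels y were not used for fitting; ARI is 1.0 for perfect agreement, about 0 for chance
ari = adjusted_rand_score(y, labels)
print(X_2d.shape, f"ARI = {ari:.2f}")
```

The same reduced features could be passed to HDBSCAN or hierarchical clustering instead of KMeans without changing the preprocessing steps.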

Integrations

NumPy

SciPy

matplotlib

pandas

Jupyter
