whylogs

Name: whylogs
Availability: OnlineOnly
Author: WhyLabs

AI Development Libraries

Open-source data logging library for AI observability and data quality monitoring

At a Glance

Pricing

Free tier available

Get started with whylogs at no cost; a free version is available.

Available On

SDK

Topics

AI Development Libraries, Monitoring Tools, API Integration Platforms

About whylogs

whylogs is an open-source data logging library developed by WhyLabs that enables comprehensive logging and profiling of datasets across machine learning pipelines and data workflows. As the foundational component of the WhyLabs AI Observability Platform, whylogs provides a lightweight, scalable, and privacy-preserving approach to capturing statistical profiles of data, which serve as the basis for monitoring, validation, and analysis.

At its core, whylogs generates statistical summaries of datasets (called "profiles") that capture key information about the distributions and characteristics of the data without storing the raw data itself. These profiles include metrics such as distributions, shapes, types, missing values, and other statistical properties that are essential for understanding data quality and model behavior. By focusing on statistical properties rather than raw data, whylogs enables effective monitoring while maintaining privacy and minimizing storage requirements.
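The idea of profiling without retaining raw data can be illustrated with a minimal, standard-library sketch. This is not the whylogs API or its implementation: the `ColumnProfile` class, the `profile_column` helper, and the sample `ages` values are all hypothetical, and the sketch only tracks a handful of statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Hypothetical running summary of one numeric column (not a whylogs class)."""

    count: int = 0
    null_count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        """Fold one value into the summary; the value itself is never stored."""
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        """Mean of the non-missing values, or None if there are none."""
        non_null = self.count - self.null_count
        return self.total / non_null if non_null else None


def profile_column(values):
    """Build a profile by streaming over the values once."""
    profile = ColumnProfile()
    for value in values:
        profile.track(value)
    return profile


# Illustrative input only; after profiling, just the summary needs to be kept.
ages = [34, None, 51, 27, None, 45]
age_profile = profile_column(ages)
print(age_profile, "mean:", age_profile.mean)
```

Only the fixed-size summary needs to be stored or transmitted, which is what keeps this style of logging small and keeps individual records out of the monitoring system.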

One of the key strengths of whylogs is its flexibility and versatility. The library can handle both structured and unstructured data, including tabular datasets, images, text, audio, and other complex data types. It supports both batch processing for large datasets and streaming for real-time applications, making it suitable for a wide range of use cases. whylogs can be easily integrated into various ML infrastructures and workflows, with support for popular frameworks and platforms such as Pandas, Apache Spark, AWS SageMaker, MLflow, Flask, Ray, RAPIDS, and Apache Kafka.

whylogs profiles are designed to be lightweight, customizable, and mergeable. The lightweight nature ensures minimal overhead in production systems, while customizability allows users to define and track specific metrics relevant to their applications. The mergeability feature is particularly valuable for distributed environments, as it allows profiles from different parts of a system to be combined into a unified view.
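Mergeability can be sketched in a few lines of plain Python. The `Summary` class and its `of` and `merge` methods below are hypothetical names used for illustration, not whylogs code; the point is only that summaries computed independently on separate partitions can be combined into the same result as summarizing all the data at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical mergeable summary of a numeric column (not a whylogs class)."""

    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def of(cls, values):
        """Summarize a non-empty collection of numbers."""
        values = list(values)
        return cls(len(values), sum(values), min(values), max(values))

    def merge(self, other):
        """Combine two summaries without access to the underlying values."""
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


# Two workers each summarize their own partition (illustrative values).
worker_a = Summary.of([3.0, 8.0, 1.5])
worker_b = Summary.of([9.5, 2.0])

# Merging the partial summaries matches summarizing everything in one place.
merged = worker_a.merge(worker_b)
combined = Summary.of([3.0, 8.0, 1.5, 9.5, 2.0])
print(merged == combined)
```

Counts, sums, minimums, and maximums merge exactly; statistics such as quantiles or distinct counts need purpose-built mergeable data structures, which this sketch does not attempt.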

The profiles generated by whylogs can be output in various formats, including Protobuf (a lightweight binary format), JSON, and flat files with CSV and JSON content. These formats enable easy integration with visualization tools and platforms. Users can leverage these profiles for various purposes, including:

Tracking changes in datasets over time to detect data drift and concept drift
Creating data constraints to validate whether data meets quality expectations
Visualizing key summary statistics for exploratory data analysis
Detecting training-serving skew and model performance degradation
Enabling data auditing and governance across an organization
Standardizing data documentation practices
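Two of the uses listed above, data constraints and drift detection, can be sketched against summary statistics alone. The example below is a simplified illustration in plain Python, not the whylogs constraints or drift API: the dictionary layout, the `check_constraints` and `relative_mean_shift` helpers, the 10% threshold, and all numbers are assumptions made for the example.

```python
def check_constraints(profile, constraints):
    """Return the names of the constraints that the profile fails."""
    return [name for name, predicate in constraints.items() if not predicate(profile)]


def relative_mean_shift(reference, current):
    """Change in the mean relative to the reference mean (a crude drift signal)."""
    return abs(current["mean"] - reference["mean"]) / abs(reference["mean"])


# Hypothetical summaries of the same column at training time and in production.
reference = {"count": 1000, "null_count": 5, "mean": 40.0, "min": 18, "max": 90}
current = {"count": 950, "null_count": 120, "mean": 46.0, "min": -3, "max": 88}

# Quality expectations expressed as predicates over a summary.
constraints = {
    "null fraction below 5%": lambda p: p["null_count"] / p["count"] < 0.05,
    "no negative values": lambda p: p["min"] >= 0,
    "at least 100 rows": lambda p: p["count"] >= 100,
}

failed = check_constraints(current, constraints)
shift = relative_mean_shift(reference, current)
drifted = shift > 0.10  # assumed threshold for this example

print("failed constraints:", failed)
print("relative mean shift:", shift, "drifted:", drifted)
```

Comparing means is a deliberately crude signal; in practice a drift check would compare whole distributions, but the workflow of comparing a current profile against a reference profile is the same.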

For organizations using the WhyLabs Platform, whylogs profiles can be seamlessly uploaded to provide comprehensive observability, monitoring, and alerting capabilities. However, whylogs is also valuable as a standalone library for teams that want to implement their own monitoring and analysis workflows.

whylogs is open source and draws contributions from a community of data scientists, ML engineers, and developers. The project is actively maintained, with new features and improvements aimed at the evolving needs of AI teams working with increasingly complex data and models.

Capabilities

Key Features

Privacy-preserving statistical profiles of datasets
Support for structured and unstructured data
Batch and streaming data processing
Lightweight and efficient data logging
Customizable metrics and statistics
Mergeable profiles for distributed systems
Multiple output formats (Protobuf, JSON, flat files)
Integration with popular ML frameworks and tools
Open-source with Apache 2.0 license
Scalable for large datasets and high-volume applications

Integrations

Python
Java
Pandas
Apache Spark
AWS SageMaker
MLflow
Flask
Ray
RAPIDS
Apache Kafka
WhyLabs Platform
