whylogs

whylogs is an open-source data logging library developed by WhyLabs that enables comprehensive logging and profiling of datasets across machine learning pipelines and data workflows. As the foundational component of the WhyLabs AI Observability Platform, whylogs provides a lightweight, scalable, and privacy-preserving approach to capturing statistical profiles of data, which serve as the basis for monitoring, validation, and analysis.

At its core, whylogs generates statistical summaries of datasets (called "profiles") that capture key information about the distributions and characteristics of the data without storing the raw data itself. These profiles include metrics such as distributions, shapes, types, missing values, and other statistical properties that are essential for understanding data quality and model behavior. By focusing on statistical properties rather than raw data, whylogs enables effective monitoring while maintaining privacy and minimizing storage requirements.
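To make the idea of a profile concrete, the following is a minimal sketch in plain Python. It is not the whylogs API or its implementation; `ColumnSummary` and `profile_rows` are hypothetical names introduced here for illustration. It shows only the principle described above: each value updates a handful of running statistics and is then discarded, so the summary never holds the raw data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class ColumnSummary:
    """Hypothetical per-column summary; raw values are never stored."""
    count: int = 0
    missing: int = 0
    numeric_count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    types: Counter = field(default_factory=Counter)

    def track(self, value: Any) -> None:
        """Update the running statistics with one observed value."""
        self.count += 1
        if value is None:
            self.missing += 1
            return
        self.types[type(value).__name__] += 1
        # bool is a subclass of int, so exclude it from numeric statistics.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            x = float(value)
            self.numeric_count += 1
            self.total += x
            self.minimum = x if self.minimum is None else min(self.minimum, x)
            self.maximum = x if self.maximum is None else max(self.maximum, x)

    @property
    def mean(self) -> Optional[float]:
        """Mean of the numeric values seen so far, or None if there were none."""
        return self.total / self.numeric_count if self.numeric_count else None


def profile_rows(rows: Iterable[Dict[str, Any]]) -> Dict[str, ColumnSummary]:
    """Build one ColumnSummary per column from an iterable of row dicts."""
    profile: Dict[str, ColumnSummary] = {}
    for row in rows:
        for column, value in row.items():
            profile.setdefault(column, ColumnSummary()).track(value)
    return profile


# Made-up example rows, for illustration only.
rows = [
    {"age": 34, "city": "Oslo"},
    {"age": None, "city": "Lima"},
    {"age": 51, "city": "Oslo"},
]
profile = profile_rows(rows)
print(profile["age"].count, profile["age"].missing, profile["age"].mean)
```

The summary has a fixed size regardless of how many rows are processed, which is the property that keeps storage small. A real profiling library would typically track far richer statistics, such as approximate distributions, than this sketch does.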

whylogs is flexible in both the data it accepts and the way it runs. The library handles structured and unstructured data, including tabular datasets, images, text, audio, and other complex data types. It supports batch processing for large datasets and streaming for real-time applications. It also integrates with common ML infrastructure and workflows, including Pandas, Apache Spark, AWS SageMaker, MLflow, Flask, Ray, RAPIDS, and Apache Kafka.

whylogs profiles are designed to be lightweight, customizable, and mergeable. The lightweight nature ensures minimal overhead in production systems, while customizability allows users to define and track specific metrics relevant to their applications. The mergeability feature is particularly valuable for distributed environments, as it allows profiles from different parts of a system to be combined into a unified view.
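Mergeability can be illustrated with a small plain-Python sketch. This is an assumption-laden illustration, not the whylogs API: `NumericSummary`, `from_values`, and `merge` are hypothetical names. The point it demonstrates is that when a summary holds only combinable statistics (counts, sums, minimum, maximum), summaries built independently on separate partitions can be merged into the same result as profiling all the data in one place.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def _combine(a: Optional[float], b: Optional[float], pick) -> Optional[float]:
    """Apply pick (min or max) while treating None as 'no data yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return pick(a, b)


@dataclass(frozen=True)
class NumericSummary:
    """Hypothetical mergeable summary of a numeric column."""
    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "NumericSummary":
        """Summarize one partition of data in a single pass."""
        summary = cls()
        for v in values:
            summary = summary.merge(cls(1, float(v), float(v), float(v)))
        return summary

    def merge(self, other: "NumericSummary") -> "NumericSummary":
        """Combine two summaries without access to the underlying values."""
        return NumericSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=_combine(self.minimum, other.minimum, min),
            maximum=_combine(self.maximum, other.maximum, max),
        )

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


# Two workers each profile their own partition (made-up values).
worker_a = NumericSummary.from_values([1.0, 2.0, 3.0])
worker_b = NumericSummary.from_values([10.0, 20.0])

# Merging the partial summaries gives the unified view.
merged = worker_a.merge(worker_b)
whole = NumericSummary.from_values([1.0, 2.0, 3.0, 10.0, 20.0])
print(merged == whole)
```

Because `merge` is associative and commutative for these statistics, the order in which partial summaries arrive does not matter, which is what makes this pattern suit distributed systems. Floating-point sums can differ slightly by order for less tidy inputs than the ones used here.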

The profiles generated by whylogs can be output in various formats, including Protobuf (a lightweight binary format), JSON, and flat files with CSV and JSON content. These formats enable easy integration with visualization tools and platforms. Users can leverage these profiles for various purposes, including:

  1. Tracking changes in datasets over time to detect data drift and concept drift
  2. Creating data constraints to validate whether data meets quality expectations
  3. Visualizing key summary statistics for exploratory data analysis
  4. Detecting training-serving skew and model performance degradation
  5. Enabling data auditing and governance across an organization
  6. Standardizing data documentation practices
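The first two uses above, drift tracking and data constraints, can both operate on summaries alone. The sketch below is a hedged illustration in plain Python, not whylogs functionality: the summary dictionaries, their keys, the constraint names, and the helper functions `check_constraints` and `relative_mean_shift` are all assumptions made for this example, and the numbers are made up.

```python
from typing import Callable, Dict, List, Tuple

Summary = Dict[str, float]
Constraint = Tuple[str, Callable[[Summary], bool]]


def check_constraints(summary: Summary, constraints: List[Constraint]) -> Dict[str, bool]:
    """Evaluate each named constraint against a column summary."""
    return {name: bool(rule(summary)) for name, rule in constraints}


def relative_mean_shift(baseline: Summary, current: Summary) -> float:
    """Absolute change in the mean, relative to the baseline mean."""
    if baseline["mean"] == 0:
        raise ValueError("baseline mean is zero; relative shift is undefined")
    return abs(current["mean"] - baseline["mean"]) / abs(baseline["mean"])


# Hypothetical summaries of an "age" column from two time windows.
baseline = {"count": 1000, "missing": 5, "min": 18.0, "max": 90.0, "mean": 40.0}
current = {"count": 800, "missing": 120, "min": -3.0, "max": 88.0, "mean": 50.0}

# Hypothetical quality expectations expressed over the summary.
constraints: List[Constraint] = [
    ("missing ratio at most 5%", lambda s: s["missing"] / s["count"] <= 0.05),
    ("minimum is non-negative", lambda s: s["min"] >= 0),
    ("maximum at most 120", lambda s: s["max"] <= 120),
]

results = check_constraints(current, constraints)
shift = relative_mean_shift(baseline, current)

for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'FAIL'}")
print(f"relative mean shift: {shift:.2f}")
```

A shift in the mean is only one crude drift signal; comparing full distributions would normally be more informative. The sketch is meant to show that such checks need the two summaries and nothing else.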

For organizations using the WhyLabs Platform, whylogs profiles can be seamlessly uploaded to provide comprehensive observability, monitoring, and alerting capabilities. However, whylogs is also valuable as a standalone library for teams that want to implement their own monitoring and analysis workflows.

As an open-source project, whylogs accepts contributions from a community of data scientists, ML engineers, and developers. The project is actively maintained, with new features and improvements added as the needs of AI teams working on increasingly complex data and model ecosystems evolve.
