Apache SystemDS

Apache SystemDS is an open-source ML system for the end-to-end data science lifecycle from data integration, cleaning, and feature engineering to model training, debugging, and deployment. It provides a declarative machine learning language (DML), automatic optimization for different execution backends (local, distributed Spark), and a Python API (SystemDS Python). SystemDS is an Apache Software Foundation top-level project designed for scalable ML workflows.

AutoML, Data Science, Distributed Computing, Machine Learning, Open Source

APIs

Apache SystemDS Python API

The SystemDS Python API (systemds) provides a Python interface for building end-to-end ML pipelines. It includes Matrix and Frame types for distributed data manipulation, built-...

Features

Declarative ML Language (DML)

High-level R-like language for specifying ML algorithms with automatic optimization.

Automatic Optimization

Query optimization, memory management, and execution plan selection for ML workloads.
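As a rough illustration of execution plan selection, the sketch below picks a backend per operation by comparing a worst-case memory estimate against a local memory budget. It is a toy model, not SystemDS code: the `MatrixOp` class, the `select_backend` function, the operation names, and the 1 GiB budget are all assumptions made for this example.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8


@dataclass
class MatrixOp:
    """Hypothetical operation over a dense rows x cols matrix of doubles."""
    name: str
    rows: int
    cols: int

    def memory_estimate(self) -> int:
        # Worst-case dense size in bytes; sparsity is ignored in this sketch.
        return self.rows * self.cols * BYTES_PER_DOUBLE


def select_backend(op: MatrixOp, memory_budget_bytes: int) -> str:
    """Return 'local' when the estimate fits the budget, otherwise 'spark'."""
    return "local" if op.memory_estimate() <= memory_budget_bytes else "spark"


budget = 1 * 1024**3  # assumed 1 GiB budget for local operations
small = MatrixOp("op_small", rows=10_000, cols=100)       # about 8 MB
large = MatrixOp("op_large", rows=100_000_000, cols=100)  # about 80 GB
plan = {op.name: select_backend(op, budget) for op in (small, large)}
print(plan)
```

The point of the design is that the user writes the same operation either way; only the estimate decides where it runs.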

Federated Learning

Privacy-preserving federated ML across distributed data silos without data sharing.
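One way to picture training across silos without moving raw records is to let each site compute local aggregates and have a coordinator combine them. The sketch below fits a one-variable linear model this way. The function names and the toy data are hypothetical and do not reflect the SystemDS federated API.

```python
from typing import List, Tuple

Site = List[Tuple[float, float]]  # (x, y) pairs held privately by one silo


def local_statistics(data: Site) -> Tuple[int, float, float, float, float]:
    """Runs at the silo: only these five aggregates leave the site."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    return n, sx, sy, sxx, sxy


def federated_fit(stats) -> Tuple[float, float]:
    """Runs at the coordinator: sum per-site aggregates, solve y = a*x + b."""
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*stats))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # least-squares slope
    b = (sy - a * sx) / n                          # least-squares intercept
    return a, b


# Toy data drawn from y = 2x + 1, split across two silos.
site_a = [(0.0, 1.0), (1.0, 3.0)]
site_b = [(2.0, 5.0), (3.0, 7.0)]
slope, intercept = federated_fit([local_statistics(s) for s in (site_a, site_b)])
```

The combined fit matches what a centralized least-squares fit over all four points would give, because the sums are additive across sites.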

Built-In Algorithms

50+ built-in ML algorithms including linear models, neural networks, clustering, and ensemble methods.

Python API

Pythonic API for ML pipeline development with lazy evaluation and distributed execution.
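Lazy evaluation means that operators only record what should be computed, and the work happens when a result is explicitly requested. The sketch below shows the idea with scalars in plain Python. `LazyNode`, `const`, and the method name `compute` are choices made for this example, not a description of the `systemds` package.

```python
class LazyNode:
    """Hypothetical node in a lazily built expression graph."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add", or "mul"
        self.inputs = inputs  # child nodes for binary operations
        self.value = value    # payload for "const" nodes

    def __add__(self, other):
        return LazyNode("add", (self, other))  # records the op, no arithmetic yet

    def __mul__(self, other):
        return LazyNode("mul", (self, other))

    def compute(self):
        # Evaluation happens only here, recursing through the graph.
        if self.op == "const":
            return self.value
        left, right = (node.compute() for node in self.inputs)
        return left + right if self.op == "add" else left * right


def const(value):
    return LazyNode("const", value=value)


expr = (const(2.0) + const(3.0)) * const(4.0)  # builds a graph only
result = expr.compute()                        # triggers the actual arithmetic
```

Because the whole expression is known before anything runs, a system built this way has the chance to optimize or distribute the graph before executing it.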

Data Cleaning Pipelines

Automated data cleaning, imputation, encoding, and normalization pipelines.
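The three steps named above (imputation, encoding, normalization) can be sketched on toy columns in plain Python. This is a minimal illustration under assumed choices (mean imputation, min-max scaling, one-hot encoding); the helper names and data are hypothetical and are not SystemDS built-ins.

```python
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace None with the mean of observed values (needs at least one)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def one_hot_encode(column: List[str]) -> List[List[int]]:
    """Encode categories as indicator vectors, in sorted category order."""
    categories = sorted(set(column))
    return [[int(v == c) for c in categories] for v in column]


ages = [20.0, None, 40.0]
cities = ["graz", "berlin", "graz"]
clean_ages = min_max_normalize(impute_mean(ages))
encoded_cities = one_hot_encode(cities)
```

Here the missing age is filled with 30.0 before scaling, so the order of steps matters: normalizing first would compute the range on incomplete data.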

Use Cases

Distributed ML Training

Train large-scale ML models distributed across Apache Spark clusters.

Federated Machine Learning

Cross-silo federated learning for privacy-sensitive healthcare and finance data.

End-to-End ML Pipelines

Integrated data preparation, feature engineering, training, and serving pipelines.

Integrations

Apache Spark

Native Spark backend for distributed matrix operations and ML training.

Python

Python API with NumPy-compatible Matrix type for local and distributed computation.

Kubernetes

Kubernetes deployment support for SystemDS runtime via Helm charts.

Resources

GitHub Repository
Documentation
Portal
Getting Started
Release Notes
Terms of Service