Apache PySpark
Python API for Apache Spark, a unified analytics engine for large-scale data processing that supports batch processing, streaming, machine learning, and graph computing.
5 APIs
Big Data, Data Processing, Distributed Computing, Machine Learning, Python, Streaming
APIs
PySpark Core API
Core Spark functionality including RDDs, SparkContext, and basic operations.
PySpark SQL
Structured data processing with DataFrame and SQL operations.
PySpark Streaming
Real-time stream processing capabilities using DStreams and Structured Streaming.
PySpark MLlib
Machine learning library with scalable algorithms for classification, regression, clustering, and more.
PySpark ML (DataFrame-based)
DataFrame-based machine learning API with pipelines and feature transformers.
Resources
Website
GitHub Organization
Getting Started
Quick Start
Downloads
Community
Issue Tracker
Release Notes
Security