Apache Livy
Apache Livy is a service that enables interaction with a Spark cluster over a REST interface. It supports submitting Spark jobs or snippets of Spark code, retrieving results synchronously or asynchronously, and managing Spark contexts across multiple users. It is licensed under the Apache License 2.0.
APIs
Apache Livy REST API
The Livy REST API provides endpoints for creating and managing interactive Spark sessions, submitting batch Spark jobs, executing code statements (Python, Scala, R, SQL), and re...
Capabilities
Apache Livy Spark Job Management
A workflow capability that lets data engineers and data scientists manage interactive Spark sessions and submit batch Spark jobs through the Apache Livy REST API.
Features
Create persistent Spark contexts for interactive code execution in Python, Scala, R, and SQL.
Submit batch Spark jobs without creating an interactive session.
Execute code in PySpark, Spark (Scala), SparkR, and SQL.
Proxy user support for multi-tenant Spark cluster access.
Submit jobs and poll for results asynchronously.
Retrieve driver and executor logs for debugging.
Simple HTTP REST API for Spark cluster interaction without native clients.
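The interactive features above (create a session, run a statement, poll for the result, read logs) can be sketched with Python's standard library alone. This is a minimal illustration, not an official client: the helper names (`livy_request`, `wait_for`, and so on) are hypothetical, and the base URL assumes a Livy server on localhost at port 8998, which may differ in your deployment. The builders only construct requests; `send` is the one function that touches the network.

```python
import json
import time
from urllib import parse, request

# Assumption: Livy is reachable locally on port 8998.
LIVY_URL = "http://localhost:8998"


def livy_request(method, path, payload=None, base_url=LIVY_URL):
    """Build (but do not send) a JSON request for a Livy endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(req):
    """Send a prepared request and decode the JSON response body."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_session(kind="pyspark"):
    # kind selects the interpreter, e.g. "spark", "pyspark", "sparkr", "sql".
    return livy_request("POST", "/sessions", {"kind": kind})


def run_statement(session_id, code):
    return livy_request("POST", f"/sessions/{session_id}/statements", {"code": code})


def get_statement(session_id, statement_id):
    return livy_request("GET", f"/sessions/{session_id}/statements/{statement_id}")


def session_log(session_id, start=0, size=100):
    # Fetch a window of log lines for the session.
    query = parse.urlencode({"from": start, "size": size})
    return livy_request("GET", f"/sessions/{session_id}/log?{query}")


def wait_for(fetch, done_states, interval=1.0, max_polls=60):
    """Call fetch() repeatedly until its "state" field is in done_states."""
    for _ in range(max_polls):
        body = fetch()
        if body["state"] in done_states:
            return body
        time.sleep(interval)
    raise TimeoutError("Livy resource did not reach an expected state")
```

Against a running server, a typical sequence would be to send `create_session()`, wait until the session reports the `idle` state, send `run_statement(...)`, and then poll `get_statement(...)` with `wait_for` until the statement state is `available` (or an error state). Passing the fetch step as a callable keeps the polling logic independent of the HTTP layer.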
Use Cases
Power Jupyter, Zeppelin, and other notebooks with Spark backends via Livy.
Submit Spark batch jobs from orchestration tools like Airflow and Oozie.
Execute ad-hoc Spark code for exploratory data analysis.
Enable multiple users to share a Spark cluster with isolation via Livy sessions.
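For the batch and multi-user cases above, an orchestration tool ultimately posts a JSON document to Livy's batch endpoint. The sketch below shows one way such a request might be assembled in Python, including the optional proxy user. The helper names, the example file path, class name, and user are all hypothetical, and the base URL again assumes a local Livy server on port 8998.

```python
import json
from urllib import request

# Assumption: Livy is reachable locally on port 8998.
LIVY_URL = "http://localhost:8998"


def batch_payload(file, class_name=None, args=None, proxy_user=None):
    """Assemble the JSON body for a batch submission, omitting unset fields."""
    payload = {"file": file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    if proxy_user:
        # Ask Livy to run the job on behalf of this user.
        payload["proxyUser"] = proxy_user
    return payload


def submit_batch_request(payload, base_url=LIVY_URL):
    """Build (but do not send) the POST request that submits a batch job."""
    return request.Request(
        base_url + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def batch_state_request(batch_id, base_url=LIVY_URL):
    """Build the GET request that reports a batch job's current state."""
    return request.Request(f"{base_url}/batches/{batch_id}/state", method="GET")
```

A caller would pass the result of `submit_batch_request(...)` to `urllib.request.urlopen`, read the batch `id` from the JSON response, and then poll `batch_state_request(id)` until the job reaches a terminal state. Whether `proxyUser` is honored depends on how impersonation is configured on the Livy server and the cluster.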
Integrations
Livy requires a Spark cluster and acts as the REST gateway to Spark.
Zeppelin notebook backend using Livy for distributed Spark execution.
Jupyter sparkmagic extension uses Livy for remote Spark kernel access.
Airflow LivyOperator for submitting Spark batch jobs from DAGs.
Livy is available as an Amazon EMR application for REST-based Spark access.