A curated list of awesome Apache Spark packages and resources.
The goal of this project is to build a Docker cluster that provides access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. The cluster is intended solely for use in a development environment; do not run any production workloads on it.
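A quick sanity check for a cluster like this is a PySpark session with Hive support that confirms the metastore and HDFS are reachable. The sketch below is illustrative only: the service names `hive-metastore` and `namenode`, the ports, and the sample file path are assumptions, not taken from the project.

```python
from pyspark.sql import SparkSession

# Assumed service endpoints exposed by the dev cluster (not part of the project description).
spark = (
    SparkSession.builder
    .appName("dev-cluster-smoke-test")
    .config("hive.metastore.uris", "thrift://hive-metastore:9083")  # hypothetical metastore service
    .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")    # hypothetical HDFS namenode
    .enableHiveSupport()
    .getOrCreate()
)

# List Hive databases and read a small file from HDFS to confirm connectivity.
spark.sql("SHOW DATABASES").show()
spark.read.text("hdfs://namenode:9000/tmp/hello.txt").show(truncate=False)

spark.stop()
```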
Hands-on workshop with Apache Iceberg
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
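A typical starting point for an Iceberg workshop is a Spark session wired to an Iceberg catalog. The sketch below uses a local Hadoop catalog, an assumed runtime version, and made-up table names; the workshop's actual catalog type, warehouse location and Redpanda/Debezium wiring are not specified here.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-workshop-sketch")
    # The runtime coordinate and version are assumptions; match them to your Spark/Scala build.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Create an Iceberg table in the assumed "demo" catalog and append a few rows.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, payload STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'hello'), (2, 'world')")
spark.sql("SELECT * FROM demo.db.events").show()

spark.stop()
```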
GCP_Data_Enginner
Driver/Executor images for spark-operator
Backbone for the MorphL-Community-Edition platform.
Vagrant Box with Python 3.6.1, Apache Spark 2.1.1 with Scala 2.11.8 and PySpark (2.1.1).
Data Warehouse Project - TPC-DS benchmarking on Spark SQL 👨🏻‍💻
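Benchmarking along these lines usually comes down to timing individual queries against pre-generated tables. The sketch below assumes the TPC-DS tables already exist as Parquet under `/data/tpcds/` and uses a simplified TPC-DS-style query; the real benchmark queries, scale factor and data-generation step belong to the project itself.

```python
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tpcds-query-timing").getOrCreate()

# Register the pre-generated Parquet tables as temporary views (paths are assumptions).
for table in ["store_sales", "date_dim", "item"]:
    spark.read.parquet(f"/data/tpcds/{table}").createOrReplaceTempView(table)

# Simplified TPC-DS-style aggregation, not one of the official 99 queries.
query = """
SELECT d_year, i_brand, SUM(ss_ext_sales_price) AS revenue
FROM store_sales
JOIN date_dim ON ss_sold_date_sk = d_date_sk
JOIN item ON ss_item_sk = i_item_sk
GROUP BY d_year, i_brand
ORDER BY d_year, revenue DESC
"""

start = time.time()
spark.sql(query).collect()
print(f"Query finished in {time.time() - start:.1f}s")

spark.stop()
```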
Scalable Spark Docker image that works on Docker Compose and Kubernetes
Proof-of-concept (PoC) for Spark on Kubernetes
Guide to installing Hadoop and Spark on an Oracle virtual machine.
Local integration-test setup for PySpark with AWS via LocalStack
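The usual approach for such a setup is to point Spark's S3A filesystem at LocalStack's edge endpoint. The sketch below assumes LocalStack is running on http://localhost:4566 with a pre-created `test-bucket`, and that the hadoop-aws jars matching your Spark build are on the classpath; none of these names come from the project itself.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("localstack-integration-test")
    # S3A settings for the local emulator; LocalStack accepts arbitrary credentials.
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:4566")
    .config("spark.hadoop.fs.s3a.access.key", "test")
    .config("spark.hadoop.fs.s3a.secret.key", "test")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .getOrCreate()
)

# Write a small DataFrame to the emulated bucket and read it back to verify the round trip.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("s3a://test-bucket/integration/")
spark.read.parquet("s3a://test-bucket/integration/").show()

spark.stop()
```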
A Forex currency-rates pipeline that fetches rates from an external API and loads the data into HDFS, where a PySpark job transforms it and inserts it into a Hive table. The objective of the pipeline is to make the data ready for any downstream machine learning pipeline.
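The Hive-loading step of a pipeline like this typically boils down to a small PySpark job. The sketch below assumes the ingestion task has already landed JSON rate files under `hdfs:///forex/rates/` with `base`, `date` and nested `rates` fields; the actual API schema, paths and table names in the project may differ.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("forex-rates-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the raw rate files dropped into HDFS by the ingestion task (path is an assumption).
raw = spark.read.json("hdfs:///forex/rates/")

# Flatten the payload into a tidy table; the selected currencies are illustrative.
rates = raw.select(
    F.col("base").alias("base_currency"),
    F.to_date("date").alias("rate_date"),
    F.col("rates.USD").alias("usd_rate"),
    F.col("rates.EUR").alias("eur_rate"),
)

# Write into a Hive table partitioned by date (database/table names are assumptions).
spark.sql("CREATE DATABASE IF NOT EXISTS forex")
rates.write.mode("overwrite").partitionBy("rate_date").saveAsTable("forex.daily_rates")

spark.stop()
```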