data-engineering

PySpark Tutorial for Beginners: practical examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers topics such as Spark introduction, Spark installation, Spark RDD transformations and actions, Spark DataFrame, Spark SQL, and more. It is available free on YouTube and is beginner-friendly, with no prerequisites.

python data-science apache-spark pyspark data-engineering data-analysis python-tutorial pyspark-tutorial spark-tutorials

Updated Oct 8, 2023
Jupyter Notebook

ml6team / fondant-usecase-RAG

Data pipelines and notebooks for RAG tuning using Fondant

data-engineering rag weaviate fondant

Updated Mar 17, 2024
Jupyter Notebook

Aiscalate / aiscalator

Tools to streamline Jupyter Notebook Prototypes into robust Data Products

data-science airflow jupyter jupyter-notebook data-engineering jupyterlab airflow-docker

Updated Dec 26, 2022
Python

wednesday-solutions / aws-glue-jupyter-notebook-starter

A starter repository for your next AWS Glue project. It comes with complete IaC, a CD pipeline, and a reusable common SDK. Set up Jupyter Notebook for AWS Glue locally.

aws jupyter etl glue data-engineering de aws-glue jupyter-notbook

Updated Sep 6, 2023
Jupyter Notebook

SoleyIo / dtflw

dtflw is a Python framework for building modular data pipelines based on the Databricks dbutils.notebook API.

framework data-engineering databricks etl-pipeline pyhotn

Updated Oct 29, 2023
Python

bdist / notebook

Jupyter Notebook Databases Stack

python docker postgres data-science sql jupyter notebook sqlite postgresql data-engineering jupyterlab

Updated May 29, 2024
Makefile

BinariesGoalls / Udacity-Data-Engineering-Nanodegree

This repository holds the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.

python aws postgres airflow spark cassandra etl data-engineering data-pipelines data-modeling data-warehouses data-lakes

Updated Dec 5, 2022
PLpgSQL

jonyroy / data-engineering-notebook

python docker kubernetes scala kafka spark data-engineering

Updated Aug 28, 2022
Scala

DenisOgr / kaggle-notebook-to-production

This is a study project. I get analytics/ML examples from Kaggle and use different technologies to re-implement them.

python bigquery spark gcp data-engineering kaggle-competition kaggle-dataset

Updated May 25, 2021
Jupyter Notebook

bdist / bdist-workspace

This repository provides containerized applications and microservices for the Information Systems and Databases Course @ Instituto Superior Técnico

python docker postgres data-science sql jupyter notebook sqlite postgresql data-engineering jupyterlab

Updated May 17, 2024

mar1boroman / databricks-patterns

Common ETL patterns and utilities for PySpark. Notebooks tested on Databricks Community Edition.

data-science spark etl pyspark data-engineering databricks etl-framework cloud-migration databricks-notebooks databricks-email databricks-etl

Updated Sep 3, 2022
Jupyter Notebook

EtienneEs / Postgres-Jupyter-Notebook-Microservice-Template

Postgres & Jupyter Notebook Microservice Template for Data Science / Data Engineering tryouts

docker postgres docker-compose jupyter-notebook python3 data-engineering

Updated Aug 16, 2021
Jupyter Notebook

garthajon / DataScienceColabRepo

A repository of notebooks and data sources for data engineers, data analysts, and data scientists, chiefly at proof-of-concept level

data-science data spark data-engineering spark-nlp

Updated May 23, 2024
Jupyter Notebook

codeXXripper / Spotify--Data-Pipeline

Spotify Data Engineering Project

python notebook data-engineering data-pipeline spotify-web-api

Updated Jan 4, 2024
Jupyter Notebook

paladique / codespaces-etl-basic-demo

ETL with Jupyter Notebooks, Pandas, and Azure Cosmos DB

etl azure pandas data-engineering azure-cosmos-db codespaces

Updated Oct 5, 2023
Jupyter Notebook

ramkumarpj / project-three

SEC Finance Data Engineering: an ETL process for SEC finance data of S&P 500 companies. Jupyter Notebooks run the ETL workflows. The final dataset is hosted in MongoDB Atlas (cloud). The API is written in Python with the PyMongo and Flask libraries. The dashboards with charts are hosted in MongoDB Atlas.

python flask mongodb etl pymongo jupyter-notebook pandas data-engineering beautifulsoup extract-transform-load mongodb-atlas mongodb-atlas-cloud

Updated Mar 5, 2024
Jupyter Notebook

Here are 38 public repositories matching this topic...

sernst / cauldron

Snowflake-Labs / snowflake-demo-notebooks

ml4devs / ml4devs-notebooks

ploomber / soorgeon

coder2j / pyspark-tutorial
