Interactive computing for complex data processing, modeling and analysis in Python 3
Updated May 3, 2024 - Python
Collection of Snowflake Notebook demos, tutorials, and examples
Companion notebooks for blogs/tutorials on ML4Devs website.
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites.
Data pipelines and notebooks for RAG tuning using Fondant
Tools to streamline Jupyter Notebook Prototypes into robust Data Products
A starter repository for your next AWS Glue project. It comes with complete IaC, a CD pipeline, and a reusable common SDK. Set up Jupyter Notebook for AWS Glue locally.
dtflw is a Python framework for building modular data pipelines based on the Databricks dbutils.notebook API.
Jupyter Notebook Databases Stack
This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
This is a study project. I get analytics/ML examples from Kaggle and use different technologies to re-implement them.
This repository provides containerized applications and microservices for the Information Systems and Databases Course @ Instituto Superior Técnico
Common ETL patterns and utilities for PySpark. Notebooks tested on Databricks Community Edition.
Postgres & Jupyter Notebook Microservice Template for Data Science / Data Engineering tryouts
A repository of notebooks and data sources for data engineers, data analysts and data scientists, chiefly at proof-of-concept level
Spotify Data Engineering Project
ETL with Jupyter Notebooks, Pandas, and Azure Cosmos DB
SEC Finance Data Engineering - ETL process for SEC Finance data of S&P 500 companies. Jupyter Notebooks run the ETL workflows. The final dataset is hosted in MongoDB Atlas (cloud). The API is written in Python with the PyMongo and Flask libraries. The dashboards with charts are hosted in MongoDB Atlas.