Data Engineering Projects With PySpark (2025)
Data Engineering Projects With PySpark (2025)
Published 5/2025 | Created by Chandra Venkat
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz, 2 Ch
Level: All | Genre: eLearning | Language: English | Duration: 36 lectures (5h 11m) | Size: 2.65 GB

Learn how real data engineers write, deploy, and monitor Spark jobs with Docker, HDFS, Airflow, and production workflows.

What you'll learn
- Set up a complete data engineering stack with Docker, Spark, HDFS, and Airflow
- Build PySpark ETL jobs using the DataFrame API and Spark SQL
- Deploy Spark jobs using spark-submit, cron, and Airflow DAGs
- Simulate real team workflows with Git branching, handoff, and rollback
- Organize your project with reusable scripts, env, and config files

Requirements
- Basic Python knowledge
- Familiarity with SQL is helpful
- No prior experience with Spark, Docker, or Airflow needed - everything is taught from scratch
- A system with at least 8 GB RAM (Docker is required for the project setup)

Description
Want to break into the world of data engineering using PySpark, but don't want to waste time on abstract theory or outdated tools? This course is built to teach you exactly what real data engineers do on the job.

We skip the fluff and dive straight into hands-on, project-based learning where you'll:
- Set up a full modern data engineering stack using Docker
- Write real PySpark ETL jobs using both the DataFrame API and Spark SQL (a minimal sketch of such a job follows this description)
- Deploy and monitor your code like professionals, using tools like cron, Airflow, and the Spark UI

You'll simulate a real company environment from Day 1. That means:
- Using Git for branching and code tracking
- Creating a team-ready folder structure with scripts/, configs/, an env shell, and more
- Learning how to switch between dev and prod configurations (see the config sketch below)
- Even simulating ticket-based deployments, handoffs, and rollback scenarios

What makes this course different?
While most courses focus only on PySpark syntax, this course shows you:
- Where Spark fits in real-world pipelines
- How to structure your codebase to be reusable and production-friendly
- How to actually deploy jobs using tools like spark-submit, cron jobs, and Airflow DAGs (see the DAG sketch below)
- How to debug and tune Spark jobs using logs, the Spark UI, caching, and skew handling (see the tuning sketch below)

This isn't just a "learn PySpark" course - it's a "build production data pipelines like a real engineer" course.

What will you learn?
- How to build and schedule Spark jobs like a data engineer
- How to write clean, modular PySpark code using industry-standard practices
- How to deploy your jobs using cron and Apache Airflow
- How to monitor, debug, and optimize jobs using the Spark UI
- How to use Docker to set up Spark, HDFS, Airflow, and Jupyter, all in one go

You'll complete two real-world projects by the end of the course, both designed to reflect how data teams operate in actual companies.

Who is this course for?
- Aspiring data engineers looking for real project experience
- Python developers or analysts transitioning into data engineering roles
- Students and freshers looking to build portfolio-ready projects
- Professionals preparing for interviews or on-the-job Spark work
- Anyone who wants to learn PySpark the practical way

Requirements
- Basic Python knowledge
- Familiarity with SQL is helpful (but not required)
- No prior Spark, Airflow, or Docker experience needed - everything is explained step by step
- A system with at least 8 GB RAM (for the Docker-based setup)

By the end, you'll be confident writing PySpark ETL jobs and deploying them the same way real companies do it in production. This course is not just about learning Spark - it's about learning how to think like a data engineer.
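To make the "PySpark ETL with the DataFrame API and Spark SQL" point concrete, here is a minimal sketch of such a job. The dataset, paths, and column names (orders.csv, order_id, amount, country) are illustrative placeholders, not files from the course.

```python
# Minimal PySpark ETL sketch: extract a CSV, transform it with the DataFrame API
# and Spark SQL, and load the result as Parquet. Paths and columns are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read the raw file (a local path here; an hdfs:// URI works the same way)
raw = spark.read.csv("data/raw/orders.csv", header=True, inferSchema=True)

# Transform with the DataFrame API: drop rows missing the key, normalise a column
clean = (
    raw.dropna(subset=["order_id"])
       .withColumn("country", F.upper(F.col("country")))
)

# Transform with Spark SQL: register a temp view and aggregate in plain SQL
clean.createOrReplaceTempView("orders")
revenue = spark.sql("""
    SELECT country, SUM(amount) AS revenue
    FROM orders
    GROUP BY country
""")

# Load: write the curated output, overwriting the previous run
revenue.write.mode("overwrite").parquet("data/curated/revenue_by_country")

spark.stop()
```

The same script can be run interactively in a notebook or handed to spark-submit unchanged, which is what the deployment sketches below assume.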
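Switching between dev and prod configurations can be as simple as keying Spark settings off an environment variable exported by an env shell script. The variable name (APP_ENV), hostnames, and paths below are hypothetical; the course's configs/ layout may look different.

```python
# Hypothetical dev/prod switch: pick Spark settings from an APP_ENV variable.
# All names, hostnames, and paths here are placeholders for illustration.
import os
from pyspark.sql import SparkSession

CONFIGS = {
    "dev":  {"master": "local[*]",
             "input_path": "data/raw/"},
    "prod": {"master": "spark://spark-master:7077",
             "input_path": "hdfs://namenode:8020/raw/"},
}

env = os.environ.get("APP_ENV", "dev")   # e.g. `export APP_ENV=prod` before spark-submit
cfg = CONFIGS[env]

spark = (
    SparkSession.builder
    .appName(f"orders_etl_{env}")
    .master(cfg["master"])
    .getOrCreate()
)

# The rest of the job reads from whichever location the active config points at
orders = spark.read.csv(cfg["input_path"] + "orders.csv", header=True)
```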
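For the deployment side (spark-submit, cron, Airflow DAGs), a scheduled job can look roughly like the DAG below, written against the Airflow 2.x API. The DAG id, schedule, and script path are assumptions for the example; the simpler cron route mentioned above would be a plain crontab entry that calls spark-submit on the same script.

```python
# Sketch of an Airflow 2.x DAG that runs spark-submit once a day.
# dag_id, the schedule, and the job path are placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_etl_daily",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",   # a cron expression such as "0 2 * * *" also works
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        bash_command="spark-submit --master local[*] /opt/airflow/jobs/etl_job.py",
    )
```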
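Finally, the "caching and skew handling" point usually boils down to reusing cached data and spreading a hot join key across buckets (salting). The tables and columns below (events, users, user_id) are invented for the example; the course's own tuning walkthrough may differ.

```python
# Tuning sketch: cache a reused DataFrame and salt a skewed join key so one
# hot key is split across several shuffle partitions. Names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_demo").getOrCreate()

events = spark.read.parquet("data/curated/events")  # large fact table, skewed on user_id
users = spark.read.parquet("data/curated/users")    # smaller dimension table

# Cache a DataFrame that several downstream actions will reuse,
# so Spark does not re-read and recompute it each time.
events = events.cache()

# Salting: add a random bucket to the big side and replicate the small side
# across all buckets, so the hot user_id no longer lands in a single task.
N = 8
events_salted = events.withColumn("salt", (F.rand() * N).cast("long"))
users_salted = users.crossJoin(spark.range(N).withColumnRenamed("id", "salt"))

joined = events_salted.join(users_salted, on=["user_id", "salt"])
joined.write.mode("overwrite").parquet("data/curated/events_enriched")

# Shuffle sizes and task durations can be checked in the Spark UI
# (http://localhost:4040 by default for a local driver) to confirm
# the partitions are now balanced.
```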
Who this course is for
- Aspiring data engineers who want hands-on, production-style experience
- Python developers or analysts transitioning into data engineering roles
- Students and self-learners building portfolio-ready PySpark projects
- Professionals preparing for Spark-based roles in real companies