Published on: 2022-12-12 03:06:58
Scrapy Masterclass: Python Web Scraping and Data Pipelines is a course published by Udemy Academy. Work on 7 real-world web scraping projects using Scrapy, Splash, and Selenium, and build data pipelines both locally and on AWS.
Everyone tells you what to do with the data you already have, but how do you “own” that data in the first place? Most discussions of data engineering and data science today focus on how to analyze and process datasets to extract useful information from them. However, they all assume those datasets are already available to you, gathered somehow, and they rarely spend any time showing you how to get your hands on that data in the first place. This course fills that gap. Scrapy is all about extracting the data you care about from websites and building powerful web scraping pipelines. That’s right, there are plenty of datasets available to you right now that you can use for free or for a fee. But what if those datasets are out of date? What if they don’t meet your specific needs? It pays to know how to build your own dataset from scratch, no matter how unstructured your data source is.
Scrapy is a Python web scraping framework used by thousands of companies and professionals to collect data and build datasets, which they then sell or use in their own projects. Today, you can be one of those professionals, and even build your own business around data collection. Data scientists and data engineers are among the highest paid in the industry, but they can’t do anything without enough data to work on. In this course, I’ll show you how to capture, organize, and store unstructured data from HTML, CSS, and JavaScript websites. By mastering this skill, you can start your data engineering or data science career with an extra skill under your belt: web scraping. You will also learn what comes after obtaining your data. ETL (Extract, Transform, Load) starts with Scrapy (Extract), but this course covers the other two aspects (Transform and Load) as well. Using Scrapy pipelines, we’ll see how to store data in SQL and NoSQL databases, Elasticsearch clusters, event brokers like Kafka, object storage like S3, and message queues like AWS SQS; a minimal sketch of this flow follows below. Even if you know nothing about web scraping or data collection, and even if this all sounds new to you, you’ve come to the right place.
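To make the Extract / Transform / Load flow above concrete, here is a minimal, hypothetical sketch of a Scrapy spider and item pipeline. It is not taken from the course projects: the spider name, the target URL (Scrapy's public practice site quotes.toscrape.com), the CSS selectors, and the output file are illustrative assumptions, and the pipeline writes JSON lines to a local file where a database, Elasticsearch, Kafka, S3, or SQS writer would otherwise go.

```python
# Minimal sketch of the Extract -> Transform -> Load flow described above.
# Assumes Scrapy is installed (pip install scrapy); names and selectors are
# hypothetical placeholders, not taken from the course itself.
import json
import scrapy


class QuotesSpider(scrapy.Spider):
    """Extract: crawl a page and yield structured items from raw HTML."""
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }


class JsonLinesPipeline:
    """Transform and Load: clean each item and append it to a local file.
    In the course, this is the slot where a SQL/NoSQL database, Elasticsearch,
    Kafka, S3, or SQS writer would go instead."""

    def open_spider(self, spider):
        self.file = open("quotes.jl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        item["text"] = item["text"].strip()        # Transform: tidy the field
        self.file.write(json.dumps(item) + "\n")   # Load: persist the item
        return item
```

To take effect, a pipeline like this would be registered in the project's settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.JsonLinesPipeline": 300}, where lower numbers run earlier in the chain.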
Some Python background
All projects run on Python 3.10, so it needs to be installed
Familiarity with Linux is recommended but not strictly required
Familiarity with the HTTP protocol and HTML
After extracting the archive, watch with your favorite player.
English subtitles
Quality: 720p
Size: 2.85 GB