Published on: 2022-12-12 03:06:58
Scrapy Masterclass: Python Web Scraping and Data Pipelines is a course published by Udemy Academy. Work on 7 real-world web scraping projects using Scrapy, Splash, and Selenium, and build data pipelines both locally and on AWS.
Everyone tells you what to do with the data you already have, but how do you “own” that data in the first place? Most discussions of data engineering and data science today focus on how to analyze and process datasets to extract useful information from them. However, they all assume those datasets are already available to you, gathered somehow, and they rarely spend any time showing you how to get your hands on that data in the first place. This course fills that gap. Scrapy is all about extracting the data you care about from websites and building powerful web scraping pipelines. That’s right, there are plenty of datasets available to you right now that you can use for free or for a fee. But what if those datasets are out of date? What if they don’t meet your specific needs? It pays to know how to build your own dataset from scratch, no matter how unstructured your data source is.
Scrapy is a Python web scraping framework used by thousands of companies and professionals to collect data and build datasets, which they then sell or use in their own projects. Today, you can be one of those professionals, and even build your own business around data collection. Data scientists and data engineers are among the highest paid in the industry, but they can’t do anything without enough data to work on. In this course, I’ll show you how to capture, organize, and store unstructured data from HTML, CSS, and JavaScript websites. By mastering this skill, you can start your data engineering or data science career with an extra skill under your belt: web scraping. You will also learn what comes after obtaining your data. ETL (Extract, Transform, Load) starts with Scrapy (Extract), but this course covers the other two aspects (Transform and Load) as well. Using Scrapy pipelines, we’ll see how to store data in SQL and NoSQL databases, Elasticsearch clusters, event brokers like Kafka, object storage like S3, and message queues like AWS SQS; a minimal sketch of this flow follows below. Even if you know nothing about web scraping or data collection, and even if this all sounds new to you, you’ve come to the right place.
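To make the Extract / Transform / Load flow above concrete, here is a minimal, hypothetical sketch of a Scrapy spider and item pipeline. It is not taken from the course projects: the spider name, the target URL (Scrapy's public practice site quotes.toscrape.com), the CSS selectors, and the output file are illustrative assumptions, and the pipeline writes JSON lines to a local file where a database, Elasticsearch, Kafka, S3, or SQS writer would otherwise go.

```python
# Minimal sketch of the Extract -> Transform -> Load flow described above.
# Assumes Scrapy is installed (pip install scrapy); names and selectors are
# hypothetical placeholders, not taken from the course itself.
import json
import scrapy


class QuotesSpider(scrapy.Spider):
    """Extract: crawl a page and yield structured items from raw HTML."""
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }


class JsonLinesPipeline:
    """Transform and Load: clean each item and append it to a local file.
    In the course, this is the slot where a SQL/NoSQL database, Elasticsearch,
    Kafka, S3, or SQS writer would go instead."""

    def open_spider(self, spider):
        self.file = open("quotes.jl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        item["text"] = item["text"].strip()        # Transform: tidy the field
        self.file.write(json.dumps(item) + "\n")   # Load: persist the item
        return item
```

To take effect, a pipeline like this would be registered in the project's settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.JsonLinesPipeline": 300}, where lower numbers run earlier in the chain.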
Some Python background
All projects run on Python 3.10, so it needs to be installed
Familiarity with Linux is recommended but not strictly required
Familiarity with the HTTP protocol and HTML
After extracting the archive, watch with your favorite player.
English subtitles
Quality: 720p
Size: 2.85 GB