Scraping Smarter with Content Filtering

Scrapy is a powerful web scraping framework and an essential tool for building machine learning datasets. When a site has a particularly simple structure, Scrapy makes it easy to get a spider running and build up a curated dataset. Check out the tutorials in Scrapy's documentation. For example, to train a poster similarity model, we first needed to gather many movie posters. Consider trying to scrape IMDb.com. We may be interested in gathering posters from <img> tags under <div> tags with the class "poster"....
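
To make the selection rule concrete, here is a minimal sketch that collects the src of every <img> nested under a <div> with the class "poster". It uses only Python's standard library so it runs without a crawler. The class name PosterImageParser, the helper extract_poster_urls, and the sample markup with example.com URLs are all hypothetical and do not reflect IMDb's actual page structure. Inside a Scrapy callback, the same selection is usually written as response.css("div.poster img::attr(src)").getall().

```python
from html.parser import HTMLParser


class PosterImageParser(HTMLParser):
    """Collect src attributes of <img> tags nested inside <div class="poster">.

    Hypothetical helper for illustration; not part of Scrapy.
    """

    def __init__(self):
        super().__init__()
        # One boolean per currently open <div>: True if it has class "poster".
        self.div_stack = []
        self.poster_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("poster" in classes)
        elif tag == "img" and any(self.div_stack):
            # The image sits somewhere below at least one poster div.
            src = attrs.get("src")
            if src:
                self.poster_urls.append(src)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def extract_poster_urls(html):
    """Return poster image URLs found in an HTML string, in document order."""
    parser = PosterImageParser()
    parser.feed(html)
    parser.close()
    return parser.poster_urls


# Made-up markup standing in for a scraped page (assumption, not real IMDb HTML).
SAMPLE_HTML = """
<div class="poster"><a href="/title/1"><img src="https://example.com/a.jpg" alt="A"></a></div>
<div class="sidebar"><img src="https://example.com/ad.jpg"></div>
<div class="poster large"><img src="https://example.com/b.jpg"/></div>
"""

if __name__ == "__main__":
    for url in extract_poster_urls(SAMPLE_HTML):
        print(url)
```

The image inside the "sidebar" div is skipped because none of its enclosing <div> tags carries the "poster" class, which is the kind of structural filter the paragraph above describes.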

 · 3 min · Terry Rodriguez & Salma Mayorquin