We will begin by building our first crawler. The goal is that the crawler takes a static web page as …
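To give a flavor of what a first crawler for a static page can look like, here is a minimal sketch using only Python's standard library: one function downloads the HTML, and a small parser pulls out the page title and all links. It assumes the page is plain static HTML that needs no JavaScript to render. The class and function names and the User-Agent string are our own placeholders, not taken from the post.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects the page <title> and the href targets of all <a> tags."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links (e.g. "/about") against the page URL
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def parse_page(html: str, base_url: str = "") -> dict:
    """Extract title and absolute links from an HTML string."""
    parser = LinkAndTitleParser(base_url)
    parser.feed(html)
    return {"title": parser.title, "links": parser.links}


def fetch_page(url: str) -> str:
    """Download a page and decode it (performs a network request)."""
    # Placeholder User-Agent: identify your crawler and a contact address
    req = Request(url, headers={"User-Agent": "research-crawler (contact: you@example.org)"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Used together, `parse_page(fetch_page(url), url)` returns the title and the list of links found on `url`; following those links is what turns a single-page scraper into a crawler.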
Author: Jens Förderer
What is...? HTML
To understand web scraping, we need to understand how web pages work. It is time to look a bit more …
Quick tips: How to avoid getting blocked
Getting blocked while crawling can happen very quickly. Maybe we exceed the maximum number of requests, maybe we send too …
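One common way to stay below a site's request limit is to enforce a minimum pause between two requests to the same domain. The following is a minimal sketch of such a throttle, not the method from the post; the class name and the 2-second default are our own assumptions, and the right interval depends on the site. The clock and sleep functions can be swapped out, which makes the logic easy to test without actually waiting.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum delay between two requests to the same domain."""

    def __init__(self, min_interval: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; an assumed default
        self.clock = clock
        self.sleep = sleep
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until a request to url's domain is allowed; return seconds waited."""
        domain = urlparse(url).netloc
        now = self.clock()
        last = self._last_request.get(domain)
        waited = 0.0
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        # Remember when we last hit this domain
        self._last_request[domain] = self.clock()
        return waited
```

In a crawl loop we would call `throttle.wait(url)` right before every request. Because the delay is tracked per domain, requests to different sites are not slowed down by each other.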
Crawling 104: Retrieving data from APIs
Next up, we will deal with what is probably the most convenient way of retrieving data for our research projects, that …
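Many web APIs follow the same pattern: send a GET request with query parameters and receive JSON back, often spread over several pages. Here is a minimal standard-library sketch of that pattern. The endpoint, the `page` parameter, and the `items` key are hypothetical; every real API names these differently, so check its documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_url(base_url: str, **params) -> str:
    """Append query parameters to an API endpoint URL."""
    return f"{base_url}?{urlencode(params)}" if params else base_url


def get_json(url: str, timeout: float = 10.0):
    """Send a GET request and decode the JSON body (performs a network request)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all_pages(base_url: str, fetch=get_json, items_key: str = "items",
                    max_pages: int = 100, **params) -> list:
    """Request page 1, 2, 3, ... until a page returns no records.

    The 'page' parameter and the items_key are hypothetical names.
    """
    records = []
    for page in range(1, max_pages + 1):
        payload = fetch(build_api_url(base_url, page=page, **params))
        items = payload.get(items_key, [])
        if not items:
            break
        records.extend(items)
    return records
```

The `max_pages` cap is a safety net so that a misbehaving API cannot keep the loop running forever.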
What is..? Data formats explained
Data formats can be confusing. CSV, JSON, TXT, and many more three- or four-letter acronyms are out there. Time to …
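To see how two of these formats relate, here is a small sketch that turns a JSON array of records into CSV with Python's built-in `json` and `csv` modules. It assumes flat records (no nested objects or lists); nested values would need to be flattened first. Records may have different keys, and missing fields become empty cells.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect column names in first-seen order across all records
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty fields
    return buffer.getvalue()
```

For example, `[{"name": "Alice", "posts": 3}, {"name": "Bob", "city": "Munich"}]` becomes a three-column table with the header `name,posts,city`: the same data, but JSON keeps the structure per record, whereas CSV forces every record into the same columns.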
Prepare: Creating a data dictionary
Before we start collecting data, we should also draft a first version of our data dictionary. A data dictionary is …
Getting started: Crawler, Scraper–what’s the difference?
We also need to deal with some terminology: crawler and scraper. Let’s sort this out right away. There is not …
Pro tip: Get notified by email when the crawler has done its job
Crawling can take hours or days to complete. Sometimes crawling is done regularly, say, once a week or even daily. …
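As a rough idea of how such a notification can work, here is a minimal sketch using Python's `smtplib` and `email.message`. It is not necessarily the approach from the post. The SMTP host, port 587 with STARTTLS, the addresses, and the environment variable names `SMTP_USER` and `SMTP_PASSWORD` are all assumptions; your mail provider's settings will differ, and some providers require an app-specific password.

```python
import os
import smtplib
from email.message import EmailMessage


def build_notification(sender: str, recipient: str, job_name: str, n_records: int) -> EmailMessage:
    """Compose a short plain-text message reporting that a crawl has finished."""
    msg = EmailMessage()
    msg["Subject"] = f"Crawler finished: {job_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The crawl '{job_name}' has finished and collected {n_records} records.")
    return msg


def send_notification(msg: EmailMessage, host: str, port: int = 587) -> None:
    """Send via an SMTP server using STARTTLS (performs a network connection).

    Credentials are read from the (assumed) SMTP_USER / SMTP_PASSWORD
    environment variables so they never end up in the crawler's source code.
    """
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()
        server.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
        server.send_message(msg)
```

We would call `send_notification(build_notification(...), "smtp.example.org")` as the last step of the crawl script. Keeping the credentials in environment variables rather than in the script means the code can be shared or versioned without leaking a password.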
Quick tips: Friendly crawling
Data should be crawled responsibly so that it does not have a detrimental effect on the website being scraped. …
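A basic part of crawling responsibly is respecting a site's `robots.txt`. Python's standard library ships `urllib.robotparser` for exactly this. The sketch below checks whether a URL may be fetched and picks up a declared `Crawl-delay`. The crawler name and the 5-second fallback delay are our own assumptions.

```python
from urllib import robotparser
from urllib.parse import urljoin

USER_AGENT = "research-crawler"  # hypothetical name; identify your crawler honestly


def load_robots(site_url: str) -> robotparser.RobotFileParser:
    """Download and parse robots.txt for a site (performs a network request)."""
    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(site_url, "/robots.txt"))
    rp.read()
    return rp


def parse_robots(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt content that is already available as text."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def polite_delay(rp: robotparser.RobotFileParser, user_agent: str = USER_AGENT,
                 default: float = 5.0) -> float:
    """Use the site's Crawl-delay if it declares one, else a conservative default."""
    delay = rp.crawl_delay(user_agent)
    return float(delay) if delay is not None else default
```

Before each request, `rp.can_fetch(USER_AGENT, url)` tells us whether the page is off-limits, and `polite_delay(rp)` gives the pause to wait between requests to that site.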
Crawling 102: Collecting web data with Selenium and Python
Selenium is a powerful tool for collecting web data. With the help of Selenium, we can collect data from pages that …