

Open Web Data: Scraping 101
This session is part of the Introduction to Political Technology course at Newspeak House, open to faculty and fellowship candidates only.
Learn the fundamentals of extracting data from the web. We'll start with how the web actually works (HTTP requests, responses, status codes, and headers), then move into hands-on scraping with the Requests library for simple, fast data retrieval, along with tools like lxml and Beautiful Soup for parsing HTML.
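As a rough preview of the fetch-then-parse pattern, here is a minimal sketch. It deliberately uses only Python's standard library (urllib and html.parser) as stand-ins for Requests and Beautiful Soup, so it runs without installing anything. The names LinkExtractor, extract_links, fetch, and SAMPLE_HTML, the User-Agent string, and the example URLs are all illustrative assumptions, not session material.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def fetch(url):
    """Send a GET request and return (status code, headers, body text)."""
    # Hypothetical User-Agent value; pick one that identifies your scraper.
    request = Request(url, headers={"User-Agent": "scraping-101-demo"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.status, dict(response.headers), body


# Inline sample page so the parsing step can be tried offline.
SAMPLE_HTML = """
<html><body>
  <a href="/about">About</a>
  <a href="https://example.org/data.csv">Data</a>
  <a>No link here</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
```

Calling fetch on a real URL would return the status code and headers alongside the body, which can then be passed to extract_links. With Requests and Beautiful Soup the same two steps are typically shorter, but the shape is the same: one request, one parse.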
From there, we'll tackle dynamic, JavaScript-heavy sites using browser automation tools like Playwright. Finally, we'll explore the Chrome DevTools Protocol (CDP), the lower-level interface that powers these tools, and how you can use it directly for fine-grained control over browser behavior, network interception, and stealthier scraping.
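To give a feel for what "lower-level" means here: CDP traffic is JSON exchanged over a WebSocket, where each command carries an id, a method name, and params, and the browser replies with a message bearing the same id or pushes events that have no id. The sketch below only builds and classifies such messages; it does not open a connection to a browser. The helper names cdp_command and classify are hypothetical, and the URL is a placeholder.

```python
import itertools
import json

# Each command needs a unique id so its response can be matched to it.
_ids = itertools.count(1)


def cdp_command(method, **params):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def classify(raw):
    """Label an incoming CDP message as a command response or an event."""
    message = json.loads(raw)
    return "response" if "id" in message else "event"


# Network.enable turns on network events; Page.navigate loads a URL.
enable_network = cdp_command("Network.enable")
navigate = cdp_command("Page.navigate", url="https://example.org")

print(navigate)
```

Tools like Playwright generate and track messages of this kind on your behalf; working with CDP directly means sending them yourself and matching responses by id.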
No prior scraping experience required — just bring a laptop and some curiosity.