

From Dictionaries to LLMs: Text Analysis in R
This workshop walks through a complete text analysis pipeline in R, combining traditional methods with newer AI-powered tools.
In the first part, we will cover tokenization with tidytext, stopword removal (including custom stopwords), word frequency visualization with word clouds and bar plots, dictionary-based sentiment analysis using three lexicons (AFINN, Bing, and NRC), topic modeling with LDA, and bigram analysis with word networks.
In the second part, we will explore how the mall package can be used to run sentiment analysis with local LLMs (via Ollama), compare this approach with dictionary-based methods, and look at related tasks such as text classification and entity extraction.
Familiarity with R and the tidyverse is assumed.
Dariia Mykhailyshyna is a postdoctoral researcher in Economics at the Kyiv School of Economics, working in political economy, migration, and causal inference. She holds a PhD in Economics from the University of Bologna. She teaches statistics and data science courses and organizes "Workshops for Ukraine", a charity series of R workshops whose registration fees support Ukrainian causes. She has over seven years of experience with R and regularly uses it in her applied research.