DataTalks.Club is a global online community of people who love data.
154 Going

Build a Fault-Tolerant YouTube Deep-Research Q&A Agent with Temporal

YouTube
About Event

In this workshop, you will learn how to create a durable research agent that can answer questions about any YouTube video, using DataTalksClub's podcast collection as the example dataset.

The focus will be on building a reliable end-to-end system: robust transcript ingestion, searchable indexing, and a durable deep-research agent powered by Temporal, Elasticsearch, and PydanticAI.

1. Ingesting and Stabilizing YouTube Transcripts

We begin by collecting transcripts from every podcast episode on the channel.

YouTube transcript retrieval is unstable by nature:

  • Transcripts may be missing or temporarily unavailable

  • YouTube enforces rate limits on transcript requests

  • Proxies used to bypass rate limits can themselves be unreliable

You'll see these failure patterns and learn how to turn an unreliable data-fetching script into a Temporal workflow.

We'll implement retries, backoff, proxy rotation, and failure handling so the ingestion continues even when the environment is unreliable.
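As a rough illustration of the retry, backoff, and proxy-rotation ideas above, here is a minimal plain-Python sketch. It is not the workshop's code and it does not use the Temporal SDK; in Temporal, a retry policy attached to an activity would normally take over this loop. The names `fetch_with_retries`, `TranscriptUnavailable`, and the `fetch` callable are hypothetical.

```python
import itertools
import time
from typing import Callable, Optional, Sequence


class TranscriptUnavailable(Exception):
    """Hypothetical error: the transcript is missing, so retrying will not help."""


def fetch_with_retries(
    fetch: Callable[[str, Optional[str]], str],
    video_id: str,
    proxies: Sequence[Optional[str]],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(video_id, proxy), rotating proxies and backing off between failures."""
    if not proxies:
        raise ValueError("proxies must contain at least one entry (use None for no proxy)")
    proxy_cycle = itertools.cycle(proxies)
    delay = initial_delay
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        proxy = next(proxy_cycle)  # rotate to the next proxy on every attempt
        try:
            return fetch(video_id, proxy)
        except TranscriptUnavailable:
            raise  # non-retryable: fail fast instead of burning attempts
        except Exception as exc:  # rate limits, proxy errors, timeouts, ...
            last_error = exc
            if attempt == max_attempts:
                break
            sleep(min(delay, max_delay))  # exponential backoff, capped at max_delay
            delay *= backoff
    raise RuntimeError(
        f"giving up on {video_id} after {max_attempts} attempts"
    ) from last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Separating retryable errors from non-retryable ones mirrors the distinction between a temporary rate limit and a transcript that does not exist.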

Once transcripts are consistently fetched, we'll index them in Elasticsearch for fast, structured retrieval.
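To make the indexing step concrete, the sketch below builds a request body in the newline-delimited JSON format that the Elasticsearch bulk API accepts: an action line followed by the document itself. The index name, the document fields (`video_id`, `title`, `text`), and the helper `build_bulk_body` are assumptions for illustration, not the workshop's actual schema.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(index_name: str, transcripts: Iterable[Dict[str, str]]) -> str:
    """Return an NDJSON body with one index action per transcript document."""
    lines = []
    for doc in transcripts:
        # Using the video ID as the document ID makes re-ingestion overwrite
        # the same document instead of creating duplicates.
        action = {"index": {"_index": index_name, "_id": doc["video_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string could be sent to the `_bulk` endpoint with any HTTP client or passed through an Elasticsearch client library. Keying documents by video ID keeps the indexing step safe to retry.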

2. Building the Deep-Research Agent

With a searchable index in place, we'll build an agent that can answer grounded questions about any indexed video.

Using PydanticAI, we'll define two core tools, a search tool and a file-retrieval tool, that allow the agent to locate relevant transcript segments and use them to generate accurate, source-based answers.
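The two tools can be pictured as ordinary functions that an agent framework exposes to the model. The sketch below is a simplified stand-in: it searches a small in-memory dictionary instead of Elasticsearch and is not registered with PydanticAI. The sample transcripts, the scoring rule, and the names `search` and `get_transcript` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data standing in for the Elasticsearch index.
TRANSCRIPTS: Dict[str, str] = {
    "ep1": "We discuss data engineering pipelines and orchestration.",
    "ep2": "A conversation about machine learning in production.",
}


def search(query: str, limit: int = 5) -> List[Dict[str, object]]:
    """Search tool: rank transcripts by how often the query terms occur."""
    terms = query.lower().split()
    hits = []
    for video_id, text in TRANSCRIPTS.items():
        lowered = text.lower()
        score = sum(lowered.count(term) for term in terms)
        if score > 0:
            hits.append({"video_id": video_id, "score": score, "snippet": text[:80]})
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:limit]


def get_transcript(video_id: str) -> str:
    """File-retrieval tool: return the full transcript for one video."""
    if video_id not in TRANSCRIPTS:
        raise KeyError(f"no transcript indexed for video {video_id!r}")
    return TRANSCRIPTS[video_id]
```

Splitting the work this way lets the agent first find candidate videos cheaply and then pull the full text only for the ones it wants to cite.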

We then run the agent through Temporal, which gives us:

  • Durability for long-running queries

  • Clear workflow histories

  • Built-in logs and observability

  • Reproducible agent behavior

This creates an agent that is easy to inspect, debug, and keep in production.
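The durability and reproducibility mentioned above rest on one idea: completed steps are recorded in a history, and after a crash the workflow is replayed from that history instead of repeating the work. The toy below illustrates that idea only. It is not how Temporal is implemented, and `History`, `run_step`, and `research` are hypothetical names.

```python
from typing import Any, Callable, List, Optional


class History:
    """Hypothetical append-only log of completed step results."""

    def __init__(self, events: Optional[List[Any]] = None) -> None:
        self.events: List[Any] = list(events or [])
        self._cursor = 0

    def run_step(self, step: Callable[[], Any]) -> Any:
        if self._cursor < len(self.events):
            # Replay: reuse the recorded result, do not execute the step again.
            result = self.events[self._cursor]
        else:
            # First execution: run the step and record its result.
            result = step()
            self.events.append(result)
        self._cursor += 1
        return result


def research(
    history: History,
    search_step: Callable[[], Any],
    answer_step: Callable[[Any], Any],
) -> Any:
    """A two-step 'workflow': search first, then answer from the hits."""
    hits = history.run_step(search_step)
    return history.run_step(lambda: answer_step(hits))
```

If the answer step fails after the search step has been recorded, a new `History` built from the saved events resumes at the answer step without running the search again. The recorded events also serve as an inspectable trace of what the agent did.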

By the end of the workshop, you’ll have a complete pattern for building a YouTube question-answering system with:

  • A stable ingestion pipeline

  • A searchable transcript index

  • A durable research agent on top of it

The approach generalizes to any YouTube channel or internal video library that needs reliable, auditable AI-powered research.


About the speaker:

Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.

Alexey is a seasoned software and ML engineer with over 10 years in engineering and 6+ years in machine learning. He has deployed large-scale ML systems at companies like OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a 1st place finish in the NIPS'17 Criteo Challenge.


DataTalks.Club is the place to talk about data. Join our Slack community!

This event is sponsored by Temporal.
