Presented by
VideoDB
Build agents that watch, listen, understand, and recall in real time

Mini-Hackathon: Build a Perception-First Agent

About Event

LLMs gave us reasoning. RAG gave us retrieval. Tool calling gave us action. What’s missing in the modern agent stack is perception: the ability to see, hear, and remember the world as it happens.

This workshop is a practical walkthrough of building a perception layer for agents using VideoDB. You’ll learn how to convert continuous media (screen, mic, camera, RTSP, files) into a structured context your agent can use:

  • Indexes (searchable understanding)

  • Events (real-time triggers)

  • Memory (episodic recall with playable evidence)


We’ll implement the core loop:

Continuous Media → Perception Layer (VideoDB) → Agent (reasoning + action) → Output grounded in evidence
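As a rough illustration of the loop above, here is a minimal sketch in plain Python. It does not use the VideoDB SDK; `Moment`, `PerceptionLayer`, and `agent_answer` are hypothetical names that only model the idea of indexed, timestamped perception feeding an agent whose answers carry playable evidence.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the core loop -- not the VideoDB SDK.
# A slice of perceived media is reduced to a timestamped description.

@dataclass
class Moment:
    start: float          # seconds into the stream
    end: float
    description: str      # what was seen/heard

@dataclass
class PerceptionLayer:
    memory: list[Moment] = field(default_factory=list)   # episodic recall

    def ingest(self, moment: Moment) -> None:
        """Store each incoming moment so it is searchable later."""
        self.memory.append(moment)

    def search(self, query: str) -> list[Moment]:
        """Naive keyword index: return moments matching the query."""
        q = query.lower()
        return [m for m in self.memory if q in m.description.lower()]

def agent_answer(perception: PerceptionLayer, question: str) -> str:
    """Ground the answer in evidence (timestamps into the stream)."""
    hits = perception.search(question)
    if not hits:
        return "No matching moments."
    m = hits[0]
    return f"{m.description} (evidence: {m.start:.0f}s-{m.end:.0f}s)"

layer = PerceptionLayer()
layer.ingest(Moment(12.0, 18.0, "error dialog appears on screen"))
layer.ingest(Moment(95.0, 102.0, "user says the deploy succeeded"))
print(agent_answer(layer, "error"))
# -> error dialog appears on screen (evidence: 12s-18s)
```

In the real stack, the perception layer's index and search are handled by VideoDB rather than a keyword list, but the shape of the loop is the same: continuous media in, timestamped evidence out.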

Who should attend:

  • Engineers building agents that need continuous and temporal awareness (not one-shot screenshots).

  • Research teams building in physical AI, desktop robots and wearables.

  • Product teams building meeting bots, desktop copilots, monitoring/ops, and QA/compliance.

  • Founders building multimodal apps where “show me the moment” matters.

What You’ll Discover:

  • What “perception” actually means for agents: continuous, temporal, multi-source, searchable, actionable.

  • How to support three input modes with one mental model: files, live streams, and desktop capture.

This is a build-first mini-hackathon to ship a working prototype where an agent is no longer blind. You’ll use VideoDB as the perception layer between the transport layer and your agent logic, converting real-time streams into structured context. Video is no longer a file; it’s multimodal context.

Your prototype must do at least one of these well:

  • Real-time ingestion: ingest a continuous stream of desktop screen, mic, and system audio.

  • Real-time events and alerts: events arrive as the world unfolds, not after processing finishes.

  • Episodic recall: the agent can answer “what happened” across time, with timestamps and playable moments.

Who should attend:

  • Individuals building monitoring agents, meeting/desktop agents, or multimodal copilots

  • Engineers who want a shippable demo in a few hours

  • Builders who care about outputs grounded in observable evidence

What You Can Build:

  1. Real-Time Watcher Agent: Stream continuously, emit structured events, trigger Slack/webhooks when a condition hits.

  2. Desktop Copilot with Awareness: Capture screen + mic, detect key moments, and generate actions grounded in what was seen and said.
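To make the first pattern concrete, here is a minimal sketch of a watcher that reacts to structured events. It is plain Python, not the VideoDB API: the event shape (`label`, `t`) and the `make_watcher` helper are illustrative assumptions, and the notifier is a stub standing in for a real Slack/webhook call.

```python
import json
from typing import Callable

# Hypothetical sketch of the "Real-Time Watcher" pattern. The event
# payload shape and helper names are illustrative, not VideoDB's API.

def make_watcher(condition: Callable[[dict], bool],
                 notify: Callable[[str], None]) -> Callable[[dict], None]:
    """Return a handler that fires `notify` whenever `condition`
    matches a structured event emitted by the perception layer."""
    def on_event(event: dict) -> None:
        if condition(event):
            notify(json.dumps(event))
    return on_event

sent = []  # stand-in for a Slack/webhook client (e.g. an HTTP POST)
watcher = make_watcher(
    condition=lambda e: e.get("label") == "error_dialog",
    notify=sent.append,
)

# Simulated stream of structured events arriving in real time:
for event in [
    {"t": 4.2, "label": "window_opened"},
    {"t": 9.7, "label": "error_dialog"},
]:
    watcher(event)

print(len(sent))  # -> 1 alert fired, for the error_dialog event
```

The design point is that the condition runs per event as it arrives, so alerts fire while the stream is still live rather than after a batch job completes.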

Refer to the VideoDB docs to check what’s possible.

Format:

  • Kickoff: perception stack + demo (15–20 min)

  • Build sprint: teams/solo (3–4 hrs)

  • Demos: 3 minutes each (30–45 min)

  • Winners + networking (30–45 min)

What we provide:

  • Starter kit + example pipelines (files/streams/desktop capture)

  • Quick patterns for Indexes, Events, Memory

  • On-site support to unblock teams

Winning Prize:

  • Up to INR 1L + $500 in VideoDB credits.

Location
Delhi
India