Presented by
Open Source for AI
Providing all developers the resources to understand, use, and contribute to the development and direction of AI

AIOps for SRE: How AI is Redefining Operations and Reliability Practices

YouTube
Past Event
About Event

Sponsored by Cracking Gen AI

This week's speaker will be Arun Pandiyan Perumal, a Site Reliability Engineer at Adobe with more than 10 years of technical experience.

Abstract

Site Reliability Engineering has always been about reducing toil, improving reliability, and helping teams respond faster when systems fail. AIOps is changing how that work gets done by bringing AI-driven insights, automation, and closed-loop remediation into day-to-day operations. This session focuses on how SRE teams can use AIOps to detect, understand, and respond to reliability issues before they become customer-impacting incidents, moving beyond reactive monitoring and manual incident response toward intelligent, policy-driven operations.

Attendees will learn how AI can help detect early signs of service degradation before outages occur, correlate signals across logs, metrics, traces, events, and configuration changes, accelerate root-cause analysis and incident response, and recommend or trigger safe remediation actions. They will also explore how AIOps can reduce alert fatigue and turn operational data into actionable insights that improve reliability.

The talk will also cover practical considerations for adopting AI responsibly, including data quality, model trust, human-in-the-loop decision-making, governance, and clear operational guardrails.

Attendees should leave with a practical understanding of where AIOps delivers the most value, how to integrate it into existing observability and incident management workflows, and how to measure its impact using reliability-focused outcomes such as reduced MTTR, lower toil, improved signal quality, and stronger service resilience.

Anticipated Agenda

6:00 - 6:05: Introduction

6:05 - 6:45: Presentation

6:45 - 7:00: Q&A
