The Toronto ML Systems Group: RL Infra, Hardware-aware algorithms
The Toronto ML Systems Group is back. This time we will be covering topics like RL infrastructure and hardware-aware algorithms (FlashAttention).
Our first speaker, Kimbo Chen from SemiAnalysis, will call in from New York to talk about his article on RL systems (https://newsletter.semianalysis.com/p/rl-systems-mind-the-gap-matching).
This will be followed by a deep dive led by Melani into the ideas behind FlashAttention 1 through 4 and the design choices that make them fast on NVIDIA GPUs. There will be slides, but expect discussion, questions, and comparing notes on the papers. We expect this to stay high-level: the intention is to prepare for potential hands-on workshops on designing hardware-aware algorithms.
This is the third session of the Toronto ML Systems Group, a recurring meetup for those who want to get serious about the full stack, from chips and ML compilers to kernel writing, model architectures, distributed training, and everything in between.
Spots will be limited, and priority will go to people who've taken a look at the readings/papers in advance. Come ready with a working understanding of the material and questions/opinions.
If you're curious about RL systems and designing algorithms that work with the hardware instead of against it, this should be an exciting evening.
Let's see who's who.