Join us at our AI Hub in San Francisco.

AI Journal Club ft. Yuzong Chen (Cornell PhD Research)

San Francisco, CA
Registration
Approval Required
Your registration is subject to host approval.
Welcome! To join the event, please register below.
About Event

Join Workato's AI Journal Club series, where we bring together leading AI researchers to share papers and exchange perspectives on how AI research is shaping real-world systems.

5:30–6:00 PM: Check-in and registration
6:00–6:15 PM: Welcome to Workato  
6:15–6:45 PM: Talk by Yuzong Chen, Cornell Research
6:45–7:00 PM: Q&A
7:00–8:00 PM: Networking

Please arrive by 6:00 PM; we ask this out of respect for our speaker.

Featured Speaker

Yuzong Chen is a final-year PhD student in the School of Electrical and Computer Engineering at Cornell Tech, advised by Prof. Mohamed Abdelfattah. His research focuses on algorithm-hardware co-design for machine learning acceleration, with emphasis on quantization numerics, FPGA architectures, and processing-in-memory. He was named an ML and Systems Rising Star by MLCommons in 2026 and was a finalist for the 2024 Qualcomm Innovation Fellowship.

Efficient Algorithm-Hardware Co-Design Methodology for Quantized LLM Acceleration

As silicon technology approaches the post-Moore's Law era, hardware specialization has become increasingly prevalent for driving machine learning applications based on deep neural networks (DNNs). Furthermore, as DNNs enter the era of large language models (LLMs), the growth of model size continues to outpace the scaling of compute performance and memory capacity in existing hardware platforms. For example, the first-generation GPT model, introduced in 2018, contains only 117 million parameters, while the second and third generations grew more than 10× and 1000×, respectively, within two years. In contrast, compute performance and DRAM bandwidth typically increase by only ∼2× every two years, imposing a significant bottleneck for efficient LLM deployment, particularly in edge scenarios with limited hardware resources.

My research aims to improve the accessibility of machine learning applications through algorithm-hardware co-design on field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). First, I enhance the compute throughput of FPGAs for DNN acceleration by augmenting the on-chip block memory to support low-precision MAC operations. Second, I design three ASIC accelerators that exploit novel bit-level sparsity and mixed-precision quantization algorithms to reduce the cost of DNN inference. In this talk, I will primarily discuss two of my works, BitMoD and P3-LLM, as examples of how algorithms and accelerators can be co-designed for efficient low-precision LLM inference. Collectively, I hope that my work can pave a promising path toward the efficient deployment of machine learning workloads on future accelerator platforms.
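For readers new to the topic, the low-precision quantization the abstract refers to can be illustrated with a generic sketch (this is not code from BitMoD or P3-LLM, just textbook symmetric uniform quantization): mapping floating-point weights to 4-bit integers plus a single scale factor shrinks the memory footprint roughly 4× relative to 16-bit weights, which is why quantization helps when model size outpaces DRAM bandwidth.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax        # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

# 16-bit weights stored as 4-bit integers + one scale: ~4x smaller
w = np.random.randn(8).astype(np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)  # reconstruction error is at most scale / 2
```

Real systems refine this basic recipe with per-group scales, non-uniform or mixed-precision formats, and hardware support for the resulting low-bit arithmetic, which is where the co-design discussed in the talk comes in.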

Who Should Attend

AI researchers and practitioners working at the intersection of AI research and real-world systems.

About Workato

Workato is the Enterprise MCP company, providing the connective layer that gives AI agents secure, governed access to enterprise systems and data. Built on a decade of integration expertise spanning 14,000+ applications, Workato's platform enables organizations to move from simple automation to agentic AI that can reason, act, and orchestrate work across the entire business. You can explore Workato's end-to-end capabilities in our developer sandbox.

Location
Please register to see the exact location of this event.
San Francisco, CA