

AI Journal Club ft. Yuzong Chen (Cornell PhD Research)
Join Workato's AI Journal Club series—we're bringing together the best AI researchers to share papers and exchange perspectives on how AI research is shaping real-world systems.
5:30–6:00 PM: Check-in and registration
6:00–6:15 PM: Welcome to Workato
6:15–6:45 PM: Talk by Yuzong Chen, Cornell Research
6:45–7:00 PM: Q&A
7:00–8:00 PM: Networking
Please arrive by 6:00 PM out of respect for our speaker.
Featured Speaker
Yuzong Chen is a final-year PhD student in the School of Electrical and Computer Engineering at Cornell Tech, advised by Prof. Mohamed Abdelfattah. His research focuses on algorithm-hardware co-design for machine learning acceleration, with emphasis on quantization numerics, FPGA architectures, and processing-in-memory. He was named an ML and Systems Rising Star by MLCommons in 2026 and was a finalist for the 2024 Qualcomm Innovation Fellowship.
Efficient Algorithm-Hardware Co-Design Methodology for Quantized LLM Acceleration
As silicon technology approaches the post-Moore's Law era, hardware specialization has become increasingly prevalent for driving machine learning applications based on deep neural networks (DNNs). Furthermore, as DNNs enter the era of large language models (LLMs), the growth of model size continues to outpace the scaling of compute performance and memory capacity in existing hardware platforms. For example, the first generation of the GPT model, introduced in 2018, contained only 117 million parameters, while the second and third generations grew more than 10× and 1000×, respectively, within two years. Meanwhile, compute performance and DRAM bandwidth typically increase by only ∼2× every two years, imposing a significant bottleneck for efficient LLM deployment, particularly in edge scenarios with limited hardware resources.
My research aims to improve the accessibility of machine learning applications through algorithm-hardware co-design on field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). First, I enhance the compute throughput of FPGAs for DNN acceleration by augmenting the on-chip block memory to support low-precision MAC operations. Second, I design three ASIC accelerators that exploit novel bit-level sparsity and mixed-precision quantization algorithms to reduce the cost of DNN inference. In this talk, I will primarily discuss two of my works, BitMoD and P3-LLM, as examples of how algorithms and accelerators can be co-designed for efficient low-precision LLM inference. Collectively, I hope that my work can pave a promising path toward the efficient deployment of machine learning workloads on future accelerator platforms.
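For readers new to the topic, the low-precision inference idea above can be illustrated with a minimal sketch of symmetric 4-bit integer weight quantization in NumPy. This is a generic textbook scheme for illustration only; it is not the BitMoD or P3-LLM method, and the function names are hypothetical.

```python
import numpy as np

def quantize_int4_symmetric(w: np.ndarray):
    """Symmetric per-tensor quantization of float weights to 4-bit integer codes.

    Returns the integer codes (stored in int8 for convenience) and the
    scale factor needed to dequantize them back to floating point.
    """
    qmax = 7  # use the symmetric int4 range [-7, 7]
    scale = float(np.max(np.abs(w))) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from integer codes."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int4_symmetric(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))  # bounded by scale / 2
```

Schemes like this shrink weight storage by roughly 8× versus FP32 and let MAC units operate on narrow integers, which is what makes custom low-precision datapaths on FPGAs and ASICs attractive; the research discussed in the talk goes well beyond this baseline with adaptive data types and mixed precision.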
Who Should Attend
AI researchers and practitioners working at the intersection of AI research and real-world systems.
About Workato
Workato is the Enterprise MCP company, providing the connective layer that gives AI agents secure, governed access to enterprise systems and data. Built on a decade of integration expertise spanning 14,000+ applications, Workato's platform enables organizations to move from simple automation to agentic AI that can reason, act, and orchestrate work across the entire business. You can explore Workato's end-to-end capabilities in our developer sandbox.