Program at a Glance

01:00 PM to 02:45 PM
PRO PDSA1: Memory-Centric Computing (Part 1): Fundamental Techniques
Track: Professional Development Series (Pre-Conference)
Speakers:
Onur Mutlu, Professor of Computer Science, ETH Zurich
Onur Mutlu is a Professor of Computer Science at ETH Zurich. He previously held the William D. and Nancy W. Strecker Early Career Professorship at Carnegie Mellon University and was a Visiting Professor at Stanford University. Before joining CMU, he started the Computer Architecture Group at Microsoft Research (2006-2009). He has held product, research, and consulting/visiting positions at various companies, including Intel Corporation, Advanced Micro Devices, VMware, and Google, and has performed significant consulting work for various companies and institutions. Many of the techniques Onur has invented over the years with his group and collaborators have strongly influenced industry and are widely employed in commercial microprocessors and memory & storage systems (including both DRAM and NAND flash memories and controllers, as well as microprocessor and accelerator memory hierarchies) used daily by hundreds of millions of people.
Pre-Con Seminar Description:
This short course (consisting of two sessions, which can be taken independently of each other) covers major ideas and techniques in modern computing platforms and applications, with a special focus on the design of the memory (and storage) system, using a cross-layer approach that spans systems, applications, software, and hardware. The first session of the two-part course enables attendees to develop a rigorous approach to memory systems and memory-centric computing, and prepares them to develop methods for solving data movement and memory bottleneck problems. The major focus is on memory-centric computing, including processing near memory (PnM), processing in memory (PiM), and processing using memory (PuM) systems and techniques, to enable fundamentally higher-performance, energy-efficient, and scalable systems. Both main memory and storage will be examined, with the goal of greatly improving the performance and efficiency of major workloads such as ML/AI workloads, large language models (LLMs), graph analytics, databases, video analytics, and data analytics.
PRO PDSB1: DRAM for AI – HBM and What’s Next
Track: Professional Development Series (Pre-Conference)
Speakers:
Marc Greenberg, Principal/CEO, Marc Greenberg Consulting
Marc Greenberg is an independent consultant in memory, semiconductors, and IP. He currently serves as VP of Product for Cassia.ai, an AI IP company, as vice-chair of an undisclosed task group at JEDEC, and as an advisor to several other companies. Marc was responsible for product management of HBM and other memory and storage IP products at Denali, Cadence, and Synopsys for 20 years of a 30-year career in semiconductors and IP. Marc has a master's degree in Electronics from the University of Edinburgh in Scotland.
Pre-Con Seminar Description:
High Bandwidth Memory (HBM) is a critical technology in many processors running the latest LLMs and generative AI applications. In this PDS tutorial we will make the case for both existing and novel techniques for interfacing DRAM such as HBM to the predominantly non-von-Neumann compute architectures found in GPUs/NPUs/TPUs (collectively, xPUs). Attendees will learn the key aspects of HBM, including its history, architecture, and market trends, along with a brief comparison to other popular DRAM memory types such as DDR, LPDDR, and GDDR. We'll use this foundation to discuss AI processor architectures and how they use DRAM, including the impact of arithmetic intensity, quantization, and sparsity on DRAM access. Finally, we'll cover architectural techniques for improving memory access in AI applications ("breaking the memory wall"), including improving bandwidth, moving compute closer to RAM, and moving RAM closer to compute.
Takeaways from this session:
- Understand the internal structure of HBM and how it evolved to be the memory of choice for datacenter AI applications
- Understand the complex relationships between memory and compute in AI
- Learn how the architecture of xPUs drives DRAM selection
- Understand the techniques used to optimize the interaction between compute and memory in AI
- See how evolving memory architectures will shape AI processors in the future
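The role of arithmetic intensity mentioned above can be sketched with a simple roofline-style calculation. This is an illustrative toy, not material from the tutorial; the accelerator specs and kernel figures below are hypothetical, chosen only to show why low-intensity inference kernels tend to be bandwidth-bound on HBM:

```python
# Roofline-style check: is a kernel memory-bound or compute-bound?
# All device and kernel numbers are hypothetical, for illustration only.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of DRAM traffic."""
    return flops / bytes_moved

def attainable_gflops(ai, peak_gflops, mem_bw_gbs):
    """Roofline model: performance is capped by either peak compute
    or memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, ai * mem_bw_gbs)

# Hypothetical xPU: 500 TFLOP/s peak compute, 3 TB/s of HBM bandwidth.
PEAK_GFLOPS = 500_000
HBM_BW_GBS = 3_000

# GEMV-like LLM inference step: roughly 2 FLOPs per weight byte read.
ai_gemv = arithmetic_intensity(flops=2e9, bytes_moved=1e9)    # AI = 2.0
# Large training GEMM: far more reuse per byte fetched.
ai_gemm = arithmetic_intensity(flops=2e12, bytes_moved=4e9)   # AI = 500.0

print(attainable_gflops(ai_gemv, PEAK_GFLOPS, HBM_BW_GBS))  # bandwidth-bound, far below peak
print(attainable_gflops(ai_gemm, PEAK_GFLOPS, HBM_BW_GBS))  # compute-bound, hits peak
```

Under these assumed numbers the low-intensity kernel attains only a small fraction of peak compute, which is the "memory wall" the session addresses: quantization raises effective arithmetic intensity by shrinking bytes moved, while added bandwidth raises the sloped part of the roofline.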
PRO PDSC1: Introduction to DRAM
Track: Professional Development Series (Pre-Conference)
Speakers:
Bill Gervasi, Principal Memory Solutions Architect, Monolithic Power Systems
Mr. Gervasi has nearly five decades of experience in high-speed memory subsystem definition, design, and product development. He piloted the definition of Double Data Rate SDRAM from its earliest inception, authored the first standard specification, and created the Automotive SSD standard. At MPS, Bill is driving some of the memory and storage system management mechanisms for a post-quantum world. He received the JEDEC Technical Excellence Award, the organization's highest honor, in 2020.
Pre-Con Seminar Description:
Description Not Available
PRO PDSD1: AiSAQ™ OSS: Scaling RAG Beyond DRAM Limits with SSD
Track: Professional Development Series (Pre-Conference)
Speakers:
Rory Bolt, Sr. Fellow, KIOXIA America, Inc
Rory joined KIOXIA America in 2017. He has founded companies, built teams, and delivered products at four storage startups, all of which were acquired ($400M, $165M, and two undisclosed). Rory has more than 25 years of experience in data storage systems, data protection systems, and high-performance computing, with tenures as VP of Software Engineering at Samsung, Technical Director/CTO Counsel at NetApp, CTO Counsel at EMC, and VP, Chief Storage Architect, and Distinguished Fellow at Quantum. Rory has been granted over 12 storage-related patents and has several pending. He has a BS in Computer Engineering from UCSD.
Pre-Con Seminar Description:
The open-source software (OSS) project AiSAQ provides a new approach to scaling AI, especially for Retrieval-Augmented Generation (RAG). RAG can improve accuracy for AI models that use Approximate Nearest Neighbor Search (ANNS) over vector databases. DiskANN was developed to store some elements of the vector database on SSDs, which enables a larger database. AiSAQ™ open-source technology allows all of the vector database elements to be stored on SSDs. This enables a number of capabilities, including a limitless vector database size, faster time-to-ready, and the ability to connect to multiple AI host systems simultaneously. The end result is a better RAG implementation for the AI system utilizing AiSAQ technology. Learn about graph-based ANNS vs. cluster-based ANNS, how to deploy and utilize AiSAQ, the advantages for RAG/vector administrators and service providers, and an introduction to the SSD-based ANN algorithm.
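For attendees new to graph-based ANNS, its core search loop can be sketched in a few lines: a greedy best-first walk over a proximity graph, the idea underlying DiskANN-style indexes. This is an illustrative toy, not the DiskANN or AiSAQ implementation; the graph, vectors, and parameter names are made up:

```python
# Toy greedy best-first search on a proximity graph (the core idea behind
# graph-based ANNS). Hypothetical data; not the DiskANN/AiSAQ implementation.
import heapq

def l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(graph, vectors, entry, query, k=2, beam=4):
    """Walk the graph from an entry node, always expanding the closest
    unvisited candidate, keeping a bounded frontier of `beam` candidates."""
    visited = {entry}
    frontier = [(l2(vectors[entry], query), entry)]  # min-heap by distance
    found = []
    while frontier:
        d, node = heapq.heappop(frontier)
        found.append((d, node))
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (l2(vectors[nb], query), nb))
        # prune to the `beam` closest candidates (a sorted list is a valid heap)
        frontier = heapq.nsmallest(beam, frontier)
    found.sort()
    return [node for _, node in found[:k]]

# Tiny made-up index: 4 points, each linked to 2 neighbors.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0), 3: (6.0, 5.0)}
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(greedy_search(graph, vectors, entry=0, query=(5.5, 5.0), k=2))  # → [2, 3]
```

In a DiskANN-style system the `vectors[nb]` lookups are what hit the SSD; storing all of them there (as AiSAQ does, per the description above) is what removes the DRAM capacity ceiling.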
03:15 PM to 05:00 PM
PRO PDSA2: Memory-Centric Computing (Part 2): Advanced Applications
Track: Professional Development Series (Pre-Conference)
Speakers:
Onur Mutlu, Professor of Computer Science, ETH Zurich
Onur Mutlu is a Professor of Computer Science at ETH Zurich. He previously held the William D. and Nancy W. Strecker Early Career Professorship at Carnegie Mellon University and was a Visiting Professor at Stanford University. Before joining CMU, he started the Computer Architecture Group at Microsoft Research (2006-2009). He has held product, research, and consulting/visiting positions at various companies, including Intel Corporation, Advanced Micro Devices, VMware, and Google, and has performed significant consulting work for various companies and institutions. Many of the techniques Onur has invented over the years with his group and collaborators have strongly influenced industry and are widely employed in commercial microprocessors and memory & storage systems (including both DRAM and NAND flash memories and controllers, as well as microprocessor and accelerator memory hierarchies) used daily by hundreds of millions of people.
Pre-Con Seminar Description:
This short course (consisting of two sessions, which can be taken independently of each other) covers major ideas and techniques in modern computing platforms and applications, with a special focus on the design of the memory (and storage) system, using a cross-layer approach that spans systems, applications, software, and hardware. The second session of this two-part course studies memory-centric computing in more depth, focusing on two other major topics and examining both problems and effective solution techniques at the software and hardware levels, using a system-level cross-layer approach:
1) acceleration techniques for major data-intensive workloads, especially ML/AI workloads, large language models (LLMs), graph analytics, databases, video analytics, data analytics, genome analysis, and mobile workloads, via memory-centric methods across the computing stack
2) machine learning and artificial intelligence assisted system design for better decision making, including ML/AI-driven (e.g., reinforcement learning based, DNN-based, perceptron-based) intelligent memory systems (e.g., prefetchers, storage management systems, memory controllers, data location predictors)
PRO PDSB2: Emerging Technologies for Future Memory Subsystems
Track: Professional Development Series (Pre-Conference)
Speakers:
Shimeng Yu, Professor, Georgia Institute of Technology
Shimeng Yu is the endowed Dean's Professor of Electrical and Computer Engineering at the Georgia Institute of Technology. He received a PhD from Stanford University in 2013. He was elevated to IEEE Fellow for contributions to non-volatile memories and in-memory computing. His 400+ publications have received 33,000+ citations (Google Scholar), with an h-index of 83. He serves on the technical program committees of flagship conferences in the semiconductor field, such as the IEEE International Electron Devices Meeting (IEDM) and the IEEE Symposium on VLSI Technology and Circuits. Prof. Yu's honors include the National Science Foundation (NSF) CAREER Award in 2016, the IEEE Electron Devices Society (EDS) Early Career Award in 2017, the ACM Special Interest Group on Design Automation (SIGDA) Outstanding New Faculty Award in 2018, the Semiconductor Research Corporation (SRC) Inaugural Young Faculty Award in 2019, IEEE Circuits and Systems Society (CASS) Distinguished Lecturer in 2021, IEEE Electron Devices Society (EDS) Distinguished Lecturer in 2022, and the Intel Outstanding Researcher Award in 2023. He is the author of the textbook Semiconductor Memory Devices and Circuits.
Pre-Con Seminar Description:
We present emerging memory device technologies that fulfill the ever-increasing demands of data-intensive AI applications. We will first survey industry's recent research and development progress in chip macros and prototypes, including resistive random access memory (RRAM), phase change memory (PCM), magnetic random access memory (MRAM), and ferroelectric memories (FeRAM or FeFET). While many of the emerging memories are positioned to serve as embedded non-volatile memories (NVMs) for automotive microcontrollers, we envision several disruptive technology breakthroughs that may revolutionize the mainstream memory hierarchy in CPU/GPU architectures, from the on-chip caches to main memory and storage. Here are the bets:
1) for last-level caches scaling toward GB capacities, we present innovations in back-end-of-line (BEOL) compatible oxide-semiconductor-based gain cell memories, which are monolithically stackable on top of CMOS and thus overcome the SRAM scaling limits at sub-2nm nodes;
2) for sub-10nm DRAM generations and future high-bandwidth memory (HBM), we present a bit-cost-scalable 3D DRAM architecture with horizontal 1T1C structures;
3) for storage, we discuss 3D NAND scaling toward 1000+ layers.
PRO PDSC2: DRAM Solutions for a Fragmented Application Space
Track: Professional Development Series (Pre-Conference)
Speakers:
Bill Gervasi, Principal Memory Solutions Architect, Monolithic Power Systems
Mr. Gervasi has nearly five decades of experience in high-speed memory subsystem definition, design, and product development. He piloted the definition of Double Data Rate SDRAM from its earliest inception, authored the first standard specification, and created the Automotive SSD standard. At MPS, Bill is driving some of the memory and storage system management mechanisms for a post-quantum world. He received the JEDEC Technical Excellence Award, the organization's highest honor, in 2020.
Pre-Con Seminar Description:
This session explores the transformative evolution in computer architectures driven by the integration of advanced memory technologies and the influence of artificial intelligence. We delve into the emergence of chiplets, fabrics, and novel switching tiers, alongside hybrid memory and storage solutions that necessitate a comprehensive reevaluation of data flow strategies. Key discussions include the introduction of new memory tiers such as HBM and CXL, and the role of fabrics like NVLink and UALink in shaping memory tiering. As AI demands escalate, blending diverse memory approaches becomes crucial, presenting both opportunities and challenges in terms of energy consumption and efficiency. We will also examine the implications for automotive architectures, mobile clients, edge applications, and the pressing need for sustainable energy solutions in data centers. Attendees will gain insights into these cutting-edge trends and their impact on total cost of ownership, equipping them to make informed technology tradeoffs.
PRO PDSD2: Rearchitecting Storage for GenAI
Track: Professional Development Series (Pre-Conference)
Speakers:
Chris Newburn, Distinguished Engineer, NVIDIA
Dr. Chris J. Newburn, who goes by CJ, is a Distinguished Engineer who drives industry-wide initiatives like Storage-Next, HPC strategy and the SW IO product roadmap in NVIDIA Compute Software, with a special focus on data center architecture and security, storage and network IO, systems, and programming models for scale. He is a community builder with a passion for extending the core capabilities of hardware and software platforms from HPC into AI, data science, and visualization. He's delighted to have worked on volume products that his Mom used and that help researchers do their life's work in science that previously wasn't possible.
Pre-Con Seminar Description:
Applications are changing faster in the GenAI space than we've ever seen. It's very difficult to keep up with shifting requirements that these applications impose on data transmission within computing elements and in storage systems. In this tutorial, the we'll share data-driven insights into storage criticality, access patterns, bandwidth, latency, and granularity. We cover a range of applications, including LLM training and inference, retrieval augmented generation, vector search and vector databases, and graph neural networks including those integrated into LLMs. From these data-driven insights, the audience will see the motivation for the Storage-Next effort that drives toward creating a new storage SKU and storage reference architecture that's focused on IOPs/TCO rather than just TB/$. Expect to leave with greater clarity on what kind of storage is needed where in the data center (global, cluster-local, compute-local) and how this relates to each kind of application.