Program at a Glance

01:00 PM to 02:45 PM
PRO PDSA1: Memory-Centric Computing (Part 1): Fundamental Techniques
Track: Professional Development Series (Pre-Conference)
Speakers:
Onur Mutlu, Professor of Computer Science, ETH Zurich
Onur Mutlu is a Professor of Computer Science at ETH Zurich. He previously held the William D. and Nancy W. Strecker Early Career Professorship at Carnegie Mellon University and was a Visiting Professor at Stanford University. Before joining CMU, he started the Computer Architecture Group at Microsoft Research (2006-2009). He has held product, research, and consulting/visiting positions at various companies, including Intel Corporation, Advanced Micro Devices, VMware, and Google, and has performed significant consulting work for various companies and institutions. Many of the techniques Onur has invented over the years with his group and collaborators have strongly influenced industry and are widely employed in commercial microprocessors and memory & storage systems (including both DRAM and NAND flash memories and controllers, as well as microprocessor and accelerator memory hierarchies) used daily by hundreds of millions of people.
Pre-Con Seminar Description:
This short course (consisting of two sessions, which can be taken independently of each other) covers major ideas and techniques in modern computing platforms and applications, with a special focus on the design of the memory (and storage) system, using a cross-layer approach that spans systems, applications, software, and hardware. The first session of the two-part course enables attendees to develop a rigorous approach to memory systems and memory-centric computing, and prepares them to develop methods for solving data movement and memory bottleneck problems. The major focus is on memory-centric computing, including processing near memory (PnM), processing in memory (PiM), and processing using memory (PuM) systems and techniques, to enable fundamentally higher-performance, energy-efficient, and scalable systems. Both main memory and storage will be examined, with the goal of greatly improving the performance and efficiency of major workloads such as ML/AI workloads, large language models (LLMs), graph analytics, databases, video analytics, and data analytics.
PRO PDSB1: DRAM for AI – HBM and What’s Next
Track: Professional Development Series (Pre-Conference)
Speakers:
Marc Greenberg, Principal/CEO, Marc Greenberg Consulting
Marc Greenberg is an independent consultant in memory, semiconductors, and IP. He currently serves as VP of Product for Cassia.ai, an AI IP company, as vice-chair of an undisclosed task group at JEDEC, and as an advisor to several other companies. Marc was responsible for product management of HBM and other memory and storage IP products at Denali, Cadence, and Synopsys for 20 years of a 30-year career in semiconductors and IP. Marc has a master's degree in Electronics from the University of Edinburgh in Scotland.
Pre-Con Seminar Description:
High Bandwidth Memory (HBM) is a critical technology in many processors running the latest LLMs and generative AI applications. In this PDS tutorial we will make the case for both existing and novel techniques for interfacing DRAM such as HBM to the predominantly non-von-Neumann compute architectures found in GPUs/NPUs/TPUs (collectively, xPUs). Attendees will learn the key aspects of HBM, including its history, architecture, and market trends, along with a brief comparison to other popular DRAM memory types such as DDR, LPDDR, and GDDR. We'll use this foundation to discuss AI processor architectures and how they use DRAM, including the impact of arithmetic intensity, quantization, and sparsity on DRAM access. Finally, we'll cover architectural techniques for improving memory access in AI applications ("breaking the memory wall"), including improving bandwidth, moving compute closer to RAM, and moving RAM closer to compute.
Takeaways from this session:
- Understand the internal structure of HBM and how it evolved to be the memory of choice for datacenter AI applications
- Understand the complex relationships between memory and compute in AI
- Learn how the architecture of xPUs drives DRAM selection
- Understand the techniques used to optimize the interaction between compute and memory in AI
- See how evolving memory architectures will shape AI processors in the future
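The role of arithmetic intensity mentioned above can be sketched with a simple roofline-style calculation. This is an illustrative toy, not material from the tutorial; the accelerator specs and kernel figures below are hypothetical, chosen only to show why low-intensity inference kernels tend to be bandwidth-bound on HBM:

```python
# Roofline-style check: is a kernel memory-bound or compute-bound?
# All device and kernel numbers are hypothetical, for illustration only.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of DRAM traffic."""
    return flops / bytes_moved

def attainable_gflops(ai, peak_gflops, mem_bw_gbs):
    """Roofline model: performance is capped by either peak compute
    or memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, ai * mem_bw_gbs)

# Hypothetical xPU: 500 TFLOP/s peak compute, 3 TB/s of HBM bandwidth.
PEAK_GFLOPS = 500_000
HBM_BW_GBS = 3_000

# GEMV-like LLM inference step: roughly 2 FLOPs per weight byte read.
ai_gemv = arithmetic_intensity(flops=2e9, bytes_moved=1e9)    # AI = 2.0
# Large training GEMM: far more reuse per byte fetched.
ai_gemm = arithmetic_intensity(flops=2e12, bytes_moved=4e9)   # AI = 500.0

print(attainable_gflops(ai_gemv, PEAK_GFLOPS, HBM_BW_GBS))  # bandwidth-bound, far below peak
print(attainable_gflops(ai_gemm, PEAK_GFLOPS, HBM_BW_GBS))  # compute-bound, hits peak
```

Under these assumed numbers the low-intensity kernel attains only a small fraction of peak compute, which is the "memory wall" the session addresses: quantization raises effective arithmetic intensity by shrinking bytes moved, while added bandwidth raises the sloped part of the roofline.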
PRO PDSC1: Introduction to DRAM
Track: Professional Development Series (Pre-Conference)
Speakers:
Bill Gervasi, Principal Memory Solutions Architect, Monolithic Power Systems
Mr. Gervasi has nearly five decades of experience in high-speed memory subsystem definition, design, and product development. He piloted the definition of Double Data Rate SDRAM from its earliest inception, authored the first standard specification, and created the Automotive SSD standard. At MPS, Bill is driving some of the memory and storage system management mechanisms for a post-quantum world. He received the JEDEC Technical Excellence Award, the organization's highest honor, in 2020.
Pre-Con Seminar Description:
Description Not Available
PRO PDSD1: AiSAQ™ OSS: Scaling RAG Beyond DRAM Limits with SSD
Track: Professional Development Series (Pre-Conference)
Speakers:
Rory Bolt, Sr. Fellow, KIOXIA America, Inc
Rory joined KIOXIA America in 2017. He has founded companies, built teams, and delivered products at four storage startups, all of which were acquired ($400M, $165M, and two undisclosed). Rory has more than 25 years of experience in data storage systems, data protection systems, and high-performance computing, with tenures as VP of Software Engineering at Samsung, Technical Director/CTO Counsel at NetApp, CTO Counsel at EMC, and VP, Chief Storage Architect, and Distinguished Fellow at Quantum. Rory has been granted over 12 storage-related patents and has several pending. He has a BS in Computer Engineering from UCSD.
Pre-Con Seminar Description:
The open-source software (OSS) project AiSAQ provides a new approach to scaling AI, especially for Retrieval-Augmented Generation (RAG). RAG can improve accuracy for AI models that use Approximate Nearest Neighbor Search (ANNS) over vector databases. DiskANN was developed to store some elements of the vector database on SSDs, which enables a larger database. AiSAQ™ open-source technology allows all of the vector database elements to be stored on SSDs. This enables a number of capabilities, including a limitless vector database size, faster time-to-ready, and the ability to connect to multiple AI host systems simultaneously. The end result is a better RAG implementation for the AI system utilizing AiSAQ technology. Learn about graph-based ANNS vs. cluster-based ANNS, how to deploy and utilize AiSAQ, the advantages for RAG/vector administrators and service providers, and an introduction to the SSD-based ANN algorithm.
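For attendees new to graph-based ANNS, its core search loop can be sketched in a few lines: a greedy best-first walk over a proximity graph, the idea underlying DiskANN-style indexes. This is an illustrative toy, not the DiskANN or AiSAQ implementation; the graph, vectors, and parameter names are made up:

```python
# Toy greedy best-first search on a proximity graph (the core idea behind
# graph-based ANNS). Hypothetical data; not the DiskANN/AiSAQ implementation.
import heapq

def l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(graph, vectors, entry, query, k=2, beam=4):
    """Walk the graph from an entry node, always expanding the closest
    unvisited candidate, keeping a bounded frontier of `beam` candidates."""
    visited = {entry}
    frontier = [(l2(vectors[entry], query), entry)]  # min-heap by distance
    found = []
    while frontier:
        d, node = heapq.heappop(frontier)
        found.append((d, node))
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (l2(vectors[nb], query), nb))
        # prune to the `beam` closest candidates (a sorted list is a valid heap)
        frontier = heapq.nsmallest(beam, frontier)
    found.sort()
    return [node for _, node in found[:k]]

# Tiny made-up index: 4 points, each linked to 2 neighbors.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0), 3: (6.0, 5.0)}
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(greedy_search(graph, vectors, entry=0, query=(5.5, 5.0), k=2))  # → [2, 3]
```

In a DiskANN-style system the `vectors[nb]` lookups are what hit the SSD; storing all of them there (as AiSAQ does, per the description above) is what removes the DRAM capacity ceiling.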
03:15 PM to 05:00 PM
PRO PDSA2: Memory-Centric Computing (Part 2): Advanced Applications
Track: Professional Development Series (Pre-Conference)
Speakers:
Onur Mutlu, Professor of Computer Science, ETH Zurich
Onur Mutlu is a Professor of Computer Science at ETH Zurich. He previously held the William D. and Nancy W. Strecker Early Career Professorship at Carnegie Mellon University and was a Visiting Professor at Stanford University. Before joining CMU, he started the Computer Architecture Group at Microsoft Research (2006-2009). He has held product, research, and consulting/visiting positions at various companies, including Intel Corporation, Advanced Micro Devices, VMware, and Google, and has performed significant consulting work for various companies and institutions. Many of the techniques Onur has invented over the years with his group and collaborators have strongly influenced industry and are widely employed in commercial microprocessors and memory & storage systems (including both DRAM and NAND flash memories and controllers, as well as microprocessor and accelerator memory hierarchies) used daily by hundreds of millions of people.
Pre-Con Seminar Description:
This short course (consisting of two sessions, which can be taken independently of each other) covers major ideas and techniques in modern computing platforms and applications, with a special focus on the design of the memory (and storage) system, using a cross-layer approach that spans systems, applications, software, and hardware. The second session of this two-part course studies memory-centric computing in more depth, focusing on two other major topics and examining both problems and effective solution techniques at the software and hardware levels, using a system-level cross-layer approach:
1) acceleration techniques for major data-intensive workloads, especially ML/AI workloads, large language models (LLMs), graph analytics, databases, video analytics, data analytics, genome analysis, and mobile workloads, via memory-centric methods across the computing stack
2) machine learning and artificial intelligence assisted system design for better decision making, including ML/AI-driven (e.g., reinforcement learning based, DNN-based, perceptron-based) intelligent memory systems (e.g., prefetchers, storage management systems, memory controllers, data location predictors)
PRO PDSB2: Emerging Technologies for Future Memory Subsystems
Track: Professional Development Series (Pre-Conference)
Speakers:
Shimeng Yu, Professor, Georgia Institute of Technology
Shimeng Yu is the endowed Dean's Professor of Electrical and Computer Engineering at the Georgia Institute of Technology. He received a PhD from Stanford University in 2013. He was elevated to IEEE Fellow for contributions to non-volatile memories and in-memory computing. His 400+ publications have received 33,000+ citations (Google Scholar), with an h-index of 83. He serves on the technical program committees of flagship conferences in the semiconductor field, such as the IEEE International Electron Devices Meeting (IEDM) and the IEEE Symposium on VLSI Technology and Circuits. Prof. Yu's honors include the National Science Foundation (NSF) CAREER Award in 2016, the IEEE Electron Devices Society (EDS) Early Career Award in 2017, the ACM Special Interest Group on Design Automation (SIGDA) Outstanding New Faculty Award in 2018, the Semiconductor Research Corporation (SRC) Inaugural Young Faculty Award in 2019, IEEE Circuits and Systems Society (CASS) Distinguished Lecturer in 2021, IEEE Electron Devices Society (EDS) Distinguished Lecturer in 2022, and the Intel Outstanding Researcher Award in 2023. He is the author of the textbook Semiconductor Memory Devices and Circuits.
Pre-Con Seminar Description:
We present emerging memory device technologies that fulfill the ever-increasing demands of data-intensive AI applications. We will first survey industry's recent research and development progress in chip macros and prototypes, including resistive random access memory (RRAM), phase change memory (PCM), magnetic random access memory (MRAM), and ferroelectric memories (FeRAM or FeFET). While many of the emerging memories are positioned to serve as embedded non-volatile memories (NVMs) for automotive microcontrollers, we envision several disruptive technology breakthroughs that may revolutionize the mainstream memory hierarchy in CPU/GPU architectures, from the on-chip caches to main memory and storage. Here are the bets:
1) for last-level caches scaling toward GB capacities, we present innovations in back-end-of-line (BEOL) compatible oxide-semiconductor-based gain cell memories, which are monolithically stackable on top of CMOS and thus overcome the SRAM scaling limits at sub-2nm nodes;
2) for sub-10nm DRAM generations and future high-bandwidth memory (HBM), we present a bit-cost-scalable 3D DRAM architecture with horizontal 1T1C structures;
3) for storage, we discuss 3D NAND scaling toward 1000+ layers.
PRO PDSC2: DRAM Solutions for a Fragmented Application Space
Track: Professional Development Series (Pre-Conference)
Speakers:
Bill Gervasi, Principal Memory Solutions Architect, Monolithic Power Systems
Mr. Gervasi has nearly five decades of experience in high-speed memory subsystem definition, design, and product development. He piloted the definition of Double Data Rate SDRAM from its earliest inception, authored the first standard specification, and created the Automotive SSD standard. At MPS, Bill is driving some of the memory and storage system management mechanisms for a post-quantum world. He received the JEDEC Technical Excellence Award, the organization's highest honor, in 2020.
Pre-Con Seminar Description:
This session explores the transformative evolution in computer architectures driven by the integration of advanced memory technologies and the influence of artificial intelligence. We delve into the emergence of chiplets, fabrics, and novel switching tiers, alongside hybrid memory and storage solutions that necessitate a comprehensive reevaluation of data flow strategies. Key discussions include the introduction of new memory tiers such as HBM and CXL, and the role of fabrics like NVLink and UALink in shaping memory tiering. As AI demands escalate, blending diverse memory approaches becomes crucial, presenting both opportunities and challenges in terms of energy consumption and efficiency. We will also examine the implications for automotive architectures, mobile clients, edge applications, and the pressing need for sustainable energy solutions in data centers. Attendees will gain insights into these cutting-edge trends and their impact on total cost of ownership, equipping them to make informed technology tradeoffs.
PRO PDSD2: Rearchitecting Storage for GenAI
Track: Professional Development Series (Pre-Conference)
Speakers:
Chris Newburn, Distinguished Engineer, NVIDIA
Dr. Chris J. Newburn, who goes by CJ, is a Distinguished Engineer who drives industry-wide initiatives like Storage-Next, HPC strategy and the SW IO product roadmap in NVIDIA Compute Software, with a special focus on data center architecture and security, storage and network IO, systems, and programming models for scale. He is a community builder with a passion for extending the core capabilities of hardware and software platforms from HPC into AI, data science, and visualization. He's delighted to have worked on volume products that his Mom used and that help researchers do their life's work in science that previously wasn't possible.
Pre-Con Seminar Description:
Applications are changing faster in the GenAI space than we've ever seen. It's very difficult to keep up with shifting requirements that these applications impose on data transmission within computing elements and in storage systems. In this tutorial, the we'll share data-driven insights into storage criticality, access patterns, bandwidth, latency, and granularity. We cover a range of applications, including LLM training and inference, retrieval augmented generation, vector search and vector databases, and graph neural networks including those integrated into LLMs. From these data-driven insights, the audience will see the motivation for the Storage-Next effort that drives toward creating a new storage SKU and storage reference architecture that's focused on IOPs/TCO rather than just TB/$. Expect to leave with greater clarity on what kind of storage is needed where in the data center (global, cluster-local, compute-local) and how this relates to each kind of application.