This course explores the architectural foundations and design principles that enable high-performance and scalable parallel computing systems. Building on the fundamentals of computer architecture, students will study techniques for exploiting instruction-, data-, and thread-level parallelism in modern multi-core and heterogeneous processors. Topics include superscalar and vector architectures, memory consistency and cache coherence, GPU architecture and programming, and advanced memory systems such as DRAM, non-volatile memory, and Processing-in-Memory. The course also introduces network-on-chip interconnects, dataflow and systolic architectures for machine learning acceleration, and methods for workload mapping and optimization. Emphasis is placed on performance modeling, design trade-offs, and architectural innovations that drive the evolution of parallel and accelerated computing.
Upon successful completion of this course, students will be able to:
- Analyze techniques for exploiting instruction-, data-, and thread-level parallelism in modern multi-core and heterogeneous processors.
- Explain memory consistency models, cache coherence protocols, and the organization of advanced memory systems such as DRAM, non-volatile memory, and processing-in-memory.
- Describe GPU architecture and apply GPU programming models to parallel workloads.
- Evaluate network-on-chip interconnects and dataflow and systolic architectures for machine learning acceleration.
- Use performance modeling to reason about design trade-offs, workload mapping, and optimization in parallel and accelerated computing systems.
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 6th ed. Morgan Kaufmann, 2017.