CS61Cuda - CS61C
Class Type: Intro Computer Architecture

Welcome to CS61Cuda! This is a mini-project that introduces students to GPU programming with CUDA by building up to a fast matrix multiply. Students start with a CPU reference implementation, write CUDA kernels, learn to reason about grids/blocks/threads, and finally add simple vectorization (SIMD) on the GPU. An optional performance sandbox lets students explore further optimizations for bragging rights.
Authors:
Presentations
- Slides: https://docs.google.com/presentation/d/1i2_tcdh2HcAU4fL4XUy5LmnYnvEv76FIPNrj7iZDMp4/edit?usp=sharing
Project Source
- Github: https://github.com/malavikhasudarshan/cs61cuda
- Website: https://malavikhasudarshan.github.io/cs61cuda/
Table of Contents
Introduction
CS61C is UC Berkeley’s introductory class on the hardware-software interface and the topics that span this boundary. It surveys the stack from C down to the datapath, and this ‘survey’ down the stack unlocks further low-level upper-division classes.
CS61Cuda is the “New Project 4/Lab 8”; it fills a crucial gap: a practical assignment that combines parallelism concepts to optimize programs (and showcase creativity!). It is inspired by CS61C Spring 2014 (thank you, Prof. Sagar K, for providing your materials from that semester!)
CS61C previously had a parallelism assignment, 61kaChow, which had various issues: both students and staff felt it did not actually help students understand parallelism, and it was very hard to debug. Regardless, there was still a need to give students practice…
We developed this new project to move abstract lecture content into a tangible assignment (and to get NVIDIA funding :p), give students hands-on exposure to modern parallel programming models, reinforce course concepts, and provide an introduction to kernel/GPU programming.
Logistically, we expect this project to slot in around Week 12 of the semester, shortly after the five parallelism lectures. We estimate it will take 4-6 hours of work and will likely be assigned with a due date roughly two weeks later.
Learning Goals
- Reason down the stack: understand the relationships between layers of abstraction
- Become hardware-aware: develop an understanding of architecture and its impacts
- Optimize: use full-stack awareness to design and optimize programs on modern GPU architectures
- Reinforce parallelism: synthesize the different types of parallelism learned across previous labs/lectures
- Matrix proficiency: build understanding of indexing, bounds checks, memory access patterns, and kernel design
- Memory vs. compute: gain intuition for where programs tend to become memory- vs. compute-bound
Student Assignment
Students can clone the starter source files from the GitHub repository.
Instructions and the CUDA primer can be viewed on the assignment website or in the GitHub README.md.
Instructor Guides
Instructors can email us for access to the GitHub repository containing staff solutions and a Gradescope autograder script.
Currently, we can provide the following to instructors:
- Website: stored in a repository for easy updating + access
- Resources: code from other courses on parallel programming
- Solutions: sample solutions for the guided portions of the project
- Autograder: includes Gradescope integration (results output as a JSON file) for convenient porting
- Grading: quantitative breakdown of grading for the optimized section
Requirements
Technical requirements
- CUDA‑capable GPUs on EECS Hive machines
- CUDA toolkit (e.g., cuda/12.x module with nvcc)
- C++17 compiler (via nvcc, g++)
- Make (make and provided Makefile)
- Git (clone starter repo, submit via Git/Gradescope)
Classroom/logistical requirements
- Access to Hive GPU lab machines or remote SSH into Hive
- In‑person lab/OH time for debugging CUDA
- Enough TAs/tutors/academic interns to support debugging and to manually grade free-response written questions