Midterm Review: CUDA & GPU Acceleration

The document reviews material for an upcoming midterm exam in a CUDA programming course and lists the course's administrative announcements. It includes: 1) a hard deadline of Friday at 5PM for project proposals, and no makeup class on Friday; 2) a guest lecture on March 23 by Austin Robison of NVIDIA on interoperability between CUDA and rendering on GPUs; 3) a midterm exam held in class on March 25, covering material from the lecture notes and assignments as organized in the syllabus. The exam will include definitions, constraints on GPU resources, problem-solving questions, and a brief essay.

Uploaded by

Dr. V. Padmavathi Associate Professor

L13: Review for Midterm

Administrative
• Project proposals due Friday at 5PM (hard deadline)
• No makeup class Friday!
• March 23, Guest Lecture
– Austin Robison, NVIDIA
– Topic: Interoperability between CUDA and Rendering on GPUs
• March 25, MIDTERM in class
Outline
• Questions on proposals?
– Discussion of MPM/GIMP issues
• Review for Midterm
– Describe planned exam
– Go over syllabus
– Review L4: execution model
Reminder: Content of Proposal, MPM/GIMP as Example
I. Team members: Name and a sentence on expertise for each member
Obvious
II. Problem description
- What is the computation and why is it important?
- Abstraction of computation: equations, graphic or pseudo-code, no more than 1 page
Straightforward adaptation from MPM presentation and/or code
III. Suitability for GPU acceleration
- Amdahl’s Law: describe the inherent parallelism. Argue that it is close to 100% of the computation. Use measurements from CPU execution of the computation if possible
Can measure sequential code
Remove “history” function
Phil will provide us with a scaled-up computation that fits in 512MB
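Section III asks for an Amdahl's Law argument. As a reminder of why "close to 100%" matters, the C++ sketch below (function names are hypothetical) evaluates the law: if a fraction f of the sequential run time is parallelized and that part is sped up by a factor p, the overall speedup is 1 / ((1 - f) + f / p). It assumes f is obtained by timing the sequential code, as the slide suggests.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Amdahl's Law: overall speedup when a fraction f of the sequential run time
// is sped up by a factor p and the remaining (1 - f) still runs sequentially.
double amdahl_speedup(double f, double p) {
    assert(f >= 0.0 && f <= 1.0);  // f is a fraction of total run time
    assert(p >= 1.0);              // speedup of the parallel portion
    return 1.0 / ((1.0 - f) + f / p);
}

// Limit as p grows without bound: 1 / (1 - f), infinite only when f == 1.
double amdahl_limit(double f) {
    assert(f >= 0.0 && f <= 1.0);
    if (f == 1.0) return std::numeric_limits<double>::infinity();
    return 1.0 / (1.0 - f);
}
```

With f = 0.9, even an arbitrarily fast GPU portion caps the whole program at about 10x, which is why the proposal should show that the parallel fraction is close to 100%.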

CS6963
L10: Floating Point
Reminder: Content of Proposal, MPM/GIMP as Example
III. Suitability for GPU acceleration, cont.
- Synchronization and Communication: Discuss what data structures may need to be protected by synchronization, or communication through host.
Some challenges on boundaries between nodes in grid
- Copy Overhead: Discuss the data footprint and anticipated cost of copying to/from host memory.
Measure grid and patches to discover data footprint. Consider ways to combine computations to reduce copying overhead.
IV. Intellectual Challenges
- Generally, what makes this computation worthy of a project?
Importance of computation, and challenges in partitioning computation, dealing with scope, managing copying overhead
- Point to any difficulties you anticipate at present in achieving high speedup
See previous
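The copy-overhead item can be made concrete with a back-of-the-envelope model: the time for one transfer is roughly a fixed latency plus the number of bytes divided by the sustained host-device bandwidth. This is a simplified sketch; latency and bandwidth are placeholders that must be measured on the actual system, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Estimated time for one host<->device copy: fixed per-transfer latency plus
// bytes / sustained bandwidth. Both parameters must be measured; none are
// assumed here.
double copy_time_seconds(std::size_t bytes, double bandwidth_bytes_per_s,
                         double latency_s) {
    assert(bandwidth_bytes_per_s > 0.0 && latency_s >= 0.0);
    return latency_s + static_cast<double>(bytes) / bandwidth_bytes_per_s;
}

// Fraction of the GPU-side time (copies + kernel) spent on copying.
double copy_fraction(double copy_s, double kernel_s) {
    assert(copy_s >= 0.0 && kernel_s >= 0.0 && copy_s + kernel_s > 0.0);
    return copy_s / (copy_s + kernel_s);
}
```

Because every transfer pays the latency again, one copy of a combined footprint is cheaper than several smaller copies of the same total size (with a made-up latency of 0.5 s and bandwidth of 1000 bytes/s, one 4000-byte copy takes 4.5 s versus 6 s for four 1000-byte copies). The same reasoning motivates combining computations so data stays on the device between steps.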

Midterm Exam
• Goal is to reinforce understanding of CUDA and NVIDIA architecture
• Material will come from lecture notes and assignments
• In class, should not be difficult to finish
Parts of Exam
I. Definitions
– A list of 10 terms you will be asked to define
II. Constraints
- Understand constraints on numbers of threads, blocks, warps, size of storage
III. Problem Solving
- Derive distance vectors for sequential code and use these to transform code to CUDA, making use of constant memory
- Given some CUDA code, indicate whether global memory accesses will be coalesced and whether there will be bank conflicts in shared memory
- Given some CUDA code, add synchronization to derive a correct implementation
- Given some CUDA code, provide an optimized version that will have fewer divergent branches
- Given some CUDA code, derive a partitioning into threads and blocks that does not exceed various hardware limits
IV. (Brief) Essay Question
- Pick one from a set of 4
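For the coalescing question, the rule for compute capability 1.0/1.1 (the "< 1.2" case in L8) can be checked mechanically. The sketch below is a simplified model, assuming 32-bit words and all 16 threads of a half-warp active; the names are hypothetical. Under this rule a half-warp's loads become a single 64-byte transaction only when thread k reads the k-th word of a 64-byte-aligned segment; otherwise each thread issues its own transaction.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kHalfWarp = 16;

// Simplified compute-capability 1.0/1.1 rule for 32-bit words: the half-warp
// coalesces only if the k-th thread reads the k-th word of a 64-byte-aligned
// segment. Assumes all 16 threads are active (the real rule also allows
// inactive threads).
bool coalesced_cc1x(const std::array<std::uint64_t, kHalfWarp>& byte_addr) {
    const std::uint64_t base = byte_addr[0];
    if (base % 64 != 0) return false;  // segment must be 64-byte aligned
    for (int k = 0; k < kHalfWarp; ++k)
        if (byte_addr[k] != base + 4u * k) return false;  // k-th word for thread k
    return true;
}

// Byte addresses touched when thread t of the half-warp reads the float
// a[offset + stride * t] of an array that starts at byte address base.
std::array<std::uint64_t, kHalfWarp> float_access(std::uint64_t base,
                                                  std::uint64_t offset,
                                                  std::uint64_t stride) {
    std::array<std::uint64_t, kHalfWarp> addr{};
    for (int t = 0; t < kHalfWarp; ++t)
        addr[t] = base + 4 * (offset + stride * t);
    return addr;
}
```

Here float_access(0, 1, 1) models a[threadIdx.x + 1]: the accesses are sequential but start 4 bytes past an aligned segment, so they are not coalesced on 1.0/1.1. Devices of compute capability 1.2 and higher relax this rule and service the same pattern with one or two transactions.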
How Much? How Many? (compute capability 1.0/1.1)
• How many threads per block? Max 512
• How many blocks per grid? Max 65535 per grid dimension
• How many threads per warp? 32
• How many warps per multiprocessor? 24
• How much shared memory per streaming multiprocessor? 16Kbytes
• How many registers per streaming multiprocessor? 8192
• Size of constant cache: 8Kbytes per multiprocessor
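The partitioning question amounts to checking a launch configuration against these limits. A minimal C++ sketch, assuming compute capability 1.x, where block extents are also capped at 512 x 512 x 64 and grids are two-dimensional; Dim3 is a hypothetical stand-in for CUDA's dim3, not the real type.

```cpp
#include <cassert>

// Hypothetical launch descriptor mirroring CUDA's dim3 (x, y, z extents).
struct Dim3 { unsigned x, y, z; };

// Checks a launch against compute-capability 1.x limits: at most 512 threads
// per block, block z extent at most 64, and a 2-D grid of at most
// 65535 x 65535 blocks.
bool launch_fits(Dim3 grid, Dim3 block) {
    const unsigned long long threads = 1ULL * block.x * block.y * block.z;
    if (threads == 0 || threads > 512) return false;  // threads per block
    if (block.z > 64) return false;                   // z is capped lower than x and y
    if (grid.x == 0 || grid.y == 0 || grid.z != 1) return false;  // 2-D grids only
    if (grid.x > 65535 || grid.y > 65535) return false;            // per dimension
    return true;
}

// Blocks needed to cover n elements with `threads` threads per block
// (ceiling division); the kernel must then guard indices >= n.
unsigned blocks_for(unsigned n, unsigned threads) {
    assert(threads > 0);
    return (n + threads - 1) / threads;
}
```

For a 1,000,000-element array with 256 threads per block, blocks_for returns 3907 blocks, which fits within a single grid dimension.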
Syllabus
L1 & L2: Introduction and CUDA Overview
* Not much there…
L3: Synchronization and Data Partitioning
• What does __syncthreads() do?
• Indexing to map portions of a data structure to a particular thread
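The L3 indexing idiom can be rehearsed on the host by emulating the thread loops. This sketch (hypothetical function names) counts how many times each array element is touched by the blocked mapping i = blockIdx.x * blockDim.x + threadIdx.x and by a cyclic (grid-stride) mapping; a correct partitioning touches every element exactly once.

```cpp
#include <cassert>
#include <vector>

// Blocked mapping i = blockIdx.x * blockDim.x + threadIdx.x, with the bounds
// guard a kernel needs when n is not a multiple of blockDim.x. Returns how
// many times each of the n elements is touched.
std::vector<int> coverage_blocked(int n, int grid_dim, int block_dim) {
    assert(n >= 0 && grid_dim > 0 && block_dim > 0);
    std::vector<int> hits(n, 0);
    for (int block = 0; block < grid_dim; ++block)            // blockIdx.x
        for (int thread = 0; thread < block_dim; ++thread) {  // threadIdx.x
            const int i = block * block_dim + thread;
            if (i < n) ++hits[i];  // threads past the end do nothing
        }
    return hits;
}

// Cyclic mapping: each thread handles i, i + T, i + 2T, ... where
// T = grid_dim * block_dim is the total number of threads.
std::vector<int> coverage_cyclic(int n, int grid_dim, int block_dim) {
    assert(n >= 0 && grid_dim > 0 && block_dim > 0);
    std::vector<int> hits(n, 0);
    const int total = grid_dim * block_dim;
    for (int block = 0; block < grid_dim; ++block)
        for (int thread = 0; thread < block_dim; ++thread)
            for (int i = block * block_dim + thread; i < n; i += total)
                ++hits[i];
    return hits;
}
```

With 1000 elements and 256 threads per block, the blocked mapping needs 4 blocks; with only 3 it misses the last 232 elements, while the cyclic mapping still covers all of them.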
L4: Hardware and Execution Model
• How are threads in a block scheduled? How are blocks mapped to streaming multiprocessors?
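For L4, the key facts are that a whole block is scheduled on a single streaming multiprocessor and that its threads are issued in warps of 32 consecutive linear thread IDs, with threadIdx.x varying fastest. A small C++ sketch of that mapping (hypothetical names):

```cpp
#include <cassert>

constexpr unsigned kWarpSize = 32;

// Linear thread ID within a block of extents (dx, dy, dz): x varies fastest.
unsigned linear_tid(unsigned x, unsigned y, unsigned z, unsigned dx, unsigned dy) {
    return x + y * dx + z * dx * dy;
}

// Threads 0-31 form warp 0, threads 32-63 warp 1, and so on.
unsigned warp_of(unsigned tid) { return tid / kWarpSize; }
unsigned lane_of(unsigned tid) { return tid % kWarpSize; }

// Warps per block, rounding up: a partial warp still occupies a warp slot.
unsigned warps_per_block(unsigned threads) {
    assert(threads > 0);
    return (threads + kWarpSize - 1) / kWarpSize;
}
```

In a 16 x 16 block, thread (0, 2) has linear ID 32 and therefore starts warp 1. A 256-thread block occupies 8 warps, so three such blocks fill the 24 warp slots of a compute capability 1.0/1.1 multiprocessor, provided registers and shared memory allow it.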
L5: Dependence Analysis and Parallelization
• Constructing distance vectors
• Determining if parallelization is safe
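For L5, a distance vector records, per loop, how many iterations separate the source and the sink of a dependence. For example, in a nest over i (outer) and j (inner) with the statement A[i][j] = A[i-1][j+1] + 1, the value read at (i, j) was written at (i-1, j+1), giving distance (1, -1). The dependence is carried by the loop at the first nonzero entry, and a loop can be parallelized (with the outer loops kept sequential) only if it carries no dependence. A sketch, assuming constant, lexicographically positive distance vectors; the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using DistVec = std::vector<int>;  // one entry per loop, outermost first

// Loop level that carries the dependence: index of the first nonzero entry,
// or -1 for a loop-independent (all-zero) vector.
int carrying_loop(const DistVec& d) {
    for (std::size_t k = 0; k < d.size(); ++k)
        if (d[k] != 0) {
            assert(d[k] > 0);  // lexicographically positive by assumption
            return static_cast<int>(k);
        }
    return -1;
}

// Loop `level` may run its iterations in parallel (outer loops sequential)
// if no dependence is carried at that level.
bool parallel_safe(const std::vector<DistVec>& deps, int level) {
    for (const DistVec& d : deps)
        if (carrying_loop(d) == level) return false;
    return true;
}
```

For the (1, -1) example, the j loop is safe to map onto threads while the i loop stays sequential; for a vector such as (0, 1), the reverse holds.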
L6: Memory Hierarchy I: Data Placement
• What are the different memory spaces on the device, who can read/write them?
• How do you tell the compiler that something belongs in a particular memory space?
Syllabus
L7: Memory Hierarchy II: Reuse and Tiling
• Safety and profitability of tiling
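A tiling example for L7: strip-mine each loop of a matrix multiply and move the tile loops outward, so a tile of each input is reused while it is resident (on the GPU, in shared memory). This host-side C++ sketch with hypothetical names touches both questions on the slide. Safety: the only dependence is the accumulation into c[i][j] across k, and for every (i, j) the tiled nest still visits k in increasing order, so it computes exactly the same result as the untiled loop, even in floating point. Profitability comes from reuse: each element of a resident tile can be used up to `tile` times before it is replaced.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Matrix = std::vector<double>;  // row-major n x n

Matrix matmul_naive(const Matrix& a, const Matrix& b, int n) {
    Matrix c(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Tiled version: ii, jj, kk step over tiles; i, j, k walk inside a tile.
// std::min handles n that is not a multiple of tile.
Matrix matmul_tiled(const Matrix& a, const Matrix& b, int n, int tile) {
    assert(tile > 0);
    Matrix c(n * n, 0.0);
    for (int ii = 0; ii < n; ii += tile)
        for (int jj = 0; jj < n; jj += tile)
            for (int kk = 0; kk < n; kk += tile)
                for (int i = ii; i < std::min(ii + tile, n); ++i)
                    for (int j = jj; j < std::min(jj + tile, n); ++j)
                        for (int k = kk; k < std::min(kk + tile, n); ++k)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```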
L8: Memory Hierarchy III: Memory Bandwidth
• Understanding global memory coalescing (for compute capability < 1.2 and ≥ 1.2)
• Understanding memory bank conflicts
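For bank conflicts on compute capability 1.x, shared memory has 16 banks of 32-bit words (word w lives in bank w mod 16), and conflicts are resolved per half-warp. The sketch below (hypothetical names) reports the conflict degree, i.e., the largest number of distinct words any one bank must serve. It simplifies the broadcast mechanism by treating threads that read the same word as a single access.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

constexpr int kBanks = 16;     // compute capability 1.x: 16 banks
constexpr int kHalfWarp = 16;  // conflicts are resolved per half-warp

// Conflict degree for one half-warp, given the 32-bit word index each thread
// accesses: 1 means conflict-free, k means a k-way conflict.
int conflict_degree(const std::array<std::uint32_t, kHalfWarp>& word) {
    std::map<int, std::set<std::uint32_t>> per_bank;  // bank -> distinct words
    for (std::uint32_t w : word)
        per_bank[static_cast<int>(w % kBanks)].insert(w);
    int degree = 0;
    for (const auto& entry : per_bank)
        degree = std::max(degree, static_cast<int>(entry.second.size()));
    return degree;
}

// Word indices for the common pattern s[stride * threadIdx.x].
std::array<std::uint32_t, kHalfWarp> strided(std::uint32_t stride) {
    std::array<std::uint32_t, kHalfWarp> w{};
    for (int t = 0; t < kHalfWarp; ++t) w[t] = stride * t;
    return w;
}
```

Odd strides such as 1, 3 or 17 are conflict-free, while stride 2 gives a 2-way conflict and stride 16 a 16-way conflict; this is why padding a 16-column shared array to 17 columns removes conflicts when threads walk down a column.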
L9: Control Flow
• Divergent branches
• Execution model
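For L9, a branch diverges only when threads within the same warp disagree on the condition; the warp then executes both paths one after the other. This C++ sketch (hypothetical names) counts such warps for a 1-D block, given the branch condition as a function of threadIdx.x.

```cpp
#include <cassert>
#include <functional>

constexpr int kWarpSize = 32;

// Number of warps in a 1-D block of `threads` threads whose threads do not
// all agree on cond(threadIdx.x); each such warp runs both paths serially.
int divergent_warps(int threads, const std::function<bool(int)>& cond) {
    assert(threads > 0);
    int divergent = 0;
    for (int first = 0; first < threads; first += kWarpSize) {  // one warp at a time
        bool any_true = false, any_false = false;
        for (int t = first; t < first + kWarpSize && t < threads; ++t) {
            if (cond(t)) any_true = true;
            else any_false = true;
        }
        if (any_true && any_false) ++divergent;
    }
    return divergent;
}
```

A condition like threadIdx.x % 2 == 0 diverges in all 8 warps of a 256-thread block, while a condition on (threadIdx.x / 32) % 2 is uniform within each warp. Rewriting the first into the second is only legitimate when the algorithm allows work to be reassigned among threads.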
L10: Floating Point
• Intrinsics vs. arithmetic operations: which is more precise?
• What operations can be performed in 4 cycles, and what operations take longer?
L11: Tools: Occupancy Calculator and Profiler
• How do they help you?
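The Occupancy Calculator essentially takes the minimum over the per-multiprocessor limits. A simplified C++ sketch, assuming compute capability 1.0/1.1 limits (24 warps, 8 resident blocks, 8192 registers and 16 KB of shared memory per multiprocessor) and ignoring allocation granularity and system-reserved shared memory, which the real tool accounts for; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Per-multiprocessor limits for compute capability 1.0/1.1.
constexpr int kMaxWarps = 24, kMaxBlocks = 8, kRegisters = 8192,
              kSharedBytes = 16 * 1024, kWarpSize = 32;

// Resident blocks per multiprocessor: the tightest of the four limits.
int blocks_per_sm(int threads, int regs_per_thread, int smem_per_block) {
    assert(threads > 0 && regs_per_thread >= 0 && smem_per_block >= 0);
    const int warps = (threads + kWarpSize - 1) / kWarpSize;
    int blocks = std::min(kMaxBlocks, kMaxWarps / warps);  // block and warp limits
    if (regs_per_thread > 0)
        blocks = std::min(blocks, kRegisters / (regs_per_thread * threads));
    if (smem_per_block > 0)
        blocks = std::min(blocks, kSharedBytes / smem_per_block);
    return blocks;
}

// Occupancy: resident warps as a fraction of the 24-warp maximum.
double occupancy(int threads, int regs_per_thread, int smem_per_block) {
    const int warps = (threads + kWarpSize - 1) / kWarpSize;
    const int blocks = blocks_per_sm(threads, regs_per_thread, smem_per_block);
    return static_cast<double>(blocks * warps) / kMaxWarps;
}
```

With 256 threads and 4 KB of shared memory per block, 10 registers per thread allows 3 resident blocks and full occupancy, but 11 registers drops it to 2 blocks (two thirds occupancy): the kind of cliff the Occupancy Calculator makes visible.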
Next Time
• March 23:
– Guest Lecture, Austin Robison
• March 25:
– MIDTERM, in class
