Parallel Computing Communication & Synchronization

The document discusses key concepts in parallel computing, including communication methods (shared memory and message passing), synchronization techniques, and granularity types (coarse and fine). It highlights the importance of observed speedup, efficiency, and scalability in evaluating parallel program performance. Additionally, it covers manual and automatic parallelization methods, emphasizing the trade-offs between control and ease of use.


🔁 1. Communications in Parallel Computing

📌 What is it?
In parallel programs, tasks (or threads/processes) often need to share or exchange data with each other. This exchange is called communication, no matter how it's done.

📦 Two Main Methods:

1. Shared Memory – All tasks access a common memory space (like multiple people using the same whiteboard).

2. Message Passing (Network) – Tasks send messages to each other over a network (like texting between phones).

✅ Key Point: No matter if it's shared memory or network — data exchange is called communication.
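Both styles can be sketched in plain Python, with threads standing in for parallel tasks (the names `shared`, `channel`, `sender`, etc. are illustrative, not a specific library's API):

```python
import threading
import queue

# Shared memory: both tasks access the same list (a common address space).
shared = [0, 0]

def shared_writer(i, value):
    shared[i] = value              # direct write into common memory

# Message passing: tasks exchange data through an explicit channel.
channel = queue.Queue()

def sender():
    channel.put("partial result")  # send a message to another task

def receiver(out):
    out.append(channel.get())      # block until the message arrives

received = []
threads = [
    threading.Thread(target=shared_writer, args=(0, 42)),
    threading.Thread(target=sender),
    threading.Thread(target=receiver, args=(received,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared[0])    # communicated via shared memory
print(received[0])  # communicated via a message
```

Either way, data moved from one task to another — that movement is communication.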

🕒 2. Synchronization
📌 What is it?
Synchronization means making sure all tasks are in step with each other, especially
when they need to wait for others to reach a certain point before moving on.

🧠 Think of it like:
A group of runners who are only allowed to move to the next lap when all
members finish the current lap — they must sync up.

🛠️ How it’s implemented:


●​ Barriers: A point where all tasks must wait until every other task has reached it.​

●​ Locks / Semaphores: Used to control access to shared resources.​
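A minimal sketch of both mechanisms in Python's `threading` module (a barrier that every task must reach, plus a lock guarding shared state; the task names are illustrative):

```python
import threading

NUM_TASKS = 3
barrier = threading.Barrier(NUM_TASKS)  # all tasks must reach it before any continues
lock = threading.Lock()                 # controls access to the shared list
finished = []

def task(name):
    # ... each task does its share of work here ...
    barrier.wait()                      # wait until every task reaches this point
    with lock:                          # only one task updates shared state at a time
        finished.append(name)

threads = [threading.Thread(target=task, args=(f"T{i}",)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(finished))  # 3 — every task passed the barrier and updated safely
```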

⚠️ Why They Matter:

| Term | Purpose | Problem If Ignored |
|---|---|---|
| Communication | Share data between tasks | Incomplete or incorrect results |
| Synchronization | Ensure tasks proceed together logically | Race conditions, deadlocks, inconsistent data |

✍️ Scenario-Based Question (Exam Style):

Q: In a parallel matrix multiplication program, each thread calculates a part of the matrix. Once all threads finish, they combine the results. Which concepts are being used here?

A:
✅ Communication – Threads need to send their computed parts to a shared space.
✅ Synchronization – Threads must wait until all are done before combining results.

⚙️ Granularity in Parallel Computing


📌 What is it?
Granularity is about how much computation happens before a task needs to communicate
with others.

It's the ratio:

Granularity = Computation Time / Communication Time

🧱 Two Types of Granularity:

| Type | Meaning | Example |
|---|---|---|
| Coarse-Grained | Lots of computation happens before communication (less communication) | A thread works on a full image block before sending data |
| Fine-Grained | Very frequent communication (small tasks keep sharing data often) | Threads share results after every few calculations |

✅ Key Points for Exam:


● Coarse granularity → better performance (less communication overhead)

● Fine granularity → more communication, more overhead → can slow down performance

● Ideal granularity depends on your task and system architecture

⚡ Observed Speedup
📌 What is it?
Observed speedup tells us how much faster a parallel program runs compared to the serial
(single processor) version.

Observed Speedup (S) = Wall-clock time (Serial) / Wall-clock time (Parallel)

🧠 Simple Example:
●​ Serial time = 20 seconds​

●​ Parallel time (on 4 processors) = 6 seconds​

S = 20 / 6 ≈ 3.33

✅ So, the parallel program is about 3.33 times faster than the serial one.
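The calculation above is just a ratio of wall-clock times, so it is easy to wrap in a small helper (function name is ours, for illustration):

```python
def observed_speedup(serial_time, parallel_time):
    """Observed speedup S = serial wall-clock time / parallel wall-clock time."""
    return serial_time / parallel_time

# The example above: 20 s serial, 6 s on 4 processors.
s = observed_speedup(20, 6)
print(round(s, 2))  # 3.33
```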

💡 Key Insights:
●​ Ideal speedup = number of processors​

○​ If using 4 processors → ideal speedup = 4​

●​ But in real life, speedup is usually less due to communication, synchronization, and
sequential parts (Amdahl’s Law!)​

●​ Still, observed speedup helps measure how good your parallelization is​
📝 Scenario-Based Question:
Q: You run a parallel sorting algorithm. The serial version takes 40 seconds, and the parallel
version (with 8 processors) takes 10 seconds. What is the observed speedup? What type of
granularity would be ideal here?

A:

●​ Observed Speedup = 40 / 10 = 4​

● For sorting (a compute-heavy task), coarse granularity is ideal to reduce communication overhead.

🧵 Fine-Grain Parallelism – Simple Explanation


📌 What it means:
The program breaks tasks into very tiny parts, so processors need to
communicate very often.

🧠 In Simple Words:
Imagine doing group homework, but after every single sentence, you stop and show your
friend to approve before writing the next one.

● That's fine-grained parallelism — lots of checking (communication), very little writing (computation).

📊 Characteristics of Fine-Grain Parallelism:

| Feature | Explanation |
|---|---|
| Small computation | Each task does a tiny bit of work before syncing |
| High communication | Tasks need to talk frequently |
| Low compute-to-comm ratio | Communication happens almost as much (or more) than computation |
| High overhead | Communication and synchronization slow things down |
| Low performance gain | Speedup is limited due to frequent "talking" |

⚠️ Drawback:
If the tasks are too small, processors spend more time talking and waiting
than actually working → leading to slower performance.

📝 Quick Example:
You divide a big matrix into very tiny 2x2 blocks for multiple processors to work on.

●​ After every small operation, processors sync up and exchange data.​

●​ This constant stopping kills performance.​

✅ Instead, it would be better to use coarser granularity — bigger blocks, less communication.

🧱 Coarse-Grain Parallelism – Simple Explanation


📌 What it means:
Each processor does a big chunk of work before needing to communicate with
others.

🧠 In Simple Words:
Imagine you're cleaning your room with your siblings.

●​ You clean your entire area first, and only talk when you're done.​

●​ That’s coarse-grain parallelism — more working, less talking ✅​


📊 Characteristics of Coarse-Grain Parallelism:

| Feature | Explanation |
|---|---|
| Large computation blocks | Each processor works longer before needing to sync |
| Less frequent communication | Communication happens only occasionally |
| High compute-to-comm ratio | More time spent working, less time talking |
| Low overhead | Not much time wasted on syncing |
| Better performance | Because less time is wasted on communication |
| ❌ Harder load balancing | Some processors might get more work than others |

⚠️ Drawback:
If one processor finishes early, it might wait for others — this makes load
balancing a bit tricky.

✅ When to Use:
●​ Tasks that can be split into independent large pieces​

●​ Sorting, image processing, simulations​

📝 Quick Example:
●​ Dividing a 1000x1000 image into 4 large chunks for 4 processors.​

●​ Each works on its own part without talking.​

●​ Only once at the end, they combine the results.​

This is coarse-grain and generally faster than fine-grain because less time is spent on
communication.
Problems whose parallel portion grows with problem size are more scalable than problems with a fixed percentage of parallel time.
✅ Why Use Parallel Processing? (Easy Version)
1. 🕒 Save Time (Reduce Wall-Clock Time)
●​ Parallel programs run tasks at the same time.​

●​ This finishes the job faster than doing everything one by one (serially).​

●​ Example: Sorting a big dataset using 8 CPUs instead of 1.​

2. 📈 Solve Bigger Problems


●​ Some problems (like weather modeling or simulations) are too large for one
computer.​

●​ Parallel processing splits the big problem across many processors to make it
manageable.​

3. 💾 Overcome Memory Limits


●​ One machine might not have enough RAM to load huge data.​

●​ Using multiple nodes shares memory resources → larger memory pool!​

4. 💰 Cost Savings
●​ Parallel systems (like GPU clusters or cloud compute) can be more cost-effective
than supercomputers.​

●​ Also allows flexibility — pay for only what you use (cloud model).​

5. 🧯 Better Fault Tolerance


●​ If one processor fails, others can continue the task or take over.​

●​ Makes systems more robust — good for critical applications (like servers or medical
simulations).​
6. 🧪 Scientific Curiosity / Research
●​ Many scientific fields (AI, physics, genomics) require massive computations.​

●​ Parallel processing makes these computations possible and faster!​

✍️ One-Line Exam Answer:


"Parallel processing is used to save time, solve larger problems, utilize
memory better, save cost, handle failures, and enable advanced scientific
computing."

🌟 Title: Other Metrics for Performance Evaluation


This means:​
When you run a program (especially on multiple processors), how do you measure how
good or fast it is?

🔹 Run-time is the dominant metric

👉 That means the most important thing is: how much time does it take to finish? This is called Execution Time or Run-Time.

🚀 Metrics Explained:
1. Run-Time (Execution Time)

👉 How long the program takes to finish.​


Shorter time = Better performance.

2. Speed (mflops, mips)

● mflops = Million floating-point operations per second.
● mips = Million instructions per second.

These show how fast your system is working, like a speedometer 🚗.

3. Speedup

👉 How much faster your program runs when you use multiple processors.​
It’s calculated like this:

```
Speedup = Time using 1 processor / Time using multiple processors
```

Example:​
If 1 processor takes 10 seconds, and 4 processors take 2.5 seconds,​
Speedup = 10 / 2.5 = 4×

4. Efficiency

This tells you how well you're using your processors.

🧠 Formula:

E = Speedup / Number of Processors

If you have 4 processors and get a speedup of 4:
Efficiency = 4 / 4 = 1, or 100% (perfect)

If speedup is 2 with 4 processors:
Efficiency = 2 / 4 = 0.5, or 50%

So, higher efficiency = better use of processors!
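Both worked examples follow directly from the formula, which a one-line helper makes concrete (function name is ours, for illustration):

```python
def efficiency(speedup, num_processors):
    """E = Speedup / Number of Processors (1.0 means perfect use of all CPUs)."""
    return speedup / num_processors

print(efficiency(4, 4))  # 1.0 -> 100% (perfect)
print(efficiency(2, 4))  # 0.5 -> 50%
```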

5. Scalability

👉 How well your program performs as you add more processors.​


Good scalability = Performance keeps improving with more processors.
✅ Summary (Quick Version)
●​ Run-Time: Total time to finish.​

●​ Speed: Operations per second.​

●​ Speedup: How much faster with multiple processors.​

●​ Efficiency: How well processors are used.​

●​ Scalability: How well it handles more processors.

🧠 General Parallel Terminologies (Simplified)


1. 🔁 Parallel Overhead
This is the extra time spent on managing parallel tasks instead of doing the actual work.

⚙️ It includes:
●​ Task startup time​

●​ Data communication between tasks​

●​ Synchronization (waiting for other tasks)​

●​ Software delays (caused by compilers, OS, etc.)​

●​ Task termination time​

🧠 Think of it like a group project: If you spend more time planning, calling, and emailing
each other than doing the actual work — that's overhead.

2. 💥 Massively Parallel
Refers to a computer system that has hundreds or thousands of processors working
together.

✅ Example: Supercomputers like those used for climate simulation or deep learning.
3. 📈 Scalability
How well a parallel system can handle more work when more processors are added.

🧩 Depends on:
●​ CPU-memory connection speed (bandwidth)​

●​ Network communication speed​

●​ Your algorithm (is it parallel-friendly?)​

●​ The overhead of managing multiple tasks​

📌 If performance keeps improving as you add more CPUs = system is scalable.

✍️ Exam Tip:
"Parallel overhead is the time spent coordinating parallel tasks. Massively
parallel systems use 100s or 1000s of processors. Scalability is how well a
system improves as we add more processors."

✅ Factors That Affect Scalability (Easy Explanation)


Scalability = How well a system performs when you add more processors. These are the
key things that affect it:

🔧 1. Hardware
●​ Especially the bandwidth between CPU and Memory.​

●​ Also includes network speed between processors (if they're on different machines).​

🧠 Think of it like highways: More cars (processors) only help if roads (bandwidth) are
wide enough.

📊 2. Application Algorithm
●​ Some algorithms are easy to split into tasks (parallel-friendly) → great scalability.​
●​ Others are mostly serial → poor scalability.​

💡 Example:
●​ Sorting large lists = good​

●​ Recursive dependency problems = not so good​

⏱️ 3. Parallel Overhead
●​ Extra time spent on managing tasks, synchronizing, communicating instead of
doing the real work.​

●​ More overhead = less scalability​

🧑‍💻 4. Your Specific Code & Application


●​ How you write and structure your parallel code.​

●​ Smart coding = better use of hardware = better scalability.​

💬 Summary:
Scalability depends on how fast the hardware is, how parallel your algorithm is,
how much time is wasted in communication, and how well you write your code.
🛠️ Manual Parallelization (Detailed)

●​ You decide what part of the code should run in parallel.​

●​ You use tools/libraries like:​

○​ Threads (e.g., POSIX, Java threads)​

○​ OpenMP, MPI, etc.​

●​ 🔧 Example: You manually create 4 threads to sort different parts of an array.​


Pros: Full control​
Cons: Hard to manage, easy to make mistakes
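The thread example above can be sketched with Python's standard library: you manually split the array, hand each chunk to its own thread, and merge afterwards. (This shows the structure of manual parallelization; in CPython the GIL means a CPU-bound sort like this gains no real speedup — the point is who decides what runs in parallel, namely you.)

```python
import threading
from heapq import merge

def parallel_sort(data, num_threads=4):
    """Manually partition the array, sort each chunk in its own thread, then merge."""
    chunk = (len(data) + num_threads - 1) // num_threads
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]

    def work(part):
        part.sort()                       # each thread sorts only its own slice

    threads = [threading.Thread(target=work, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                          # synchronize: wait for all sorters
    return list(merge(*parts))            # combine the sorted chunks

print(parallel_sort([5, 3, 8, 1, 9, 2, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

Notice how much you manage yourself — splitting, thread creation, joining, merging. That is the "full control, more work" trade-off.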

🤖 Automatic Parallelization
●​ A parallelizing compiler or pre-processor analyzes your code.​

●​ It detects loops or parts of the code that can safely run in parallel.​

●​ Tools:​

○​ Intel Compilers (ICC)​

○​ LLVM/Clang with auto-parallelization flags​

○​ Some Python libraries or MATLAB backends​


Pros: Fast and easy​
Cons: Not always smart enough to parallelize complex logic

📝 Summary:
Manual = More control, more work.​
Automatic = Easier, but not always efficient.

🧠 How a Parallelizing Compiler Works


A parallelizing compiler helps convert your normal (serial) code into parallel code — either
automatically or with your help. There are two main modes:

1. 🔄 Fully Automatic Parallelization


●​ What happens?​
The compiler automatically scans your code and finds parts (usually loops) that
can run in parallel.​

●​ Target areas:​

○​ Loops (for, do, etc.)​

○​ Independent computations (no data dependency)​

●​ Examples of tools:​

○​ Paralax compiler​

○​ Insieme compiler​

●​ Pros:​

○​ No need to modify your code​

○​ Great for beginners or simple programs​

●​ Cons:​
○​ May miss some opportunities​

○​ Only works well when the code is structured clearly​

2. 🧾 Programmer-Directed Parallelization
●​ What happens?​
The programmer gives hints or instructions to the compiler using directives
(special comments or flags).​

●​ How?​

○​ You add compiler directives like #pragma to tell the compiler how to
parallelize.​

○​ Examples:​

■ #pragma omp parallel for → tells the compiler to run a loop in parallel (OpenMP)

■ #pragma acc parallel loop → for OpenACC (used with GPUs)

●​ Examples of tools:​

○​ OpenMP​

○​ OpenACC​

●​ Pros:​

○​ More control over what gets parallelized​

○​ Works even with more complex code​

●​ Cons:​

○​ Requires programmer knowledge​

○​ May lead to bugs if used incorrectly​

📝 Summary Table

| Feature | Fully Automatic | Programmer-Directed |
|---|---|---|
| Who identifies parallelism? | Compiler | You (with hints) |
| Code changes needed? | No | Yes (add directives) |
| Flexibility | Low | High |
| Examples | Paralax, Insieme | OpenMP, OpenACC |

🔧 Why Shift to Manual Parallelization?


Since automatic tools have these drawbacks, for real-world, high-performance
applications, developers usually prefer the manual approach.

✅ Manual Parallelization:
●​ Gives full control over how the work is split and synchronized.​

●​ Lets you optimize communication and memory use.​

●​ Works better for complex or irregular code structures.​


●​ Can result in higher speedup and efficiency when done properly.​

📝 Final Transition Note:


From this point on, the lecture focuses on manual parallelization, where
you—the programmer—take control over designing and implementing parallel
tasks.
🔑 Step 1: Understanding the Problem (Before
Parallelizing)
Before jumping into writing parallel code, the first and most critical step is to fully
understand the problem and the existing serial program. Here's why and how:

✅ Why It’s Important:


1.​ Not all problems benefit from parallelization.​

○​ Some problems have too much sequential dependency.​

○​ Others might not be worth the overhead of parallel processing.​

2.​ You can’t optimize what you don’t understand.​

○​ Without knowing the program flow, data dependencies, and bottlenecks, any
attempt at parallelization could lead to:​

■​ Incorrect results ❌​
■​ Wasted time ⌛​
■​ Poor performance 🐌​

📌 What You Should Understand:


●​ What the program does (its logic, input, output).​

●​ Which parts take the most time (profiling helps here).​

●​ Where data dependencies exist.​

●​ Which sections can run independently (ideal candidates for parallelism).​

●​ Memory usage and potential for data sharing or race conditions.​

🤔 Is Your Problem Suitable for Parallelization?


Ask these questions:

●​ Is the problem compute-intensive?​

●​ Can parts of the computation be done independently?​

●​ Is the problem data-parallel? (same operation on different data chunks)​

●​ Is the amount of work large enough to justify parallel overhead?​

Example:

Let’s say you're working on image processing (e.g., applying a filter to every pixel).

✅ Each pixel can be processed independently → good for parallelization.
❌ But if you're working on a recursive depth-first search, parallelization is more challenging due to sequential dependencies.

🎯 Identify the Program's Hot-Spots


Once you understand the problem and the serial code, the next key step is to identify the
hot-spots—the parts of the code that do the most computational work or take the most
time.

🔍 What Are Hot-Spots?


Hot-spots are the sections of code where:

●​ The CPU spends most of its time.​

●​ Most computations or data operations occur.​

●​ Optimizing them would have the biggest impact on performance.​

🛠️ How to Find Them?


You can’t just guess—use profiling tools:
✅ Tools for Profiling:
●​ gprof (Linux)​

●​ Valgrind with callgrind​

●​ perf​

●​ Intel VTune​

●​ Visual Studio Profiler (Windows)​

●​ Python: cProfile or line_profiler​

●​ MATLAB: built-in profiler​

●​ Jupyter Notebooks: %timeit, %prun​
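As a small sketch of profiling in practice, Python's built-in `cProfile`/`pstats` can be driven from code (the `hot_loop`/`setup` functions are made-up stand-ins for a real program's hot and cold parts):

```python
import cProfile
import io
import pstats

def hot_loop():
    # Deliberately heavy function: the hot-spot we expect the profiler to surface.
    return sum(i * i for i in range(200_000))

def setup():
    # Cheap setup code: should barely register in the profile.
    return list(range(10))

profiler = cProfile.Profile()
profiler.enable()
setup()
hot_loop()
profiler.disable()

# Sort by cumulative time so the most expensive functions appear first.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats()
report = buf.getvalue()
print("hot_loop" in report)  # the profiler attributes most of the time to it
```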

📈 What to Look For:


●​ Functions with highest CPU usage​

●​ Loops with long execution time​

●​ I/O bottlenecks (if any)​

●​ Memory access patterns (e.g., cache misses)​

🎯 Focus Efforts Wisely


“Don’t waste time parallelizing code that’s only used 1% of the time.”

✅ Instead:
●​ Prioritize the parts of the code where optimization can produce the most speedup.​

●​ Apply Amdahl’s Law: If 90% of your program is parallelizable, you can get a 10x
potential speedup in theory.​
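Amdahl's Law makes the 10x cap concrete: with parallel fraction p and N processors, S = 1 / ((1 - p) + p / N). A small helper (function name is ours) shows why 90% parallelizable means at most 10x:

```python
def amdahl_speedup(parallel_fraction, num_processors):
    """Amdahl's Law: S = 1 / ((1 - p) + p / N)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_processors)

# 90% parallelizable: even with a huge processor count, speedup approaches 10x.
print(round(amdahl_speedup(0.9, 1_000_000), 2))  # ~10.0
print(round(amdahl_speedup(0.9, 4), 2))          # 3.08 on 4 processors
```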
🧠 Example:
If a program has:

●​ 3 nested loops for matrix multiplication that take 80% of the total runtime​

●​ And a few print statements and setup code taking the rest​

👉 Focus only on parallelizing the matrix multiplication section.

🔧 Identify Bottlenecks in the Program


After spotting the hot-spots, the next step is to identify bottlenecks—the parts of your
program that slow everything else down or block parallel progress.

🔍 What Are Bottlenecks?


A bottleneck is:

A section of code that limits the performance of the entire program because it
takes too long or blocks other tasks from running.

❗ Examples of Bottlenecks:

| Bottleneck Type | Description |
|---|---|
| 🧾 I/O Operations | Reading/writing to disk is very slow compared to computation. |
| 🕒 Serial Sections | Parts that can't be parallelized (e.g., setting up data structures). |
| 🔁 Poorly optimized loops | Nested loops or unoptimized algorithms that slow down computation. |
| 🌐 Communication delays | In parallel programs, time spent waiting for data from other processes. |
| 🔄 Synchronization wait | One thread waits for another at a barrier or lock. |

🛠️ How to Identify Bottlenecks?
Use:

●​ 🔬 Profilers (like gprof, perf, VTune) to find slow code.​


● 📊 Timing functions to measure each part (e.g., time.perf_counter() in Python).
●​ 📦 Memory + CPU usage tools to monitor resources.​
●​ 🧠 Logic review: Are you doing extra unnecessary work?​

💡 How to Reduce or Eliminate Bottlenecks?

| Problem | Solution |
|---|---|
| 🐢 Slow I/O | Buffer I/O, use parallel file reading, or reduce file operations |
| 🧱 Sequential code | Refactor or break into smaller parallel pieces if possible |
| 🔁 Slow algorithms | Use better data structures or faster algorithms |
| 🌐 Comm overhead | Try to overlap communication with computation |
| 🔄 Sync delays | Reduce the need for frequent synchronization/barriers |

🔄 Overlapping Communication with Computation


This is key in parallel computing:

Start computing with the available data, while other data is still being
received/transferred in the background.

Example: While one thread is sending data to another, let it process already received data
at the same time.
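A minimal producer/consumer sketch of this overlap, using a thread and a queue (the queue stands in for an incoming data transfer; names are illustrative):

```python
import queue
import threading

channel = queue.Queue()
results = []

def producer():
    # Simulates data still being received/transferred in the background.
    for block in range(5):
        channel.put(block)
    channel.put(None)                 # sentinel: transfer finished

def consumer():
    # Starts computing on each block as soon as it arrives, instead of
    # waiting for the whole transfer to complete first.
    while True:
        block = channel.get()
        if block is None:
            break
        results.append(block * block)  # "compute" on the received block

t_recv = threading.Thread(target=producer)
t_work = threading.Thread(target=consumer)
t_recv.start(); t_work.start()
t_recv.join(); t_work.join()

print(results)  # [0, 1, 4, 9, 16]
```

The compute thread never sits idle waiting for the full transfer — communication latency is hidden behind useful work.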

🧠 Summary:
●​ 🕵️ Identify slowdowns → are they unavoidable or fixable?​
●​ 🎯 Focus on fixing the most impactful ones.​
●​ 🔄 Try to hide latency using smart scheduling or overlap techniques.
🔍 Other Key Considerations in Parallel Programming
🧱 1. Identify Blockages to Parallelism
Even if a task seems parallelizable, some hidden issues can block it. One of the biggest
culprits is:

🔁 Data Dependence
This happens when:

One part of your code depends on the result of another part before it can
continue.

⚠️ Types of Data Dependencies

| Type | Meaning |
|---|---|
| 📥 Read-after-Write (RAW) | A task needs data produced by another task. |
| 📝 Write-after-Read (WAR) | A task tries to overwrite data before another task has finished reading it. |
| ✍️ Write-after-Write (WAW) | Two tasks try to write to the same variable at the same time. |

🧠 These prevent tasks from running in parallel because order matters.

🛠️ How to Handle It?


●​ Restructure the code to remove or reduce dependencies.​

●​ Use local copies of variables (avoid shared states).​

●​ Apply synchronization carefully (mutexes, barriers).​

●​ Explore task scheduling strategies to reorder execution safely.​


🔄 2. Investigate Alternative Algorithms
Sometimes the current algorithm is the bottleneck. Instead of trying to force it into a
parallel model:

💡 Try a different algorithm that's naturally parallel.

🧠 Why Is This Important?


Because:

●​ Some algorithms are easier to split across processors.​

●​ They may offer better scalability and less communication overhead.​

●​ A simple algorithm switch can give huge performance improvements.​

✅ Examples:

| Task | Traditional Algorithm | Better Parallel Alternative |
|---|---|---|
| Matrix multiplication | Naive nested loops | Blocked/Strassen's multiplication |
| Sorting | QuickSort (recursive) | MergeSort / Parallel QuickSort |
| Searching large datasets | Linear search | Divide-and-conquer, MapReduce style |
| Graph traversal | DFS/BFS | Parallel BFS with level synchronization |

🔚 Summary:
●​ 🔎 Look for data dependencies that block parallelism.​
●​ 🧠 Consider algorithm changes — don’t stick to serial logic!​
●​ 🎯 A well-chosen algorithm can unlock massive parallel performance.
🧩 Partitioning in Parallel Programming
Before you can run a program in parallel, you need to divide the work so multiple
processors can work on different parts simultaneously. This division is called
partitioning or decomposition.

Why Partition?

Because...

🚀 You can’t parallelize what you haven’t split up!

🛠️ Two Main Types of Partitioning:


1. Domain Decomposition (Data-Based)

Also called Data Parallelism

You divide the data into chunks, and each processor works on a subset of the data.

📌 Example:
●​ Suppose you're processing a big image (e.g., applying a filter).​

●​ You can divide the image into sections and assign each section to a different
processor.​

✅ Best for:
●​ Tasks where the same operation is performed on different data (like matrix
operations, simulations, etc.)​

2. Functional Decomposition (Task-Based)

Also called Task Parallelism

You divide the work based on functions or tasks, not data. Each processor performs a
different function or step in the process.
📌 Example:
●​ Suppose you're simulating weather:​

○​ One task handles temperature.​

○​ Another handles pressure.​

○​ Another handles wind speed.​

○​ All tasks work in parallel on the same data.​

✅ Best for:
●​ Workflows with distinct stages or responsibilities.​

●​ Pipelines and heterogeneous tasks.​

🔁 Summary:

| Type | Based On | Example | Use When... |
|---|---|---|---|
| Domain Decomposition | Data | Split matrix/image/data array | Same operation on lots of data |
| Functional Decomposition | Functions | One task computes, one writes | Different tasks on shared or same data |

🗂️ Domain Decomposition (Data-Based Partitioning)


💡 Concept:
You split the data into parts, and each parallel task (processor/thread) works on its own
part of the data.

✅ Key Points:
●​ All tasks perform the same operation.​
●​ Each task only works on a specific chunk of the overall dataset.​

●​ Ideal when your data can be evenly split and worked on independently.​

📊 Example: Matrix Addition


Imagine you have two 4×4 matrices A and B, and you want to compute C = A + B.

🧠 Serial:
One processor adds each element one by one.

⚡ Parallel (Domain Decomposition):


Split the matrix into 4 rows. Assign 1 row to each of 4 processors:

| Processor | Data Worked On (Rows) |
|---|---|
| P0 | Row 0 |
| P1 | Row 1 |
| P2 | Row 2 |
| P3 | Row 3 |

Each processor does:

```cpp
C[i][j] = A[i][j] + B[i][j];  // for its assigned rows
```
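The same row-per-processor scheme can be sketched in Python, with a thread pool standing in for the four processors (a 4×2 matrix here to keep it short; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[10, 20], [30, 40], [50, 60], [70, 80]]
C = [[0, 0] for _ in range(4)]

def add_row(i):
    # Domain decomposition: each task owns exactly one row of the data.
    for j in range(len(A[i])):
        C[i][j] = A[i][j] + B[i][j]

with ThreadPoolExecutor(max_workers=4) as pool:  # P0..P3, one row each
    list(pool.map(add_row, range(4)))

print(C)  # [[11, 22], [33, 44], [55, 66], [77, 88]]
```

Every worker runs the same operation (`add_row`), just on its own slice of the data — the hallmark of domain decomposition.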

🧱 Real-Life Analogy:
Imagine painting a wall — you split the wall into 4 sections and 4 painters work in parallel,
each on their own section.

💻 Use Cases:
●​ Image processing​
●​ Scientific simulations​

●​ Weather modeling​

●​ Matrix operations​

🧩 Partitioning Data in Parallel Computing


When we talk about partitioning data, we're basically figuring out how to split the dataset
so multiple processors can work on different parts at the same time.

📐 Common Ways to Partition Data:


1. 1D Partitioning (One-Dimensional)

You divide the data along one dimension only.

📊 Example: Vectors or Rows of a Matrix


Suppose you have a 4×4 matrix:

```
A = [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12],
     [13, 14, 15, 16]]
```

Using 1D partitioning, you could assign:

| Processor | Rows Assigned |
|---|---|
| P0 | Row 0 |
| P1 | Row 1 |
| P2 | Row 2 |
| P3 | Row 3 |

Each processor works on a full row.

2. 2D Partitioning (Two-Dimensional)

You divide the data in both rows and columns — a grid-style split.

📊 Same 4×4 Matrix — but split into blocks:

| Processor | Block Assigned |
|---|---|
| P0 | Top-left (2×2) |
| P1 | Top-right (2×2) |
| P2 | Bottom-left (2×2) |
| P3 | Bottom-right (2×2) |

Each processor works on a sub-matrix.
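A small helper makes the grid-style split concrete for the 4×4 matrix above (function name is ours, for illustration):

```python
def block_2d(matrix, block_rows, block_cols):
    """Split a matrix into 2D blocks (grid-style partitioning), row-major order."""
    blocks = []
    for r in range(0, len(matrix), block_rows):
        for c in range(0, len(matrix[0]), block_cols):
            blocks.append([row[c:c + block_cols]
                           for row in matrix[r:r + block_rows]])
    return blocks

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

blocks = block_2d(A, 2, 2)  # four 2x2 blocks for P0..P3
print(blocks[0])            # [[1, 2], [5, 6]]      -> top-left block for P0
print(blocks[3])            # [[11, 12], [15, 16]]  -> bottom-right block for P3
```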

📌 Summary:

| Type | How It Works | Good For |
|---|---|---|
| 1D | Split rows or columns | Simple, easier to manage |
| 2D | Split into blocks (grid) | Better load balance for big 2D data like matrices or images |

⚙️ Functional Decomposition in Parallel Computing


Instead of breaking up the data, functional decomposition breaks up the tasks (functions or
operations) that need to be performed.

💡 Simple Definition:
"Split the work based on what needs to be done, not on what data needs
to be handled."
🔍 How it works:
Each processor (or thread) is given a different function or part of the algorithm to
execute.

🛠️ Example:
Let's say you're modeling an ecosystem simulation:

●​ 🐦 Processor 1 handles bird population growth​


●​ 🌳 Processor 2 handles plant growth​
●​ 🐺 Processor 3 handles predator-prey dynamics​
Each one does a different task, but together they simulate the entire ecosystem.

✅ When to Use Functional Decomposition:


●​ When a problem involves distinct operations or stages.​

●​ When those operations can run independently or in parallel.​

🧠 Common Use Cases:

| Application | Tasks (Functions) |
|---|---|
| 🌱 Ecosystem Model | Animal growth, plant cycles, weather |
| 🎧 Signal Processing | Noise filtering, amplification, encoding |
| 🌍 Climate Modeling | Wind simulation, ocean currents, radiation |
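A toy version of the ecosystem example: each worker runs a different function on the same state (contrast with domain decomposition, where every worker ran the same function). The functions and update rules here are made up purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-task update rules for one simulation step.
def birds(state):
    return state["birds"] * 2        # bird population growth

def plants(state):
    return state["plants"] + 5       # plant growth

def predators(state):
    return state["predators"] - 1    # predator-prey dynamics

state = {"birds": 10, "plants": 100, "predators": 4}

# Functional decomposition: three different tasks run in parallel on shared data.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(f, state) for f in (birds, plants, predators)]
    new_birds, new_plants, new_predators = [f.result() for f in futures]

print(new_birds, new_plants, new_predators)  # 20 105 3
```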


🔹 Who Needs Communications?
You don’t need communication when:

●​ The tasks in a parallel program are independent of each other.​

●​ There's no data dependency between them.​

●​ Each task can complete its part without needing to know what the others are doing.​

Example – Image Processing:

●​ Imagine a black and white image where you want to invert each pixel.​

●​ Each pixel operation is independent (e.g., new_value = 255 - old_value).​

●​ You can assign chunks of pixels to different processors—no need to talk to each
other.​

Such problems are called 👉 Embarrassingly Parallel Problems – because they're so easy to parallelize!
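The pixel-inversion example is easy to sketch: split the image into chunks, invert each chunk independently, and never communicate between workers (a tiny flat list stands in for the image; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def invert_chunk(pixels):
    # Each pixel is independent: new_value = 255 - old_value,
    # so no chunk ever needs data from another chunk.
    return [255 - p for p in pixels]

image = [0, 64, 128, 192, 255, 10, 20, 30]  # toy grayscale "image"
chunks = [image[i:i + 2] for i in range(0, len(image), 2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    inverted = [p for chunk in pool.map(invert_chunk, chunks) for p in chunk]

print(inverted)  # [255, 191, 127, 63, 0, 245, 235, 225]
```

No barriers, no locks, no messages — that total independence is what makes the problem embarrassingly parallel.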

🔹 Factors to Consider: Cost of Communication


Whenever parallel tasks communicate, there's a performance price to pay:

💸 1. Overhead
●​ Communication uses CPU cycles, memory, and network bandwidth.​

●​ These resources could otherwise be used for actual computation.​

⏱ 2. Waiting Time (Synchronization Delays)

●​ Tasks often need to wait for others before moving forward.​


●​ This happens during data exchange, creating idle time (which is wasteful).​

🌐 3. Network Bandwidth Saturation


●​ When too many tasks communicate at once, it can clog the network.​

●​ Like rush-hour traffic — everyone slows down.​

●​ Result? Even well-optimized programs lose performance due to congestion.​

🔁 Real-World Analogy
Think of a group of workers trying to build a wall:

●​ If they talk constantly about each brick placement, they'll build slowly.​

●​ If they only coordinate occasionally (less communication), they work faster.​

●​ But if they never communicate, mistakes happen.​

📝 Scenario-Based Question
Q:​
You're running a parallel program with 16 processors. Each processor frequently needs to
exchange boundary values with its neighbors. However, as the number of processors
increases, your program's speedup starts decreasing. Why is this happening?

A:

●​ The increase in inter-processor communication introduces more overhead.​

●​ Synchronization delays cause processors to wait for others to exchange data.​

●​ Network bandwidth may be saturated, reducing the overall efficiency.​

●​ The program is spending more time communicating than computing.
