Introduction: Skewed Data Distributions
Distribution sort performs best when the buckets have a similar number of elements assigned to each process rank. In this exercise, you will reuse your code in programming activity #1 to perform distribution sort on exponentially distributed data. Like Programming Activity #1, you will use equal width bins assigned to each process rank. We will study how performance changes due to using a skewed data distribution.
Programming Activity #2
Modify distribution_sort_exponential_starter.c
and incorporate your code from Programming Activity #1 into this file. The only difference between distribution_sort_exponential_starter.c
and distribution_sort_uniform_starter.c
that you used in Programming Activity #1 is that the generated data uses an exponential distribution with $\lambda=4$ instead of a uniform distribution.
Complete the table below using the same instructions outlined in Programming Activity #1.
# of Ranks ($p$) | Total Time (s) | Time to Distribute (s) | Time to Sort (s) | Parallel Speedup | Parallel Efficiency | Global Sum | Job Script Name (*.sh) |
---|---|---|---|---|---|---|---|
1 | |||||||
2 | |||||||
4 | |||||||
8 | |||||||
12 | |||||||
16 | |||||||
20 |
Using the data you collected in your table, answer the following questions. Be sure to refer to specific values in your table when responding to these questions.
- Q8: Does performance differ between the uniformly distributed (Programming Activity #1) and exponentially distributed data? If so, explain the performance discrepancy.