# Allreduce

## Introduction

The Allreduce primitive takes information from all processes in a communicator, performs a reduction using an operator, and then sends the result back to all processes.

The syntax is as follows:

int MPI_Allreduce(const void *sendbuf,
void *recvbuf,
int count,
MPI_Datatype datatype,
MPI_Op op,
MPI_Comm comm)


## Connection to Modules

• This module may be useful for the k-means module.

## Computing the Mean of Values using Weighted Means

Consider the case where you want to compute the mean of several numbers distributed across process ranks and distribute the global mean to all ranks. However, the data elements are distributed amongst several ranks, and the number of values at each rank are unknown. We will compute the mean across all ranks using weighted means.

Let each rank be assigned varying numbers of integers:

• Rank 0 has $S_0=$ {0, 1, 2}
• Rank 1 has $S_1=$ {0, 1, 2, 3, 4}
• Rank 2 has $S_2=$ {0, 1}
• Rank 3 has $S_3=$ {0, 1, 2, 3, 4, 5, 6}

The global mean of these numbers is: 2.0588.

We will compute the mean of these values assuming the following:

• Each process rank does not know how many elements the other process ranks store.

We can compute the weighted means at each rank and then sum them together. To do this, we need to know the total number of elements across all ranks, where the total is: $$|S_0|+|S_1|+|S_2|+|S_3|=3+5+2+7=17.$$

We denote the local weighted means at each process rank as $w_0$, $w_1$, $w_2$, and $w_3$, where

• $w_0=(0+1+2)/17$
• $w_1=(0+1+2+3+4)/17$
• $w_2=(0+1)/17$
• $w_3=(0+1+2+3+4+5+6)/17$

Thus, the global average is the sum of weighted means as follows: $$w_0+w_1+w_2+w_3=2.0588.$$

## Programming Activity:

• Compute the global mean of the numbers which will be stored at each rank.
• To compute the total number of elements above, use an MPI_Allreduce operation with the MPI_SUM operator.
• To compute the sum of weighted means, use an MPI_Allreduce operation with the MPI_SUM operator.
• We use an MPI_Allreduce for the sum of weighted means such that all processes have the value at the end of the computation.
Previous