Introduction
A fundamental database operation used in many data analysis subroutines is the range query. A range query finds all objects that lie within a lower and upper bound in each dimension. In this module, we will implement parallel range queries using a brute-force approach and using an indexing data structure, the R-tree. We will compare the performance of the two algorithms and examine how performance varies between them.
Student Learning Outcomes
 Understand the scalability of the brute-force and R-tree range query algorithms.
 Compare the performance of the brute-force and R-tree range query algorithms.
 Discover how compute-bound and memory-bound algorithms vary in their scalability.
 Understand the impacts of resource allocation on R-tree algorithm performance.
 Based on foundational knowledge, devise a new resource allocation strategy that may improve performance.
 Synthesize the experimental data to assess what may happen when resources are shared by multiple users.
Prerequisites
Some knowledge of the MPI API and familiarity with the SLURM batch scheduler.
MPI Primitives Used
 MPI_Reduce
Files Required
Optional Files
Problem Description
An illustrative example of using range queries in the context of a database of movies is as follows. Consider a range query that finds all movies released from June 1, 2010–July 1, 2011 that have a rating between 7.1–9.1. In this example, we will use the time since epoch to represent dates (given in seconds since Jan. 1, 1970); June 1, 2010–July 1, 2011 corresponds to the time range 1275375600–1309503600. Figure 1 plots the data in this database, showing the release date vs. movie rating. A red rectangle outlines the above-mentioned range query. From the plot, we observe that 3 movies match the range query criteria.
Let $D$ be the total number of movies, and $d$ be the number of dimensions. In the movie database example, $d=2$. To compute a single range query, each data point must be compared to the ranges in each dimension to determine if the point lies within the ranges. Therefore, using a brute-force approach, the complexity of a single range query is $O(D)$ (each of the $D$ points requires $d$ comparisons, but $d$ is treated as a small constant).
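A minimal sketch of this brute-force scan for a single query follows (function and variable names are illustrative, not taken from the starter file):

```c
#include <stddef.h>

/* Count the points of the dataset that fall inside one range query.
 * data is row-major: data[i*d + j] holds dimension j of point i.
 * lower/upper hold the query's inclusive bounds in each of d dimensions. */
static size_t count_hits(const double *data, size_t D, int d,
                         const double *lower, const double *upper)
{
    size_t hits = 0;
    for (size_t i = 0; i < D; i++) {        /* O(D) points ... */
        int inside = 1;
        for (int j = 0; j < d; j++) {       /* ... times d comparisons each */
            double v = data[i * d + j];
            if (v < lower[j] || v > upper[j]) { inside = 0; break; }
        }
        hits += (size_t)inside;
    }
    return hits;
}
```

With $d$ fixed ($d=2$ for the movie example), the scan is $O(D)$ per query, so a rank answering $q$ queries performs $O(q \cdot D)$ work.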
Programming Activity #1
Write a distributed-memory range query algorithm using the brute-force method.

Use and modify the starter file: range_query_starter.c.

Compile all programs with the -O3 optimization flag.

Use a single node.

Use the --exclusive flag for all experiments so that your results are not polluted by other users running on the node.

The starter file will generate input and query datasets at each rank.

The input dataset, $D$, will be replicated across all ranks.

The query dataset, $Q$, will assign $\lfloor Q/p\rfloor$ queries to each rank. Since we take the floor of $Q/p$, we do not need to consider the case where there are “leftover” queries to search; therefore, we may search slightly fewer than $Q$ queries if $Q \bmod p \neq 0$.
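The effect of the floor can be checked with a quick calculation (a hypothetical helper, not part of the starter file):

```c
/* Total queries actually searched when each of p ranks takes
 * floor(Q/p) queries; up to Q mod p queries are dropped. */
static long queries_searched(long Q, int p)
{
    long per_rank = Q / p;      /* C integer division floors for positive operands */
    return per_rank * (long)p;
}
```

For example, with $Q=100000$ and $p=12$, each rank searches 8333 queries and 4 queries are dropped, while $p=20$ divides $Q$ evenly.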

The query dataset that is generated varies between ranks.

The value of a data point in the dataset in dimension $j$ denoted as $p_j \in D$ lies within the range of dimension $j$ if $r_j^{lower}\leq p_j\leq r_j^{upper}$, where $r_j^{lower}$ and $r_j^{upper}$ are the lower and upper bounds of the range in dimension $j$. Make sure to check the logic of your comparison operators.
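Both bounds are inclusive, so the comparison must admit equality on both sides; a sketch of the per-point check (illustrative names):

```c
/* Return 1 iff, in every dimension j, r_lower[j] <= point[j] <= r_upper[j]. */
static int in_range(const double *point, const double *r_lower,
                    const double *r_upper, int d)
{
    for (int j = 0; j < d; j++) {
        /* Using strict < / > here (instead of <= / >=) would wrongly
         * exclude points that sit exactly on a query boundary. */
        if (point[j] < r_lower[j] || point[j] > r_upper[j])
            return 0;
    }
    return 1;
}
```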

The total size of $D$ and $Q$ will be given on the command line.

Q1: Excluding the fact that the algorithm is brute force, what is one potential inefficiency described in the programming activity?
Validation
 You may want to make your own example dataset to test that the logic of your range queries is correct. The data used to make Figure 1 is in the file movie_data.txt.
 For each range query, you will store the number of hits for that query in an array (i.e., the number of data points within the range). For example, the range query in Figure 1 has 3 hits. After timing the computation, you will add up the values of this array to compute the local sum, then perform a reduction to compute the global sum at rank 0 (be sure to use a suitable data type for this value in your program). Output the global sum.
 The global sum for $p=20$ is 467,634,204 (for the programming activity, not the movie dataset).
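The local summation can be sketched as below; `unsigned long long` is one suitable data type, since per-rank and global totals can exceed the range of a 32-bit integer for large $D$ and $Q$ (names are illustrative):

```c
/* Sum the per-query hit counts into one local total. */
static unsigned long long local_hit_sum(const unsigned int *hits,
                                        long num_queries)
{
    unsigned long long sum = 0;
    for (long i = 0; i < num_queries; i++)
        sum += hits[i];
    return sum;
}

/* In the MPI program, rank 0 would then obtain the global sum with:
 *   MPI_Reduce(&local_sum, &global_sum, 1, MPI_UNSIGNED_LONG_LONG,
 *              MPI_SUM, 0, MPI_COMM_WORLD);
 */
```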
In your report, include tables with the following information for executions with $p=1, 4, 8, 12, 16, 20$ process ranks, with $D=2000000$ and $Q=100000$. You will create a table showing the total response time, speedup, parallel efficiency, and the global sum for validation purposes. The total response time is the time needed to compute the range queries on the longest executing rank. Use a reduction to compute the total response time.
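The derived columns can be computed as follows (a sketch; $T_1$ is the $p=1$ response time, and the timings in the example below are hypothetical placeholders, not measured results):

```c
/* In the MPI program, the total response time Tp is the elapsed time of the
 * longest-executing rank, obtained with a MAX reduction, e.g.:
 *   MPI_Reduce(&local_time, &total_time, 1, MPI_DOUBLE, MPI_MAX,
 *              0, MPI_COMM_WORLD);
 */

/* Speedup S = T1 / Tp. */
static double speedup(double t1, double tp) { return t1 / tp; }

/* Parallel efficiency E = S / p. */
static double parallel_efficiency(double t1, double tp, int p)
{
    return (t1 / tp) / (double)p;
}
```

For instance, if $T_1 = 10$ s and $T_4 = 2.5$ s, the speedup is 4.0 and the parallel efficiency is 1.0 (ideal).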
Include all job scripts in the submitted archive. The job script may be the same for all executions. Use a single dedicated node for all experiments. An example template table is given below.
Table 1: Total response time, speedup, and parallel efficiency (one node).
# of Ranks ($p$) | Time (s) | Speedup | Parallel Efficiency | Global Sum | Job Script Name (*.sh)
1                |          |         |                     |            |
4                |          |         |                     |            |
8                |          |         |                     |            |
12               |          |         |                     |            |
16               |          |         |                     |            |
20               |          |         |                     |            |
Using the data you collected in your table, answer the following questions. Be sure to refer to specific values in your table when responding to these questions.
 Q2: Describe the performance of the brute-force algorithm. Does it scale well? Explain.