Range Queries: R-tree Resource Allocation

Introduction: R-Trees on Multiple Nodes

In this exercise, we will not program anything new. Instead, we will experiment with resource allocation.

Splitting Ranks Across Two Nodes

Run the program using the same experimental settings in Programming Assignment #2, except use the constraints below.

Use 2 nodes.
Evenly split the ranks between the nodes.

When running your algorithm, to force the scheduler to split the ranks evenly between nodes you can use the --ntasks-per-node flag. As an example, the line below runs the program for $p=20$, where 10 ranks are run on each node.

srun --nodes=2 --ntasks-per-node=10 -n20 range_query 5000000 50000 ZTF_ra_dec_5m.csv

Table 4: Total response time, index construction time, and search time (two nodes).

# of Ranks ($p$)	Total Time (s)	R-Tree Construction (s)	Search Time (s)	Global Sum	Job Script Name (*.sh)
1
4
8
12
16
20

Table 5: Speedup and parallel efficiency of the search phase from Table 4.

# of Ranks ($p$)	Speedup	Parallel Efficiency
1
4
8
12
16
20

Using the data you collected in your tables, answer the following questions. Be sure to refer to specific values in your table when responding to these questions.

Q6: Compare the speedup and parallel efficiency between Table 3 (one node) and Table 5 (two nodes). All $p=20$ ranks can be run on a single node (Table 3), or they can be evenly distributed between two nodes (Table 5). How does the speedup and parallel efficiency compare? Is there anything interesting about performance?

Devise Your Own Experiment

Based on your observations on 1 and 2 nodes, devise a new experiment that changes resource allocation (e.g., number of nodes, allocation of tasks to nodes, dedicated, non-dedicated, etc.) that may further improve performance beyond using 2 nodes. Complete the tables as follows.

Q7: Explain what parameters you used in your job script. Report the results in the Tables below.

Table 6: Total response time, index construction time, and search time (your new experiment).

# of Ranks ($p$)	Total Time (s)	R-Tree Construction (s)	Search Time (s)	Global Sum	Job Script Name (*.sh)
1
4
8
12
16
20

Table 7: Speedup and parallel efficiency of the search phase from Table 6.

# of Ranks ($p$)	Speedup	Parallel Efficiency
1
4
8
12
16
20

Q8: How does performance compare to your experiment in Table 7 to Tables 3 and 5? Did your new experiment outperform the 2 node experiment?
Q9: Consider the case where you want to run the range query algorithm with the R-tree, and you need to run the algorithm on a cluster that is shared with one other user. Would you rather the other user be running a memory-bound algorithm or compute-bound algorithm? Explain.

Last updated on Jan 8, 2020