K-Means: Multiple Nodes

Assessing Scalability on Multiple Nodes

The k-means algorithm requires synchronization at each iteration to exchange information regarding the positions of the centroids. In this exercise you will not program anything new. Instead, you will:

Execute the algorithm on 2 nodes instead of 1 node.
Use $p=24, 28, 32, 36, 40$.
Generate the same tables as in Programming Assignment #1, using the same experimental settings except for the constraints above.
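To make the per-iteration synchronization mentioned above concrete, here is a minimal sketch in pure Python. It is not the assignment's implementation: the ranks are simulated as entries in a list, and the helper names (`local_partial_sums`, `allreduce_sum`, `kmeans_iteration`) are hypothetical. It assumes each rank accumulates per-centroid coordinate sums and point counts over its own share of the data, and that these partial results are combined with a sum reduction across all ranks (the role `MPI_Allreduce` would typically play in an MPI program). Your own code may exchange different data.

```python
# Hypothetical sketch: ranks are simulated as list entries; no MPI is used.

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((x - c) ** 2 for x, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def local_partial_sums(points, centroids):
    """Work one rank does independently on its own share of the data."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def allreduce_sum(partials):
    """Stand-in for a sum all-reduce: combine every rank's partial results."""
    k, dim = len(partials[0][0]), len(partials[0][0][0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    return total_sums, total_counts

def kmeans_iteration(chunks, centroids):
    """One iteration: local work per rank, one synchronization, then update."""
    partials = [local_partial_sums(chunk, centroids) for chunk in chunks]
    sums, counts = allreduce_sum(partials)  # the synchronization point
    new_centroids = []
    for j, old in enumerate(centroids):
        if counts[j] == 0:
            new_centroids.append(list(old))  # empty cluster: keep old position
        else:
            new_centroids.append([s / counts[j] for s in sums[j]])
    return new_centroids
```

In this sketch, every rank must reach the reduction before any rank can compute the new centroids, and the data combined per iteration is one coordinate sum vector and one count for each centroid.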

Include the tables you generated in your report. Answer the following questions using the data collected in the tables.
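If your tables report wall-clock times, one common way to summarize scaling is through speedup and parallel efficiency relative to a reference run. The sketch below assumes that interpretation; the function names are hypothetical, and the numbers in the usage note are placeholders, not measurements. Use whatever metrics Programming Assignment #1 asked for if they differ.

```python
def speedup(t_ref, t_p):
    """Speedup of a run taking t_p seconds relative to a reference run of t_ref seconds."""
    return t_ref / t_p

def efficiency(t_ref, p_ref, t_p, p):
    """Parallel efficiency relative to a reference run that used p_ref process ranks."""
    return (t_ref * p_ref) / (t_p * p)
```

For example, with placeholder values `t_ref = 10.0` seconds at `p_ref = 1` and `t_p = 2.5` seconds at `p = 8`, `speedup(10.0, 2.5)` is `4.0` and `efficiency(10.0, 1, 2.5, 8)` is `0.5`.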

Q7: On two nodes, how does the algorithm scale when $K=2$? Compare with the single node results.
Q8: On two nodes, how does the algorithm scale when $K=100$? Compare with the single node results.
Q9: Under what conditions do you expect the k-means algorithm to perform well on multiple nodes (e.g., two or more)? When preparing your response, consider the following factors: the number of centroids, the size of the dataset, the data dimensionality, and the number of iterations.

Last updated on Jan 8, 2020