Assessing Scalability on Multiple Nodes
The k-means algorithm must synchronize at every iteration to exchange the updated positions of the centroids. In this exercise you will not program anything new. Instead, you will:
- Execute the algorithm on 2 nodes instead of 1 node.
- Use $p=24, 28, 32, 36, 40$.
- Generate the same tables as in Programming Assignment #1, using the same experimental settings except for the constraints above.
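To make the per-iteration synchronization concrete, here is a minimal sketch of one k-means iteration with the data split across ranks. It is not the assignment's implementation: the ranks are simulated as plain Python lists, the function names (`local_partials`, `allreduce_sum`, `kmeans_iteration`, `values_exchanged_per_rank`) are hypothetical, and the global sum merely stands in for whatever collective the real code uses (something like `MPI_Allreduce` with `MPI_SUM` is an assumption, not stated above).

```python
# Sketch only: ranks are simulated as lists of points; no MPI is used.

def assign(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(point, centroids[k])),
    )

def local_partials(points, centroids):
    """Work done independently by one rank: per-centroid sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in points:
        k = assign(pt, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += pt[d]
    return sums, counts

def allreduce_sum(partials):
    """Stand-in for the synchronization step: add the partials of all ranks."""
    num_k = len(partials[0][1])
    dim = len(partials[0][0][0])
    sums = [[0.0] * dim for _ in range(num_k)]
    counts = [0] * num_k
    for rank_sums, rank_counts in partials:
        for k in range(num_k):
            counts[k] += rank_counts[k]
            for d in range(dim):
                sums[k][d] += rank_sums[k][d]
    return sums, counts

def kmeans_iteration(chunks, centroids):
    """One iteration: local work per rank, one global sum, centroid update."""
    partials = [local_partials(chunk, centroids) for chunk in chunks]
    sums, counts = allreduce_sum(partials)
    new_centroids = []
    for k, old in enumerate(centroids):
        if counts[k] == 0:
            new_centroids.append(list(old))  # empty cluster: keep old position
        else:
            new_centroids.append([s / counts[k] for s in sums[k]])
    return new_centroids

def values_exchanged_per_rank(num_centroids, dim):
    """Numbers each rank contributes to the global sum in one iteration."""
    return num_centroids * (dim + 1)  # dim coordinate sums + 1 count per centroid
```

In this sketch the amount of data exchanged per iteration depends on the number of centroids and the dimensionality, not on the number of points, while the local work grows with the number of points each rank holds. Whether the assignment's code exchanges exactly these quantities is an assumption.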
Include the tables you generated in your report. Answer the following questions using the data collected in the tables.
- Q7: On two nodes, how does the algorithm scale when $K=2$? Compare with the single node results.
- Q8: On two nodes, how does the algorithm scale when $K=100$? Compare with the single node results.
- Q9: Under what conditions do you expect the k-means algorithm to perform well on multiple nodes (e.g., two or more)? When preparing your response, consider the following factors: the number of centroids, the size of the dataset, the data dimensionality, and the number of iterations.