K-Means: Multiple Nodes

Assessing Scalability on Multiple Nodes

The k-means algorithm requires synchronization at each iteration to exchange the positions of the centroids. In this exercise, you will not program anything new. Rather, you will:

  • Execute the algorithm on 2 nodes instead of 1 node.
  • Use $p=24, 28, 32, 36, 40$.
  • Generate the same tables as in Programming Assignment #1, using the same experimental settings except for the constraints above.
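
To make the per-iteration synchronization concrete, here is a minimal single-process Python sketch. It is not the assignment's code. It assumes each rank holds a chunk of the points and computes per-centroid partial sums and counts, and that a global reduction (in an MPI program this would typically be `MPI_Allreduce`) combines them before the centroids are updated. All function names are hypothetical, and the "ranks" are simulated as entries of a list.

```python
def local_partial_sums(points, centroids):
    """Work done independently on one rank: assign each local point to its
    nearest centroid and accumulate per-centroid coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the centroid with the smallest squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for j in range(dim):
            sums[nearest][j] += p[j]
    return sums, counts


def reduce_and_update(partials, centroids):
    """Stand-in for the synchronization step: combine every rank's partial
    sums and counts, then recompute the centroids from the global totals."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for c in range(k):
            total_counts[c] += counts[c]
            for j in range(dim):
                total_sums[c][j] += sums[c][j]
    new_centroids = []
    for c in range(k):
        if total_counts[c] == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(centroids[c]))
        else:
            new_centroids.append([s / total_counts[c] for s in total_sums[c]])
    return new_centroids


def kmeans_iteration(chunks, centroids):
    """One iteration: local work on every simulated rank, then one reduction."""
    partials = [local_partial_sums(chunk, centroids) for chunk in chunks]
    return reduce_and_update(partials, centroids)
```

In this sketch, the data combined in the reduction is $K$ sums of length $d$ plus $K$ counts, where $d$ is the data dimensionality. Its size does not depend on how many points each rank holds.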

Include the tables you generated in your report. Answer the following questions using the data collected in the tables.

  • Q7: On two nodes, how does the algorithm scale when $K=2$? Compare with the single-node results.

  • Q8: On two nodes, how does the algorithm scale when $K=100$? Compare with the single-node results.

  • Q9: Under what conditions do you expect the k-means algorithm to perform well on multiple nodes (i.e., two or more)? When preparing your response, consider the following factors: the number of centroids, the size of the dataset, the data dimensionality, and the number of iterations.
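
When comparing single-node and two-node results, it can help to express scaling as speedup and parallel efficiency. The Python sketch below is one possible helper, assuming speedup is defined as $T_1 / T_p$ and efficiency as speedup divided by $p$. Whether Programming Assignment #1 uses exactly these columns is an assumption. The function name is hypothetical, and the numbers in the usage lines are placeholders, not measurements.

```python
def scaling_table(t_serial, timings):
    """Build rows of (p, time, speedup, efficiency).

    t_serial: baseline time in seconds (assumed here to be the p = 1 run).
    timings:  dict mapping process count p to the measured time in seconds.
    """
    rows = []
    for p in sorted(timings):
        speedup = t_serial / timings[p]
        efficiency = speedup / p
        rows.append((p, timings[p], speedup, efficiency))
    return rows


# Placeholder values only; substitute your own measured times.
example = scaling_table(100.0, {24: 10.0, 40: 5.0})
for p, t, s, e in example:
    print(f"p={p:3d}  time={t:8.3f}s  speedup={s:6.2f}  efficiency={e:5.2f}")
```

An efficiency close to 1 indicates near-linear scaling, while values that fall as $p$ grows suggest that overheads such as synchronization are taking a larger share of the runtime.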
