In the first part of this blog, we described the problem we set out to solve, how we gathered and post-processed the data, and how we generated the final training data.
In this second part, we look at the machine learning model we used for training and for inference on new data.
Correlation between various counters
We captured a number of counters across several benchmarks. The following graph shows the correlation of each counter with every other counter.
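A pairwise correlation like the one plotted can be sketched as follows. This is a minimal illustration using Pearson correlation in plain Python. The counter names and values are hypothetical, and the post does not say which correlation measure or library was actually used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(counters):
    """Map every pair of counter names to its Pearson correlation."""
    names = list(counters)
    return {a: {b: pearson(counters[a], counters[b]) for b in names} for a in names}

# Hypothetical counter samples, one value per snapshot (illustrative only).
samples = {
    "instructions": [1.0, 2.0, 3.0, 4.0],
    "cache_misses": [2.0, 4.0, 6.0, 8.0],
    "branch_misses": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(samples)
```

Values near 1 or -1 indicate counters that move together (or in opposite directions), while values near 0 indicate little linear relationship.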
K Neighbors Regression
Given a snapshot of the system as a test data row, the K Neighbors algorithm first finds the K training rows nearest to it, using a distance metric such as Euclidean distance (the default), Manhattan distance, or Minkowski distance. It then averages the Y values of those K nearest neighbors and assigns the result as the predicted Y value of the test row.
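The procedure can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the code used for the post. The function name `knn_predict` and the sample rows are hypothetical, and in practice a library implementation such as scikit-learn's `KNeighborsRegressor` would normally be used instead.

```python
import math

def knn_predict(train_x, train_y, query, k=1):
    """Average the targets of the k training rows closest to `query`."""
    # Euclidean distance from the query row to every training row.
    distances = [(math.dist(row, query), y) for row, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical training rows (X) and targets (Y), illustrative only.
train_x = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
train_y = [1.0, 2.0, 6.0]

pred_k1 = knn_predict(train_x, train_y, [0.9, 1.2], k=1)  # uses only the closest row
pred_k2 = knn_predict(train_x, train_y, [0.9, 1.2], k=2)  # averages the two closest rows
```

With K=1 the prediction is simply the Y value of the single closest training row. Larger K values smooth the prediction by averaging over more neighbors.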
Standard Normalization of Counters
For the K Neighbors Regression algorithm to calculate these distances without bias toward counters that have large raw values, we bring all the counters to a comparable scale using standard normalization. Each column is rescaled by subtracting its mean and dividing by its standard deviation, so that every column has a mean of 0 and a standard deviation of 1. This changes only the scale of a column, not the shape of its distribution.
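As a minimal sketch of this scaling step, the function below subtracts a column's mean and divides by its standard deviation. The function name `standardize` and the sample values are hypothetical, and the population standard deviation is an assumption, since the post does not specify which variant was used.

```python
import math

def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1 (z-scores)."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation (an assumption, see the note above).
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

# Hypothetical raw counter values, one per snapshot.
cycles = [1000.0, 2000.0, 3000.0]
z = standardize(cycles)
```

When scaling a test row, the mean and standard deviation computed from the training data should be reused, so that training and test rows are measured on the same scale. A column with constant values has a standard deviation of 0 and would need special handling.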
Why did we use K=1?
Two snapshots whose X values are exactly the same will also have the same ratio. We therefore chose K=1: the single closest neighbor is the training row whose input variables match the test row most closely, so its IPC ratio should be a close prediction for the test row.
Shown below is a sample of the prediction made using K Neighbors Regression for the IPC Ratio.
The IPC ratio prediction above is for the ‘502.gcc_r_gcc-pp_3’ benchmark. The “Actual” line appears in the graph because we had already calculated the IPC ratio for ‘502.gcc_r_gcc-pp_3’. The dataframe for this benchmark was excluded from the training data for the K Neighbors Regression and used as the test dataframe.
The runtime can be calculated from the predicted IPC by assuming a particular CPU clock speed. We first calculate the total number of cycles and then the runtime, using the following formulas:
total_cycles = total_instructions/predicted_ipc
predicted_runtime = total_cycles/(2.5*10^9)
The formula for predicted runtime assumes a processor clock speed of 2.5 GHz, which gives the runtime in seconds.
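The same calculation as a small Python function, as a sketch. The function name `predict_runtime` and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict_runtime(total_instructions, predicted_ipc, clock_hz=2.5e9):
    """Estimate runtime in seconds from instruction count, IPC, and clock speed."""
    total_cycles = total_instructions / predicted_ipc
    return total_cycles / clock_hz

# Hypothetical inputs: 5 trillion instructions at a predicted IPC of 2.0.
runtime = predict_runtime(5.0e12, 2.0)
```

Here 5.0e12 instructions at an IPC of 2.0 take 2.5e12 cycles, which at 2.5 GHz is 1000 seconds. Passing a different `clock_hz` shows how the estimate changes with the assumed clock speed.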
The predicted IPC and the runtime for the same benchmark can be seen in the following graph:
The graph shows an improvement of around 30%, which is close to the predicted value.