At Whileone, we take pride in our customers' success. We help customers achieve their goals and execute the out-of-the-box ideas that success sometimes requires.
One such project was to measure IPC (instructions per cycle) for cloud applications on different architectures while completely omitting the network stack. This would give the customer, a RISC-V chip designer, a good picture of whether their architecture's IPC is in line with competitors such as Intel or ARM.
To achieve this, we modified cloud applications so that they could be profiled and benchmarked with no network or socket calls. The idea was to compare the performance of different architectures using both the vanilla versions and the lite (modified) versions. This would help the customer run these applications on their simulator, obtain the IPC of their architecture for each application, and compare it with the competition.
To give an example, one of the applications we picked was Redis, a caching server application. Redis takes SET/GET requests from clients and processes them internally, keeping a cached copy of the data for quick responses. To do away with the network part, we simulated the client so that Redis appeared to have received N SET/GET requests, and then processed those. The performance numbers we obtained are therefore solely for the application's processing on that architecture. This helped eliminate network noise and gave a good picture of the IPC for core application processing.
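The idea of driving the server with a simulated client can be illustrated with a minimal Python sketch. This is not the actual Redis-Lite change, which was made inside Redis itself. `TinyStore`, `simulated_client`, and `run_benchmark` are hypothetical names invented here, and the dictionary-backed store is only a stand-in for the application's command-processing core. The point it shows is that N SET/GET requests are generated and processed in the same process, so no socket call appears anywhere in the measured loop.

```python
import time


class TinyStore:
    """Hypothetical stand-in for the server's command-processing core."""

    def __init__(self):
        self._data = {}

    def execute(self, command, key, value=None):
        # Process one request exactly as if it had arrived from a client.
        if command == "SET":
            self._data[key] = value
            return "OK"
        if command == "GET":
            return self._data.get(key)
        raise ValueError(f"unsupported command: {command}")


def simulated_client(n_requests):
    """Yield n_requests commands, alternating SET and GET, without any sockets."""
    for i in range(n_requests):
        key = f"key:{i // 2}"
        if i % 2 == 0:
            yield ("SET", key, f"value:{i // 2}")
        else:
            yield ("GET", key, None)


def run_benchmark(n_requests):
    """Feed the simulated requests straight into the store and time the loop."""
    store = TinyStore()
    replies = []
    start = time.perf_counter()
    for command, key, value in simulated_client(n_requests):
        replies.append(store.execute(command, key, value))
    elapsed = time.perf_counter() - start
    return replies, elapsed


if __name__ == "__main__":
    replies, elapsed = run_benchmark(10)
    print(replies[:4], f"{elapsed:.6f}s")
```

In a real measurement, the timed loop would be the region observed by the profiler or simulator, so that the reported instruction and cycle counts cover only request processing.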
The table below shows the IPC and instruction count per packet for Redis and Redis-Lite when handling SET requests. The difference in IPC can be attributed to the networking sockets being removed.
SET | Redis | Redis-Lite | |
Graviton2 | Intel 8275 Cascade Lake | Graviton2 | |
IPC | 0.94 | 0.69 | 1.76 |
Icount / packet | ~39000 | ~30000 | ~20200 |
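Since IPC is instructions divided by cycles, the two rows of the table can be combined into an approximate cycle count per packet. The sketch below does that arithmetic for the two Graviton2 columns, reading the first value column as Redis and the last as Redis-Lite, which is an assumption about the table layout. The function name `cycles_per_packet` is hypothetical, and because the instruction counts in the table are approximate, the results are rough estimates rather than measured values.

```python
def cycles_per_packet(instructions_per_packet, ipc):
    """Estimate cycles per packet from IPC = instructions / cycles."""
    if ipc <= 0:
        raise ValueError("ipc must be positive")
    return instructions_per_packet / ipc


# Values taken from the table above (instruction counts are approximate).
redis_graviton2 = cycles_per_packet(39000, 0.94)
redis_lite_graviton2 = cycles_per_packet(20200, 1.76)

if __name__ == "__main__":
    print(f"Redis on Graviton2:      ~{redis_graviton2:.0f} cycles/packet")
    print(f"Redis-Lite on Graviton2: ~{redis_lite_graviton2:.0f} cycles/packet")
```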
In doing so, we made sure not to modify the program logic or core behavior of the application in any way. We could see a similar call stack for Redis and Redis-Lite. The snapshots are below.
REDIS Flamegraph
REDIS-LITE Flamegraph
As is evident from the flame graphs, the call stack of the core application is not altered. In the Redis-Lite flame graph, the network component is absent.
Redis is a single-threaded application. We also helped the customer port various multi-threaded and multi-process applications. The customer was able to cross-compile these applications and run them on its RISC-V simulator. This was an interesting experiment from the point of view of performance numbers, and it was useful for the customer in the early phase of chip development. It helped the customer understand where they stand with respect to the competition.