Rajeev Gadgil

Network compute agnostic Performance Analysis for Cloud workloads

Updated: Dec 11


Introduction

At Whileone, we take pride in our customers' success. We help customers achieve their goals and execute the out-of-the-box ideas that success requires.


Understanding IPC

One such project was to measure IPC (instructions per cycle) for cloud applications on different architectures while completely omitting the network stack. This would give the customer, a RISC-V chip designer, a good picture of whether their architecture's IPC is in line with competitors such as Intel or ARM.


Modifying Cloud Applications

To achieve this, we modified cloud applications so that they could be profiled and benchmarked with no network or socket calls. The idea was to compare the performance of different architectures on both the vanilla versions and the lite (modified) versions. The lite versions also let the customer run these applications on their simulator, obtain the IPC of their architecture for each application, and compare it with the competition.


Example with REDIS

To give you an example, one of the applications we picked was Redis, a caching server application. Redis takes SET/GET requests from clients and processes them internally, keeping a cached copy for quick responses. To do away with the network part, we simulated the client so that, from Redis's point of view, it has received N SET/GET requests, which it then processes. The performance numbers we get this way reflect only the application's own processing on that architecture. This helped eliminate network noise and gave a good picture of the IPC of the core application processing.
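The idea can be illustrated with a toy sketch. This is not Redis code and not the modification we made for the customer: `TinyStore` and `run_in_process` are hypothetical names, and a plain Python dictionary stands in for the Redis keyspace. It only shows the shape of the approach, where a simulated client hands N SET/GET commands directly to the command handler with no socket in the path.

```python
from typing import Optional


class TinyStore:
    """Hypothetical stand-in for a cache server's command handler."""

    def __init__(self) -> None:
        self._data = {}  # key -> value, a toy replacement for the keyspace

    def execute(self, command: str, key: str, value: Optional[str] = None) -> Optional[str]:
        """Process one command in-process, with no network involved."""
        if command == "SET":
            if value is None:
                raise ValueError("SET needs a value")
            self._data[key] = value
            return "OK"
        if command == "GET":
            return self._data.get(key)
        raise ValueError(f"unsupported command: {command}")


def run_in_process(store: TinyStore, n: int) -> int:
    """Simulated client: issue n SET/GET pairs directly to the handler.

    Returns how many GETs returned the value that was just SET.
    """
    hits = 0
    for i in range(n):
        key = f"key:{i}"
        expected = f"value:{i}"
        store.execute("SET", key, expected)
        if store.execute("GET", key) == expected:
            hits += 1
    return hits


if __name__ == "__main__":
    print(run_in_process(TinyStore(), 1000))
```

Because the request loop and the handler live in the same process, any profile taken over `run_in_process` covers only command processing, which is the property the lite builds rely on.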

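The table below shows IPC for Redis versus Redis-Lite for SET requests. On Graviton2, the instruction count per packet drops by roughly half and IPC rises from 0.94 to 1.76; both changes can be attributed to the networking sockets being removed.

| SET | Redis (Graviton2) | Redis (Intel 8275 Cascade Lake) | Redis-Lite (Graviton2) |
|---|---|---|---|
| IPC | 0.94 | 0.69 | 1.76 |
| Icount / packet | ~39000 | ~30000 | ~20200 |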
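The two rows of the table can be combined into one more number: since IPC is instructions divided by cycles, cycles per packet is the instruction count divided by IPC. The short sketch below does that arithmetic on the table's values. The function names are ours for illustration, and because the instruction counts are approximate, the results are approximate too.

```python
# Values copied from the table above (SET workload); icounts are approximate.
MEASUREMENTS = {
    ("Redis", "Graviton2"): {"ipc": 0.94, "icount_per_packet": 39000},
    ("Redis", "Intel 8275 Cascade Lake"): {"ipc": 0.69, "icount_per_packet": 30000},
    ("Redis-Lite", "Graviton2"): {"ipc": 1.76, "icount_per_packet": 20200},
}


def cycles_per_packet(ipc: float, icount_per_packet: float) -> float:
    """IPC = instructions / cycles, so cycles = instructions / IPC."""
    return icount_per_packet / ipc


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a drop)."""
    return (after - before) / before


for (app, cpu), m in MEASUREMENTS.items():
    cycles = cycles_per_packet(m["ipc"], m["icount_per_packet"])
    print(f"{app} on {cpu}: ~{cycles:.0f} cycles per packet")

redis = MEASUREMENTS[("Redis", "Graviton2")]
lite = MEASUREMENTS[("Redis-Lite", "Graviton2")]
drop = relative_change(redis["icount_per_packet"], lite["icount_per_packet"])
print(f"Icount per packet change on Graviton2: {drop:.1%}")
```

On Graviton2 this works out to roughly 41,500 cycles per packet for Redis and roughly 11,500 for Redis-Lite, with about 48% fewer instructions per packet in the lite build.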


In doing so, we made sure not to modify the program logic or core behavior of the application in any way. We could see a similar call stack for Redis and Redis-Lite. Below are the snapshots.


REDIS Flamegraph


REDIS-LITE Flamegraph


As is evident from the flame graphs, the call stack of the core application is not altered. In the Redis-Lite flame graph, the network component is absent.


Redis is a single-threaded application. We also helped the customer port various multi-threaded and multi-process applications. The customer was able to cross-compile these applications and run them on its RISC-V simulator. This was an interesting experiment from a performance-numbers point of view, and a useful one for the customer in the early phase of chip development. It helped the customer understand where they stand with respect to the competition.


Conclusion
