| This directory contains code uses to measure the performance of the Veyron IPC |
| stack. |
| |
| The ipc_test.go file uses GO's testing package to run benchmarks. Each |
| benchmark involves one server and one client. The server has two very simple |
| methods that echo the data received from the client back to the client. |
| |
| client ---- Echo(payload) ----> server |
| client <--- return payload ---- server |
| |
| There are two versions of the Echo method: |
| - Echo(payload []byte) ([]byte], error) |
| - EchoStream() <[]byte,[]byte> error |
| |
| The first benchmarks use the non-streaming version of Echo with a varying |
| payload size. The other benchmarks use the streaming version with varying |
| number of chunks and payload sizes. |
| |
| All these benchmarks create a VC before measurements begin. So, the VC creation |
| overhead is excluded. |
| |
| On a ThinkPad X1 Carbon (2 × Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz), we get: |
| |
| $ veyron go test -test.bench=. -test.cpu=1 \ |
| -test.benchtime=5s veyron/runtimes/google/ipc/benchmarks 2> /dev/null |
| PASS |
| Benchmark____1B 10000 545077 ns/op |
| Benchmark___10B 10000 587312 ns/op |
| Benchmark__100B 10000 523019 ns/op |
| Benchmark___1KB 10000 605235 ns/op |
| Benchmark__10KB 10000 957467 ns/op |
| Benchmark_100KB 5000 4101891 ns/op |
| Benchmark_N_RPCs____1_chunk_____1B 10000 554063 ns/op |
| Benchmark_N_RPCs____1_chunk____10B 10000 551424 ns/op |
| Benchmark_N_RPCs____1_chunk___100B 10000 538308 ns/op |
| Benchmark_N_RPCs____1_chunk____1KB 10000 585579 ns/op |
| Benchmark_N_RPCs____1_chunk___10KB 10000 904789 ns/op |
| Benchmark_N_RPCs___10_chunks___1KB 10000 1460984 ns/op |
| Benchmark_N_RPCs__100_chunks___1KB 1000 8491514 ns/op |
| Benchmark_N_RPCs_1000_chunks___1KB 100 105269359 ns/op |
| Benchmark_1_RPC_N_chunks_____1B 200000 763769 ns/op |
| Benchmark_1_RPC_N_chunks____10B 100000 583134 ns/op |
| Benchmark_1_RPC_N_chunks___100B 100000 80849 ns/op |
| Benchmark_1_RPC_N_chunks____1KB 100000 88820 ns/op |
| Benchmark_1_RPC_N_chunks___10KB 50000 361596 ns/op |
| Benchmark_1_RPC_N_chunks__100KB 5000 3127193 ns/op |
| ok veyron/runtimes/google/ipc/benchmarks 525.095s |
| |
| |
| The Benchmark_Simple_____1KB line shows that it takes an average of 0.605 ms to |
| execute a simple Echo RPC with a 1 KB payload. |
| |
| The Benchmark_N_RPCs____1_chunk____1KB line shows that a streaming RPC with the |
| same payload (i.e. 1 chunk of 1 KB) takes about 0.586 ms on average. |
| |
| And Benchmark_1_RPC_N_chunks____1KB shows that sending a stream of 1 KB chunks |
| takes an average of 0.088 ms per chunk. |
| |
| |
| Running the client and server as separate processes. |
| |
| In this case, we can see the cost of name resolution, creating the VC, etc. in |
| the first RPC. |
| |
| $ $VEYRON_ROOT/veyron/go/bin/bmserver --address=localhost:8888 --acl='{"...":"A"}' |
| |
| (In a different shell) |
| $ $VEYRON_ROOT/veyron/go/bin/bmclient --server=/localhost:8888 --count=10 \ |
| --payload_size=1000 |
| CallEcho 0 64133467 |
| CallEcho 1 766223 |
| CallEcho 2 703860 |
| CallEcho 3 697590 |
| CallEcho 4 601134 |
| CallEcho 5 601142 |
| CallEcho 6 624079 |
| CallEcho 7 644664 |
| CallEcho 8 605195 |
| CallEcho 9 637037 |
| |
| It took about 64 ms to execute the first RPC, and then 0.60-0.70 ms to execute |
| the next ones. |
| |
| |
| On a Raspberry Pi, everything is much slower. The same tests show the following |
| results: |
| |
| $ ./benchmarks.test -test.bench=. -test.cpu=1 -test.benchtime=5s 2>/dev/null |
| PASS |
| Benchmark____1B 500 21316148 ns/op |
| Benchmark___10B 500 23304638 ns/op |
| Benchmark__100B 500 21860446 ns/op |
| Benchmark___1KB 500 24000346 ns/op |
| Benchmark__10KB 200 37530575 ns/op |
| Benchmark_100KB 100 136243310 ns/op |
| Benchmark_N_RPCs____1_chunk_____1B 500 19957506 ns/op |
| Benchmark_N_RPCs____1_chunk____10B 500 22868392 ns/op |
| Benchmark_N_RPCs____1_chunk___100B 500 19635412 ns/op |
| Benchmark_N_RPCs____1_chunk____1KB 500 22572190 ns/op |
| Benchmark_N_RPCs____1_chunk___10KB 500 37570948 ns/op |
| Benchmark_N_RPCs___10_chunks___1KB 100 51670740 ns/op |
| Benchmark_N_RPCs__100_chunks___1KB 50 364938740 ns/op |
| Benchmark_N_RPCs_1000_chunks___1KB 2 3586374500 ns/op |
| Benchmark_1_RPC_N_chunks_____1B 10000 1034042 ns/op |
| Benchmark_1_RPC_N_chunks____10B 5000 1894875 ns/op |
| Benchmark_1_RPC_N_chunks___100B 5000 2857289 ns/op |
| Benchmark_1_RPC_N_chunks____1KB 5000 6465839 ns/op |
| Benchmark_1_RPC_N_chunks___10KB 100 80019430 ns/op |
| Benchmark_1_RPC_N_chunks__100KB Killed |
| |
| The simple 1 KB RPCs take an average of 24 ms. The streaming equivalent takes |
| about 22 ms, and streaming many 1 KB chunks takes about 6.5 ms per chunk. |
| |
| |
| $ ./bmserver --address=localhost:8888 --acl='{"...":"A"}' |
| |
| $ ./bmclient --server=/localhost:8888 --count=10 --payload_size=1000 |
| CallEcho 0 2573406000 |
| CallEcho 1 44669000 |
| CallEcho 2 54442000 |
| CallEcho 3 33934000 |
| CallEcho 4 47985000 |
| CallEcho 5 61324000 |
| CallEcho 6 51654000 |
| CallEcho 7 47043000 |
| CallEcho 8 44995000 |
| CallEcho 9 53166000 |
| |
| On the pi, the first RPC takes ~2.5 sec to execute. |
| |