Monday, October 30, 2017

Kafka with OpenSSL



UPDATE: This article has been updated on Nov 11th 2017 with the following details:

  • Updated to note that the WildFly OpenSSL issues linked in this article have all been fixed, and the WildFly OpenSSL master branch now contains these fixes. The numbers that you see in this article include those fixes.
  • Updated to add a section which lists the producer and consumer numbers when Java 9 runtime is used, for both the SSL engine shipped by JRE as well as the OpenSSL one. 
  • Updated to note that the version of Kafka used is 1.0.0 (which was released recently)


In one of the products I'm involved in, we use Kafka extensively. We have been using Kafka since the 0.8.x days. If you follow Kafka development, you might be aware that they are about to release their 1.0.0 version very soon. Kafka allows you to use SSL for both producing and consuming messages. Both the Kafka broker and the client libraries are configurable to specify the necessary SSL characteristics. Within our own product, we use the Java client side libraries for consuming and producing messages. These Java client side libraries went through a phase of API changes a while back in one of the releases. We use the "new" Java client APIs.

We started experimenting with SSL for producing and consuming messages in Kafka more than a year back. Our initial experiments showed that switching from plaintext to SSL had a noticeable impact on performance. Given the way we use Kafka within our product, even a consistent degradation of a few milliseconds in latency is noticeable. It's an accepted and known fact that you incur a certain performance impact when you use SSL. However, the degradation was large enough that we decided not to switch to SSL for a while. There have been discussions and JIRAs like this one where such impact has been tracked. Things have definitely improved since that JIRA (we are on the 0.10.x release these days), but we didn't have enough time to get some numbers with SSL enabled within our environment.

A small detour

With that background, let me take a small detour from Kafka discussions. I also follow WildFly and various other projects in its ecosystem. Very recently, WildFly added support for using OpenSSL as an SSL provider instead of the one that's shipped as part of the JRE. As part of that support, they use the WildFly OpenSSL project, which provides Java bindings (the implementation of the interfaces necessary to use it as an SSLEngine in Java) for OpenSSL. Given that Java has a pluggable mechanism for SSL providers and the fact that Kafka allows you to configure such SSL settings, the WildFly OpenSSL project interested me.
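To make the pluggable provider mechanism a bit more concrete, here is a minimal sketch (not part of the original test setup) that lists the security providers registered in the running JVM which offer an SSLContext implementation. The class and method names are made up for illustration; it only uses the standard java.security APIs.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

public class SslProviderLister {

    // Returns the names of all registered security providers that offer
    // at least one SSLContext implementation.
    public static List<String> sslContextProviders() {
        final List<String> names = new ArrayList<>();
        for (final Provider provider : Security.getProviders()) {
            for (final Provider.Service service : provider.getServices()) {
                if ("SSLContext".equals(service.getType())) {
                    names.add(provider.getName());
                    break; // one match per provider is enough
                }
            }
        }
        return names;
    }

    public static void main(final String[] args) {
        for (final String name : sslContextProviders()) {
            System.out.println(name);
        }
    }
}
```

On a stock JRE this would typically print only the built-in JSSE provider. Any additional provider that gets registered at runtime should show up in the same list.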

Using Kafka with WildFly OpenSSL

I vaguely remember reading in some discussions that OpenSSL performs better compared to the SSL provider shipped in Java. So I decided to experiment with using WildFly OpenSSL with Kafka and compare it with the SSL provider shipped in Java. My goals of this experiment were pretty much these:
  • Use SSL for producing and consuming messages in Kafka
  • Compare OpenSSL against the SSL provider shipped in Java
  • Use the default settings that are shipped with Kafka for this performance testing. In fact, Kafka developers encourage you to use the defaults in performance testing as much as possible.
  • Use the tools shipped within Kafka for testing this performance. Kafka ships both producer and consumer performance testing tools, which are good enough for what we are after. Using their own tools rules out any issues that I might end up with in a tool that I write for these tests (I did in fact write one of my own, just for the sake of it, but decided to stick with the ones shipped in Kafka since mine pretty much ended up being similar both in terms of code and the output it produced).

I would like to note that comparing plaintext and SSL numbers was never a goal of these experiments. This experiment is solely to see how different SSL providers perform.

Setting up the system for the tests

 

Kafka installation

I decided to use the latest version of Kafka. The latest released version currently is 1.0.0 and I downloaded it from their downloads page. Kafka installation is straightforward: you just extract the downloaded archive and can straightaway boot it up and start consuming and producing messages. I won't go into any of the installation details of Kafka since that's out of the scope of this article. I will go into the configurations that I used as we go along.

WildFly OpenSSL installation

I use macOS for development and will be using it for my tests. In order to use WildFly OpenSSL, I went ahead and cloned the GitHub repo. I ran into a build issue, but it was a straightforward fix, for which there's now a pull request. I then set up Kafka to use WildFly OpenSSL and started experimenting. This set of experiments was just to make sure that it's usable without impacting any functionality. I did run into a problem which turned out to be an issue in WildFly OpenSSL, which is now reported here. This issue has now been fixed and pushed to the WildFly OpenSSL upstream repo.

The README of that project already has the necessary instructions to build it, so I won't get into those details.

Java installation

I use the Oracle JRE 1.8 version for these tests:


java version "1.8.0_131"

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

 

OpenSSL installation

In these experiments I'm going to use the 1.1 version of OpenSSL. I installed it through Homebrew and it's available at /usr/local/opt/openssl@1.1/bin/openssl. The exact version is:


/usr/local/opt/openssl@1.1/bin/openssl

OpenSSL> version

OpenSSL 1.1.0f  25 May 2017

Putting it all together


  • I built my WildFly OpenSSL libraries and copied over the java/target/wildfly-openssl-java-1.0.3.Final-SNAPSHOT.jar and macosx-x86_64/target/wildfly-openssl-macosx-x86_64-1.0.3.Final-SNAPSHOT.jar into the /libs/ folder. These are the 2 jars that contain the necessary WildFly OpenSSL support (depending on what OS you are on, you might need a different jar).
  • Given that I was going to use WildFly OpenSSL in multiple different tools which have their own different "main" classes, I decided to write an extremely basic Java agent which would just register WildFly OpenSSL as a provider. I then just pass -javaagent: as a JVM option to each of the tools/scripts I use for these tests. The code in my Java agent is pretty straightforward:

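public class OpenSSLEnabler {

    // Invoked by the JVM before the application's main method when this
    // class is loaded through the -javaagent option.
    public static void premain(final String agentArgs) throws Exception {
        try {
            // Register WildFly OpenSSL as a security provider in this JVM
            org.wildfly.openssl.OpenSSLProvider.register();
        } catch (Exception e) {
            // Don't fail the JVM startup; just report the problem
            System.err.println("Failed to register WildFly OpenSSL provider");
            e.printStackTrace();
        }
    }
}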


  • I set up the following environment variable to make sure that this Java agent is picked up and that OpenSSL 1.1 is used for these tests (when I enable OpenSSL as the provider):

    export KAFKA_OPTS="-javaagent:/opt/installations/kafka/1.0.0.RC4/kafka_2.12-1.0.0/libs/wildfly-openssl-javaagent-1.0.0-SNAPSHOT.jar -Dorg.wildfly.openssl.path=/usr/local/opt/openssl@1.1/lib/"

Note: The wildfly-openssl-javaagent-1.0.0-SNAPSHOT.jar is the jar containing the Java agent class that I explained above and the -Dorg.wildfly.openssl.path system property is to ensure that WildFly OpenSSL uses this specific OpenSSL installation (when I enable OpenSSL as the provider).
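As a quick sanity check, a small class like the following could be run with the same -javaagent option to see whether the provider registration took effect. This is only a sketch: the class name is hypothetical, and it assumes that WildFly OpenSSL registers itself under the provider name "openssl", which is what the ssl.provider=openssl setting used later in this article suggests.

```java
import java.security.Security;

public class ProviderCheck {

    // True if a security provider with the given name is registered in this JVM.
    public static boolean isRegistered(final String providerName) {
        return Security.getProvider(providerName) != null;
    }

    public static void main(final String[] args) {
        // "openssl" is an assumption about the name WildFly OpenSSL registers under;
        // a different name can be passed as the first argument.
        final String name = args.length > 0 ? args[0] : "openssl";
        System.out.println(name + " registered: " + isRegistered(name));
    }
}
```

Without the agent, the check for "openssl" would be expected to report false on a plain JRE.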

Performance tests details

The first round of testing will be using the SSL provider shipped in Java. We will use the kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh scripts that are shipped by Kafka itself (they are available in the bin directory of your Kafka installation).

The second round of testing will be using OpenSSL provider backed by WildFly OpenSSL. In these tests too we will use kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh scripts.

Kafka Broker and topics

In my test I'm going to create 3 topics, each with a replication factor of 1 and with partition count 1. The topics will be called kafka-ssl-perf-test-1k, kafka-ssl-perf-test-10k and kafka-ssl-perf-test-500k:

./kafka-topics.sh --create --topic kafka-ssl-perf-test-1k --partitions=1 --replication-factor=1 --zookeeper=localhost:2181

./kafka-topics.sh --create --topic kafka-ssl-perf-test-10k --partitions=1 --replication-factor=1 --zookeeper=localhost:2181

./kafka-topics.sh --create --topic kafka-ssl-perf-test-500k --partitions=1 --replication-factor=1 --zookeeper=localhost:2181

Producer tests

The producer will run with --record-size of 1024, 10240 and 512000 in 3 separate runs. Each run will generate --num-records 10000 with the respective size. Our usage of Kafka typically generates messages smaller than 10K, so I decided not to stretch the tests to very large messages. In my test, I will use the kafka-ssl-perf-test-1k topic for 1024 sized messages, kafka-ssl-perf-test-10k for 10240 sized messages and kafka-ssl-perf-test-500k for 512000 sized messages.
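To get a feel for how much data each run pushes through the broker, the total payload can be worked out from the record size (in bytes) and the record count. Here is a small sketch of that arithmetic, with a hypothetical class name and 1 MB taken as 1024 * 1024 bytes:

```java
public class PerfTestVolume {

    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    // Total payload in MB for a run of numRecords messages of recordSize bytes each.
    public static double totalMb(final long recordSize, final long numRecords) {
        return (recordSize * numRecords) / BYTES_PER_MB;
    }

    public static void main(final String[] args) {
        final long numRecords = 10000;
        for (final long recordSize : new long[] {1024, 10240, 512000}) {
            System.out.printf("%d bytes x %d records = %.4f MB%n",
                    recordSize, numRecords, totalMb(recordSize, numRecords));
        }
    }
}
```

The three runs work out to roughly 9.77 MB, 97.66 MB and 4882.81 MB, which lines up with the "Data consumed" figures reported by the consumer perf tool in the results further down.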

Consumer tests

The consumer will be consuming (all) 10000 messages of each of these topics, in 3 separate runs. I used `--new-consumer` option for these tests since that's what we use in our application, through the Java client APIs.

Java default SSL run

As noted, by default, Kafka uses the SSL provider shipped in the JRE, so it isn't necessary to configure the provider specifically. However, we do need a few other settings to enable SSL itself (by default Kafka uses plaintext). The broker configs that I added/changed in $KAFKA_HOME/config/server.properties are these:

listeners=PLAINTEXT://localhost:9092,SSL://localhost:9093
ssl.keystore.location=/opt/kafka-experiments/ssl-certs/keystore.jks
ssl.keystore.password=password
ssl.key.password=password
ssl.truststore.location=/opt/kafka-experiments/trust-certs.jks
ssl.truststore.password=password
ssl.protocol=TLSv1.2

The rest of the configurations are unchanged and are the defaults shipped with Kafka. As you see above, the main changes are enabling SSL, using 9093 as the port for SSL communication and using TLSv1.2 as the SSL protocol.

Start Zookeeper and Kafka broker

cd /bin/
nohup ./zookeeper-server-start.sh ../config/zookeeper.properties &
nohup ./kafka-server-start.sh ../config/server.properties &

Run the producer perf test script

We'll pass the following producer configs (through a kafka-jre-ssl-producer.properties) to these runs:

bootstrap.servers=localhost:9093
security.protocol=SSL
ssl.protocol=TLSv1.2
ssl.truststore.location=/opt/kafka-experiments/trust-certs.jks
ssl.truststore.password=password

These configurations just enable SSL (using the JRE-shipped SSL provider by default) with TLSv1.2 as the protocol on port 9093.
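Since our product talks to Kafka through the Java client libraries, the same settings can also be expressed as java.util.Properties and handed to the Kafka client constructors (along with the usual serializer or deserializer settings). The following is a minimal sketch with made-up class and method names. The optional provider argument is there so that the same helper can cover the ssl.provider=openssl variant used in the OpenSSL run later in this article.

```java
import java.util.Properties;

public class SslClientConfig {

    // Builds the SSL client settings used in these runs. Pass a provider name
    // such as "openssl" to set ssl.provider, or null to keep the JRE default.
    public static Properties sslClientProperties(final String sslProvider) {
        final Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9093");
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.protocol", "TLSv1.2");
        props.setProperty("ssl.truststore.location", "/opt/kafka-experiments/trust-certs.jks");
        props.setProperty("ssl.truststore.password", "password");
        if (sslProvider != null) {
            props.setProperty("ssl.provider", sslProvider);
        }
        return props;
    }

    public static void main(final String[] args) {
        // Print the JRE-default variant in properties-file form
        sslClientProperties(null).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```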

1024 sized message

./kafka-producer-perf-test.sh --record-size 1024 --num-records 10000   --topic kafka-ssl-perf-test-1k --producer.config ./kafka-jre-ssl-producer.properties --throughput -1 > producer-jre-ssl-1k.txt

10240 sized message

./kafka-producer-perf-test.sh --record-size 10240 --num-records 10000   --topic kafka-ssl-perf-test-10k --producer.config ./kafka-jre-ssl-producer.properties --throughput -1 > producer-jre-ssl-10k.txt

512000 sized message

./kafka-producer-perf-test.sh --record-size 512000 --num-records 10000   --topic kafka-ssl-perf-test-500k --producer.config ./kafka-jre-ssl-producer.properties --throughput -1 > producer-jre-ssl-500k.txt

Note: I have the performance numbers in a table, later in this blog.
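If you want to pull the numbers out of the captured output files programmatically, a small parser along these lines could help. It is a sketch only: the class is hypothetical, and the line format is assumed from the sample producer perf output quoted in the comments at the end of this article, so it may need adjusting for other Kafka versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProducerPerfSummary {

    // Matches the final summary line, which (unlike the intermediate progress
    // lines) also carries the latency percentiles.
    private static final Pattern SUMMARY = Pattern.compile(
            "(\\d+) records sent, ([\\d.]+) records/sec \\(([\\d.]+) MB/sec\\), "
            + "([\\d.]+) ms avg latency, ([\\d.]+) ms max latency, "
            + "(\\d+) ms 50th, (\\d+) ms 95th, (\\d+) ms 99th, (\\d+) ms 99\\.9th");

    public final long records;
    public final double recordsPerSec;
    public final double mbPerSec;
    public final double avgLatencyMs;
    public final double maxLatencyMs;
    public final long p50Ms;
    public final long p95Ms;
    public final long p99Ms;
    public final long p999Ms;

    private ProducerPerfSummary(final Matcher m) {
        records = Long.parseLong(m.group(1));
        recordsPerSec = Double.parseDouble(m.group(2));
        mbPerSec = Double.parseDouble(m.group(3));
        avgLatencyMs = Double.parseDouble(m.group(4));
        maxLatencyMs = Double.parseDouble(m.group(5));
        p50Ms = Long.parseLong(m.group(6));
        p95Ms = Long.parseLong(m.group(7));
        p99Ms = Long.parseLong(m.group(8));
        p999Ms = Long.parseLong(m.group(9));
    }

    // Returns the parsed summary, or null if the line is not a summary line.
    public static ProducerPerfSummary parse(final String line) {
        final Matcher m = SUMMARY.matcher(line);
        return m.find() ? new ProducerPerfSummary(m) : null;
    }
}
```

Feeding each line of a file such as producer-jre-ssl-1k.txt to parse and keeping the non-null result would give the figures for that run.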

Run the consumer perf test script

We'll pass the following consumer configs (through a kafka-jre-ssl-consumer.properties) to these runs:

bootstrap.servers=localhost:9093
security.protocol=SSL
ssl.protocol=TLSv1.2
ssl.truststore.location=/opt/kafka-experiments/trust-certs.jks
ssl.truststore.password=password

Just like for the producer, these configurations enable SSL and use the default JRE provider with TLSv1.2. We will be consuming from 3 separate topics, each holding the messages of a different size that we produced above. Each run uses a different and unique consumer group id.

1024 sized message

./kafka-consumer-perf-test.sh --topic kafka-ssl-perf-test-1k --new-consumer --messages 10000 --broker-list localhost:9093  --consumer.config ./kafka-jre-ssl-consumer.properties --group jre-ssl-1k > consumer-jre-ssl-1k.txt

10240 sized message

./kafka-consumer-perf-test.sh --topic kafka-ssl-perf-test-10k --new-consumer --messages 10000 --broker-list localhost:9093  --consumer.config ./kafka-jre-ssl-consumer.properties --group jre-ssl-10k > consumer-jre-ssl-10k.txt

512000 sized message

./kafka-consumer-perf-test.sh --topic kafka-ssl-perf-test-500k --new-consumer --messages 10000 --broker-list localhost:9093  --consumer.config ./kafka-jre-ssl-consumer.properties --group jre-ssl-500k > consumer-jre-ssl-500k.txt

Just like the producer numbers, I've noted these consumer numbers in a section later in this blog.

WildFly OpenSSL run

Now that we are done with the producer and consumer runs with the default JRE SSL, we'll reconfigure the Kafka broker to use OpenSSL as the provider. We won't make any other changes to the broker configs or to the producer and consumer configs we use for testing. To give this run characteristics similar to those of our previous run, I deleted the Kafka and Zookeeper directories that store the Kafka topics. Essentially, this run is going to start from a clean slate. As noted previously, to enable WildFly OpenSSL, I configured the following environment variable:

export KAFKA_OPTS="-javaagent:/opt/installations/kafka/1.0.0.RC4/kafka_2.12-1.0.0/libs/wildfly-openssl-javaagent-1.0.0-SNAPSHOT.jar -Dorg.wildfly.openssl.path=/usr/local/opt/openssl@1.1/lib/"

Here's what the relevant broker configs (in server.properties) look like now for OpenSSL:

listeners=PLAINTEXT://localhost:9092,SSL://localhost:9093
ssl.keystore.location=/opt/kafka-experiments/ssl-certs/keystore.jks
ssl.keystore.password=password
ssl.key.password=password
ssl.truststore.location=/opt/kafka-experiments/trust-certs.jks
ssl.truststore.password=password
ssl.provider=openssl
ssl.protocol=TLSv1.2

As you'll notice the only additional configuration here is the ssl.provider=openssl.

We then start zookeeper and the Kafka broker as previously. Remember, we (intentionally) deleted the Kafka directories that held the topics. So we will recreate the necessary topics as we did previously.

Run the producer perf test script

We'll pass the following producer configs (through a kafka-openssl-producer.properties) to these runs:

bootstrap.servers=localhost:9093
security.protocol=SSL
ssl.protocol=TLSv1.2
ssl.provider=openssl
ssl.truststore.location=/opt/kafka-experiments/trust-certs.jks
ssl.truststore.password=password

It's the same as what we used in our previous run, except that we use ssl.provider=openssl.

1024 sized message

./kafka-producer-perf-test.sh --record-size 1024 --num-records 10000   --topic kafka-ssl-perf-test-1k --producer.config ./kafka-openssl-producer.properties --throughput -1 > producer-openssl-1k.txt

Note: You'll see the following log message, which indicates that WildFly OpenSSL is correctly picked up and that it uses the natively installed 1.1.0f version of OpenSSL:

Oct 29, 2017 8:35:25 PM org.wildfly.openssl.SSL init
INFO: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.0f  25 May 2017

10240 sized message

./kafka-producer-perf-test.sh --record-size 10240 --num-records 10000   --topic kafka-ssl-perf-test-10k --producer.config ./kafka-openssl-producer.properties --throughput -1 > producer-openssl-10k.txt

512000 sized message

./kafka-producer-perf-test.sh --record-size 512000 --num-records 10000   --topic kafka-ssl-perf-test-500k --producer.config ./kafka-openssl-producer.properties --throughput -1 > producer-openssl-500k.txt

Run the consumer perf test script

We'll pass the following consumer configs (through a kafka-openssl-consumer.properties) to these runs:

bootstrap.servers=localhost:9093
security.protocol=SSL
ssl.protocol=TLSv1.2
ssl.provider=openssl
ssl.truststore.location=/opt/kafka-experiments/trust-certs.jks
ssl.truststore.password=password

It's the same as what we used for our consumer run with JRE SSL, except that we set the ssl.provider=openssl in this case.

1024 sized message

./kafka-consumer-perf-test.sh --topic kafka-ssl-perf-test-1k --new-consumer --messages 10000 --broker-list localhost:9093  --consumer.config ./kafka-openssl-consumer.properties --group open-ssl-1k > consumer-openssl-1k.txt

Just like the producer run with OpenSSL, you should see the following log message which confirms that WildFly OpenSSL was picked up for this run:

Oct 29, 2017 8:35:25 PM org.wildfly.openssl.SSL init
INFO: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.0f  25 May 2017

10240 sized message

./kafka-consumer-perf-test.sh --topic kafka-ssl-perf-test-10k --new-consumer --messages 10000 --broker-list localhost:9093  --consumer.config ./kafka-openssl-consumer.properties --group openssl-10k > consumer-openssl-10k.txt

512000 sized message

./kafka-consumer-perf-test.sh --topic kafka-ssl-perf-test-500k --new-consumer --messages 10000 --broker-list localhost:9093  --consumer.config ./kafka-openssl-consumer.properties --group openssl-500k > consumer-openssl-500k.txt

Final numbers (Java 8)

So let's now jump to the numbers that we captured from the above runs. The following are the producer and consumer numbers for the various message sizes that we tried above with JRE SSL and OpenSSL:

Producer Stats:


| Producer Stats (by message size) | 1024 (JRE SSL) | 1024 (OpenSSL) | 10240 (JRE SSL) | 10240 (OpenSSL) | 512000 (JRE SSL) | 512000 (OpenSSL) |
|---|---|---|---|---|---|---|
| Records/sec | 10857.76 | 14306.15 | 2232.64 | 2645.50 | 181.41 | 430.45 |
| MB/sec | 10.60 | 13.97 | 21.80 | 25.83 | 88.58 | 210.19 |
| Avg. Latency (ms) | 337.01 | 222.05 | 776.62 | 659.71 | 361.02 | 151.50 |
| Max. Latency (ms) | 568.00 | 387.00 | 1050.00 | 887.00 | 618.00 | 282.00 |
| 50th % latency (ms) | 351 | 236 | 814 | 690 | 356 | 146 |
| 95th % latency (ms) | 548 | 369 | 933 | 808 | 381 | 183 |
| 99th % latency (ms) | 565 | 384 | 1016 | 870 | 522 | 235 |
| 99.9th % latency (ms) | 568 | 387 | 1046 | 885 | 561 | 262 |


Consumer Stats:


| Consumer Stats (by message size) | 1024 (JRE SSL) | 1024 (OpenSSL) | 10240 (JRE SSL) | 10240 (OpenSSL) | 512000 (JRE SSL) | 512000 (OpenSSL) |
|---|---|---|---|---|---|---|
| Data consumed (MB) | 9.7656 | 9.7656 | 97.6563 | 97.6563 | 4882.8125 | 4882.8125 |
| MB/sec | 16.5239 | 24.5986 | 50.4423 | 97.2672 | 86.8797 | 250.4263 |
| Total consumed messages | 10000 | 10000 | 10000 | 10000 | 10000 | 10000 |
| Num messages/sec | 16920.4738 | 25188.9169 | 5165.2893 | 9960.1594 | 177.9296 | 512.8731 |
| Rebalance time (ms) | 29 | 17 | 31 | 16 | 29 | 17 |
| Fetch time (ms) | 562 | 380 | 1905 | 988 | 56173 | 19481 |
| Fetch MB/sec | 17.3766 | 25.6990 | 51.2631 | 98.8424 | 86.9245 | 250.6449 |
| Fetch messages/sec | 17793.5943 | 26315.7895 | 5249.3438 | 10121.4575 | 178.0215 | 513.3207 |


Summary (Java 8)

The above tables show that OpenSSL (backed by WildFly OpenSSL) out-performs the SSL provider shipped in the JRE in both the producer and consumer metrics recorded by the Kafka performance scripts. This is by no means a fine-tuned performance test or any kind of benchmark. The whole goal of this exercise was to see if it was worth the effort to try and use OpenSSL (backed by WildFly OpenSSL) with Kafka. If the differences weren't as prominent as they are here, it wouldn't have been worth it. But as you see, the numbers show drastic improvements with WildFly OpenSSL and are promising enough to let us experiment more with OpenSSL.
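To put a rough figure on "drastic", the relative change can be computed from the table values. The following sketch (hypothetical class name) does this for the producer records/sec row of the Java 8 table; the same helper works for any other row.

```java
public class SslComparison {

    // Percentage change of candidate relative to baseline (positive means higher).
    public static double percentChange(final double baseline, final double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(final String[] args) {
        // Producer records/sec from the Java 8 table: {JRE SSL, OpenSSL}
        final double[][] producerRecordsPerSec = {
            {10857.76, 14306.15}, // 1024 byte messages
            {2232.64, 2645.50},   // 10240 byte messages
            {181.41, 430.45},     // 512000 byte messages
        };
        for (final double[] row : producerRecordsPerSec) {
            System.out.printf("JRE SSL %.2f -> OpenSSL %.2f: %+.1f%%%n",
                    row[0], row[1], percentChange(row[0], row[1]));
        }
    }
}
```

For the producer this comes to roughly 32%, 18% and 137% more records per second with OpenSSL for the 1024, 10240 and 512000 byte messages respectively.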


Performance when using Java 9


I (and a few other folks) was curious what kind of numbers we would get when this same test was run with Java 9 as the runtime environment. Java 9 has some known performance improvements around SSL (like this), so I ran the entire set of tests (producer and consumer with both the JRE-shipped SSLEngine and WildFly OpenSSL) with the Java 9 runtime. Just like for Java 8, I used the out-of-the-box settings for Kafka as well as Java 9 itself. The same set of instructions noted previously in this article for Java 8 were followed to run these tests. The exact Java 9 version that was used is:


java version "9.0.1"

Java(TM) SE Runtime Environment (build 9.0.1+11)

Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)

Producer Stats (Java 9)

| Producer Stats (by message size) | 1024 (JRE-9 SSL) | 1024 (OpenSSL) | 10240 (JRE-9 SSL) | 10240 (OpenSSL) | 512000 (JRE-9 SSL) | 512000 (OpenSSL) |
|---|---|---|---|---|---|---|
| Records/sec | 11481.05 | 14265.33 | 2480.77 | 2760.14 | 437.34 | 494.07 |
| MB/sec | 11.21 | 13.93 | 24.23 | 26.95 | 213.55 | 241.25 |
| Avg. Latency (ms) | 341.45 | 227.38 | 702.48 | 636.20 | 148.82 | 132.03 |
| Max. Latency (ms) | 529.00 | 403.00 | 1009.00 | 854.00 | 691.00 | 269.00 |
| 50th % latency (ms) | 358 | 236 | 709 | 634 | 138 | 122 |
| 95th % latency (ms) | 513 | 389 | 885 | 775 | 198 | 193 |
| 99th % latency (ms) | 526 | 401 | 987 | 834 | 310 | 228 |
| 99.9th % latency (ms) | 529 | 402 | 1007 | 853 | 645 | 249 |


Consumer Stats (Java 9)

| Consumer Stats (by message size) | 1024 (JRE-9 SSL) | 1024 (OpenSSL) | 10240 (JRE-9 SSL) | 10240 (OpenSSL) | 512000 (JRE-9 SSL) | 512000 (OpenSSL) |
|---|---|---|---|---|---|---|
| Data consumed (MB) | 9.7656 | 9.7656 | 97.6563 | 97.6563 | 4882.8125 | 4882.8125 |
| MB/sec | 13.3593 | 24.1723 | 66.6141 | 89.5108 | 233.0253 | 247.7076 |
| Total consumed messages | 10000 | 10000 | 10000 | 10000 | 10000 | 10000 |
| Num messages/sec | 13679.8906 | 24752.4752 | 6821.2824 | 9165.9028 | 477.2358 | 507.3052 |
| Rebalance time (ms) | 30 | 18 | 31 | 18 | 28 | 17 |
| Fetch time (ms) | 701 | 386 | 1435 | 1073 | 20926 | 19695 |
| Fetch MB/sec | 13.9310 | 25.2995 | 68.0531 | 91.0123 | 233.3371 | 247.9214 |
| Fetch messages/sec | 14265.3352 | 25906.7358 | 6968.6411 | 9319.6645 | 477.8744 | 507.7431 |


Summary (Java 9)

In the above numbers you'll notice that:
  • Both for producer and consumer, there's a drastic improvement in the JRE shipped SSLEngine numbers, in almost all metrics, in Java 9 as compared to its counterpart in Java 8. It's especially prominent in messages with higher sizes.
  • There's not much difference in the numbers for WildFly OpenSSL in Java 9 as compared to its Java 8 counterpart. In fact, the consumer performance numbers of WildFly OpenSSL in Java 9 have dropped slightly when compared to Java 8. The producer performance in Java 9 with WildFly OpenSSL has, however, improved slightly for the larger message sizes when compared to Java 8.
  • When the numbers of producer and consumer metrics of WildFly OpenSSL with Java 9 runtime are compared with the JRE shipped SSL engine in Java 9, WildFly OpenSSL still out-performs the one shipped in JRE.
All the configurations, the Java agent code and the output of the runs are available in my github repo here

3 comments:

Unknown said...

Thank you for posting this. Great article, very useful info. I was looking for these kind of performance data these days. We also see the performance issue when turn on SSL, we are using Kafka 0.11.0.0. I searched online, find couple of Kafka performance benchmark articles, they always state SSL performance usually has 30% decreasing compared with plaintext mode. But our Kafka SSL performance is only 20%-30% of plaintext performance. (We measured using 1k bytes msg size on 3 node cluster). Can you post the comparison for plaintext vs JRE SSL performance in your chart? I want to see whether it's our environment issue, or other factors causing our SSL performance issues.

thanks

Sophie

Unknown said...

Just some additional info, when I tested with kafka producer perf script sending event to EB plaintext port (EB is configured with SSL on and inter broker communication is also using SSL), I can get 240k EPS.

"10000000 records sent, 241435.090176 records/sec (230.25 MB/sec), 101.60 ms avg latency, 757.00 ms max latency, 77 ms 50th, 256 ms 95th, 337 ms 99th, 663 ms 99.9th."

When the script sending events to SSL port, I only get 40-50k EPS.

"5000000 records sent, 51666.236115 records/sec (49.27 MB/sec), 1168.51 ms avg latency, 4924.00 ms max latency, 512 ms 50th, 3656 ms 95th, 4302 ms 99th, 4632 ms 99.9th."

Jaikiran said...

>> I searched online, find couple of Kafka performance benchmark articles, they always state SSL performance usually has 30% decreasing compared with plaintext mode. But our Kafka SSL performance is only 20%-30% of plaintext performance. (We measured using 1k bytes msg size on 3 node cluster). Can you post the comparison for plaintext vs JRE SSL performance in your chart?

Sophie,

I don't have the exact numbers right now from our internal testing of SSL vs non-SSL performance. However, I do remember that it was around 25% degradation when we last tested it about a year back.