Friday, April 13, 2012

IPSec Performance Benchmarking (Is the end of network "hardware" near?)


Introduction


After more than two weeks of debugging IPSec tunneling performance issues on a variety of Cisco router and firewall platforms, my team decided to try IPSec tunneling on Linux. There were some impressive results published online. Initial testing between two Dell dual-Xeon servers produced a marginal 50Mbps. After some research we realized that in order to achieve high IPSec performance we needed to use the Intel AES instruction set (AES-NI) encryption optimization. Intel's AES instruction set provides CPU instructions that implement the AES encryption primitives in hardware, with significantly higher performance than a software implementation in the kernel. Support for the AES instruction set was committed in newer Linux kernels – 2.6.32 or later. Intel has a good paper describing their test bed here.
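Before building anything, it is worth checking that the CPU actually advertises AES-NI. A quick sketch – the "aes" entry in the flags line of /proc/cpuinfo indicates hardware AES support:

```shell
# AES-NI capable CPUs list "aes" in the flags line of /proc/cpuinfo.
if grep -q '\baes\b' /proc/cpuinfo; then
    echo "aes flag present"
else
    echo "aes flag absent"
fi
```

If the flag is absent, the aesni_intel module will refuse to load and encryption falls back to the generic kernel implementation.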

Test Setup


For our testing we decided to use the latest 2.6.35.13 kernel and a strongSwan IPSec tunnel. Since the hosts were running CentOS 5.5, I ended up compiling the kernel from source; make sure the AES-NI driver is checked in "make menuconfig". I also compiled the latest strongSwan 4.6.2 from source, without any special configuration options: ./configure && make && sudo make install

The test network was very simple: the two hosts, moon (10.0.0.1) and sun (10.0.0.2), connected over Gigabit Ethernet.

The AES acceleration requires the "aesni_intel" driver; make sure you load it with sudo /sbin/modprobe aesni_intel.

Verify that the driver is loaded:

/sbin/lsmod | grep aes
aesni_intel            11278  9 
cryptd                  6545  4 aesni_intel
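You can also confirm that the kernel has registered AES-NI-backed cipher implementations (not just loaded the module) by looking at /proc/crypto. This is a sketch – exact driver names vary by kernel version:

```shell
# /proc/crypto lists every registered cipher and its backing driver;
# AES-NI-backed entries have driver names containing "aesni".
if grep -q aesni /proc/crypto 2>/dev/null; then
    echo "aesni-backed ciphers registered"
else
    echo "no aesni-backed ciphers found"
fi
```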

Once the driver is loaded, you can start the IPSec protocol stack and bring the tunnel up:

sudo /usr/local/sbin/ipsec start
sudo /usr/local/sbin/ipsec up host-host

Below is the ipsec.conf for both hosts. Do not forget to configure ipsec.secrets as well (documentation is available on the strongSwan web site). The ESP encryption we tested was 128-bit AES-CCM-8 (as in the original Intel paper). Your mileage may vary, depending on whether a specific cipher suite is supported by the AES-NI driver or not.
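For reference, a minimal pre-shared-key ipsec.secrets for this setup would look roughly like the following on both hosts (the secret is a placeholder – generate your own long random value):

```
# /usr/local/etc/ipsec.secrets
@moon @sun : PSK "replace-with-a-long-random-secret"
```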

After initial negotiation the tunnel came up:

-bash-3.2$ sudo /usr/local/sbin/ipsec statusall
000 Status of IKEv1 pluto daemon (strongSwan 4.6.2):
000 interface lo/lo ::1:500
000 interface lo/lo 127.0.0.1:500
000 interface eth0/eth0 10.0.0.1:500
000 %myid = '%any'
000 loaded plugins: sha1 sha2 md5 aes des hmac gmp random kernel-netlink
000 debug options: raw+crypt+parsing+emitting+control+lifecycle+kernel+dns+natt+oppo+controlmore
000 
000 "host-host": 10.0.0.1[moon]…10.0.0.2[sun]; erouted; eroute owner: #2
000 "host-host":   ike_life: 3600s; ipsec_life: 1200s; rekey_margin: 180s; rekey_fuzz: 100%; keyingtries: 1
000 "host-host":   policy: PSK+ENCRYPT+TUNNEL+PFS; prio: 32,32; interface: eth0; 
000 "host-host":   newest ISAKMP SA: #1; newest IPsec SA: #2; 
000 "host-host":   IKE proposal: AES_CBC_128/HMAC_SHA1/MODP_2048
000 "host-host":   ESP proposal: AES_CCM_8_128/AUTH_NONE/<Phase1>

Note the ESP proposal line above: AES_CCM_8_128 confirms that the AES-CCM cipher suite was actually negotiated.

For testing we tried several different methods – ssh, iperf and netcat. The netcat test setup is the easiest to replicate. To test unidirectional traffic flow, open two ssh sessions to each host:

On moon start:

Session1: nc -l 1234 > /dev/null
Session2: nc -l 1235 > /dev/null

On sun:

Session1: dd if=/dev/zero bs=1M count=10000 | nc moon 1234
Session2: dd if=/dev/zero bs=1M count=10000 | nc moon 1235

To test bidirectional traffic flow performance try:

On moon start:

Session1: nc -l 1234 > /dev/null
Session2: dd if=/dev/zero bs=1M count=10000 | nc sun 1234

On sun:

Session1: dd if=/dev/zero bs=1M count=10000 | nc moon 1234
Session2: nc -l 1234 > /dev/null

Host moon:

/usr/local/etc/ipsec.conf:

config setup
   plutodebug=control
   charonstart=no

conn %default
   ikelifetime=60m
   keylife=20m
   rekeymargin=3m
   keyingtries=1
   keyexchange=ikev1
   authby=secret
   esp=aes128ccm8-modp1024

conn host-host
   left=10.0.0.1
   leftid=@moon
   leftfirewall=yes
   right=10.0.0.2
   rightid=@sun
   auto=add

Host sun:

/usr/local/etc/ipsec.conf

config setup
   plutodebug=control
   charonstart=no

conn %default
   ikelifetime=60m
   keylife=20m
   rekeymargin=3m
   keyingtries=1
   keyexchange=ikev1
   authby=secret
   esp=aes128ccm8-modp1024

conn host-host
   left=10.0.0.2
   leftid=@sun
   leftfirewall=yes
   right=10.0.0.1
   rightid=@moon
   auto=add

Test results:

Single stream:
dd if=/dev/zero bs=1M count=10000 | nc moon 1234
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 134.506 seconds, 78.0 MB/s 
624Mbps unencrypted traffic
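The Mbps figures here are simply dd's reported rate converted to bits; dd uses decimal megabytes (1 MB = 10^6 bytes), so the conversion is:

```python
# Convert dd's reported transfer into a line rate in megabits per second.
bytes_copied = 10_485_760_000   # 10000 blocks of 1 MiB
seconds = 134.506
mbps = bytes_copied * 8 / seconds / 1e6
print(round(mbps))  # -> 624
```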

Two parallel streams (unidirectional):

8814329856 bytes (8.8 GB) copied, 288.863 seconds, 42.5 MB/s
8814329856 bytes (8.8 GB) copied, 288.863 seconds, 43.5 MB/s
688Mbps unencrypted traffic

Bidirectional traffic tests yielded similar results – each stream was able to push around 350-400Mbps, for a total of 800Mbps of unencrypted traffic. Including the IPSec overhead, this pretty much saturates the Gigabit Ethernet interface on the host; the NIC tops out at around 900Mbps of encrypted traffic. Encryption and decryption of 800Mbps of traffic (either uni- or bi-directional) consumed one 2.4GHz core at 99% utilization, while the rest of the cores were pretty much idle. We did not try to spread the load across multiple cores to get higher throughput, because this is a sufficient amount of bandwidth for our use case.
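To observe this single-core pinning yourself, you can sample per-core utilization during a test run. This is a sketch that reads /proc/stat directly (so it needs no extra tools like mpstat):

```python
# Sample /proc/stat twice and report per-core busy percentage,
# e.g. to confirm that encryption is pinned to a single core.
import time

def cpu_times():
    """Return {cpu_name: (total_jiffies, idle_jiffies)} for each core."""
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
            if line.startswith("cpu") and line[3].isdigit():
                name = line.split()[0]
                fields = [int(x) for x in line.split()[1:]]
                stats[name] = (sum(fields), fields[3])  # total, idle
    return stats

a = cpu_times()
time.sleep(1)
b = cpu_times()
for cpu in sorted(a):
    total = b[cpu][0] - a[cpu][0]
    idle = b[cpu][1] - a[cpu][1]
    busy = 100.0 * (total - idle) / total if total else 0.0
    print(f"{cpu}: {busy:.0f}% busy")
```

Run it in a third session while the netcat streams are active; one core should sit near 100% while the others stay mostly idle.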

Why did I decide to write about this?

I think this is important for two reasons. One, it is amazing how much the performance of "commodity" CPUs has increased, even for highly specialized tasks like AES encryption. Intel seems to be going after a variety of market niches that used to be served by specialized manufacturers like Cavium. Two, a lot of solutions that even 3-4 years ago were possible only with dedicated "network hardware" – routers, firewalls, encryptors, etc. – are now possible at much lower cost, with greater flexibility and "no strings attached", on off-the-shelf Intel Xeon servers running Linux. Open Source software (like strongSwan) is becoming much more advanced and easier to administer, and the availability of source code allows more advanced users to modify and fix it as they see fit.

For comparison, a similarly sized Cisco ASA (an ASA 5585-X with SSP-10, capable of 1Gbps of encrypted traffic) runs about $40k each, with SmartNet support costs that add up over the years. Additionally, hardware replacement is not as easy, and configuration requires specialized knowledge. At this point I don't see many rational reasons why anyone would purchase a $40k firewall (and you need two per site, plus a support contract) instead of just running strongSwan on two Dell PowerEdge servers. That is $150k in savings, give or take – absolutely worth the day's worth of work to set it up and test it. I am sure a lot of enterprises will find reasons to continue spending money on "dedicated network hardware", but these are much the same arguments made 10 years ago for buying underpowered and overpriced "commercial Unix" platforms – Data General, DEC, HP-UX, AIX. Ten years later, resellers are giving away HP-UX servers for the price of a two-year support agreement.

Between the commoditization of Ethernet switching by Marvell, Broadcom and Fulcrum/Intel, the advancement of Open Source software, OpenFlow/OVS, and provisioning and orchestration stacks like OpenStack, there is a rapidly growing set of alternatives to monolithic network "hardware". If I were Cisco, I'd be worried about switch and firewall sales.


