Support for ARMv8 Cryptography Extensions on Vero4k

It didn’t look too promising at first: the build reported Configuring for linux-armv4 and used the compiler flag -march=armv7-a. Still, it compiled OK, and the real test is in the running:

$ ./apps/openssl speed -evp aes-128-cbc
WARNING: can't open config file: /usr/local/ssl/openssl.cnf
Doing aes-128-cbc for 3s on 16 size blocks: 26868125 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 19475940 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 8870689 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2855763 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 389699 aes-128-cbc's in 3.00s
OpenSSL 1.0.2k  26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -march=armv7-a -Wa,--noexecstack -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     143775.92k   415486.72k   756965.46k   974767.10k  1064138.07k
$ ./apps/openssl speed aes-128-cbc
WARNING: can't open config file: /usr/local/ssl/openssl.cnf
Doing aes-128 cbc for 3s on 16 size blocks: 9696742 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 2655967 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 694189 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 175086 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 21947 aes-128 cbc's in 3.00s
OpenSSL 1.0.2k  26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -march=armv7-a -Wa,--noexecstack -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      51715.96k    56850.13k    59237.46k    59762.69k    59929.94k
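
The k figures in each summary row can be reproduced from the raw counts in the transcript: blocks processed, times block size, divided by elapsed seconds, then by 1000. A minimal Python sketch of that arithmetic (the function name throughput_k is my own and not part of OpenSSL):

```python
def throughput_k(blocks, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    return blocks * block_size / seconds / 1000.0

# Raw counts copied from the -evp run above (16-byte and 8192-byte rows).
print(f"{throughput_k(26868125, 16, 2.99):.2f}k")
print(f"{throughput_k(389699, 8192, 3.00):.2f}k")
```

These two calls give 143775.92k and 1064138.07k, matching the first and last columns of the -evp summary row.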

With hardware extensions (-evp flag):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     143775.92k   415486.72k   756965.46k   974767.10k  1064138.07k

Without hardware extensions (no -evp flag):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      51715.96k    56850.13k    59237.46k    59762.69k    59929.94k

Success! Give that man a cigar!
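
To put a number on it, the two Vero4k rows can be divided column by column. This is just a rough sketch over the figures quoted above; the variable and function names are my own:

```python
SIZES = (16, 64, 256, 1024, 8192)
# Vero4k aes-128-cbc figures from the two runs, in 1000s of bytes per second.
WITH_HW = (143775.92, 415486.72, 756965.46, 974767.10, 1064138.07)
WITHOUT_HW = (51715.96, 56850.13, 59237.46, 59762.69, 59929.94)

def speedups(fast, slow):
    """Per-column ratio of the accelerated rate to the non-accelerated rate."""
    return [f / s for f, s in zip(fast, slow)]

for size, ratio in zip(SIZES, speedups(WITH_HW, WITHOUT_HW)):
    print(f"{size:>5} bytes: {ratio:.1f}x")
```

The ratio grows with block size, from roughly 2.8x at 16 bytes to nearly 18x at 8192 bytes.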

For completeness, here are the same tests, using the same newly compiled code, on the Pi3.

With -evp flag:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      39351.31k    46509.40k    49894.91k    50424.78k    50552.04k

Without -evp flag:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      43417.83k    48102.90k    49957.93k    50531.98k    50629.88k

No appreciable difference between the two results, which is to be expected since the Pi3's CPU does not implement the crypto extensions. Both are roughly in line with the Vero4k non-accelerated figures.
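
One way to check whether a board advertises the extensions at all is to look at the Features line in /proc/cpuinfo, where Linux on ARM lists flags such as aes, pmull, sha1 and sha2. The sketch below is a small Python helper for that. The function name is my own, and I'm assuming the usual "Features : ..." layout, which may differ between kernels:

```python
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")

def crypto_flags(cpuinfo_text):
    """Map each crypto flag to True/False based on the cpuinfo Features lines."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            found.update(value.split())
    return {flag: flag in found for flag in CRYPTO_FLAGS}

try:
    with open("/proc/cpuinfo") as f:
        text = f.read()
except OSError:
    text = ""  # not on Linux, or file unreadable: every flag reports False

print(crypto_flags(text))
```

If all four come back False, as I would expect on the Pi3, the -evp path has nothing to switch on and the two runs should look alike.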

Edit: sha1 and sha256 should also be supported in hardware but I’m getting strange figures:

sha1 with -evp flag:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1              7235.88k    28153.60k    99893.85k   273392.64k   558115.50k

sha1 without -evp flag:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1              7307.21k    28169.39k    99848.70k   273652.05k   560201.29k

sha256 with -evp flag:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256            7189.42k    27786.75k    97804.03k   263942.49k   528731.95k

sha256 without -evp flag:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           37835.67k   124624.70k   311541.73k   492438.18k   596473.49k

There is no difference for sha1, and sha256 is actually slower with the -evp flag. sha256 with -evp also performs about the same as sha1, which is odd. It might be that the -evp flag disables hardware acceleration for sha256. Needs investigating.
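
As a starting point for that investigation, here is a quick Python sketch of how the sha256 gap changes with block size, using only the figures quoted above (names are my own):

```python
SIZES = (16, 64, 256, 1024, 8192)
# Vero4k sha256 figures from the tables above, in 1000s of bytes per second.
SHA256_EVP = (7189.42, 27786.75, 97804.03, 263942.49, 528731.95)
SHA256_NO_EVP = (37835.67, 124624.70, 311541.73, 492438.18, 596473.49)

def ratios(evp, no_evp):
    """Per-column ratio of the -evp rate to the non-evp rate."""
    return [e / n for e, n in zip(evp, no_evp)]

for size, r in zip(SIZES, ratios(SHA256_EVP, SHA256_NO_EVP)):
    print(f"{size:>5} bytes: -evp runs at {r:.0%} of the non-evp rate")
```

The -evp rate climbs from about a fifth of the non-evp rate at 16 bytes to nearly 90% at 8192 bytes. That narrowing would be consistent with a fixed per-call cost on the -evp path rather than a slower hash core, but that is only one reading of the numbers and I haven't confirmed it.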