To provide a standard against which to compare the compiler-generated designs, two manual crypto-engine implementations of the TEA algorithm were written. One used a VHDL for-loop, which the Synopsys tools automatically unrolled to generate straight-line code; the other was a hand-tweaked state machine that performed one round of the algorithm per iteration.
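For orientation, the loop structure that both manual designs implement can be sketched in C. This is a minimal sketch based on the commonly published TEA reference routine, not the VHDL used in this work; the names `tea_cycle`, `tea_encrypt`, `tea_decrypt`, and `TEA_CYCLES` are illustrative. It assumes that the unit repeated by the state machine corresponds to one pass of the loop body (which in TEA updates both halves of the block), and that the unrolled design corresponds to all passes laid out back to back.

```c
#include <assert.h>
#include <stdint.h>

#define TEA_DELTA  0x9E3779B9u  /* key-schedule constant from the published algorithm */
#define TEA_CYCLES 32u          /* number of loop iterations in standard TEA */

/* One pass of the loop body: updates both 32-bit halves of the block.
   A state-machine design would evaluate this once per iteration;
   an unrolled design would instantiate it TEA_CYCLES times. */
static void tea_cycle(uint32_t *v0, uint32_t *v1, uint32_t sum,
                      const uint32_t k[4]) {
    *v0 += ((*v1 << 4) + k[0]) ^ (*v1 + sum) ^ ((*v1 >> 5) + k[1]);
    *v1 += ((*v0 << 4) + k[2]) ^ (*v0 + sum) ^ ((*v0 >> 5) + k[3]);
}

/* Encrypt one 64-bit block in place with a 128-bit key. */
void tea_encrypt(uint32_t v[2], const uint32_t k[4]) {
    assert(v && k);
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (uint32_t i = 0; i < TEA_CYCLES; i++) {
        sum += TEA_DELTA;
        tea_cycle(&v0, &v1, sum, k);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Decrypt one 64-bit block in place by undoing the passes in reverse order. */
void tea_decrypt(uint32_t v[2], const uint32_t k[4]) {
    assert(v && k);
    uint32_t v0 = v[0], v1 = v[1];
    uint32_t sum = (uint32_t)(TEA_DELTA * TEA_CYCLES);
    for (uint32_t i = 0; i < TEA_CYCLES; i++) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= TEA_DELTA;
    }
    v[0] = v0;
    v[1] = v1;
}
```

The loop body uses only shifts, additions, and exclusive-ors on 32-bit words, which is why it lends itself both to full unrolling and to a compact one-pass-per-iteration datapath.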
In addition, software benchmarks were created for the three cryptographic algorithms under consideration. Software optimization techniques were studied and applied to these benchmarks: loops were unrolled, and memory accesses were eliminated as far as possible.
It was recently brought to my attention that an efficient parallel implementation of DES is possible on general-purpose machines by testing multiple keys in parallel: each 32-bit machine register holds the same single bit of the algorithm for thirty-two independent instances. This allows an efficient representation of bit-level operations, but rules out the use of lookup tables, so the S-boxes of DES must instead be expressed as logical expressions. The complexity of the algorithm decomposition necessary for this approach puts it outside the scope of this research; it would, however, enable the expression of DES in TIGER without source-language enhancements.
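The replacement of a lookup table by a logical expression can be illustrated on a small scale. The sketch below uses a hypothetical 3-bit-in, 1-bit-out table (`TOY_TABLE`, which is not a DES S-box; the real S-boxes map six bits to four) and evaluates it for thirty-two instances at once as a sum of products. All names here are illustrative assumptions, and a practical decomposition would use a far more heavily minimized expression than this naive form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy table, indexed by (a << 2) | (b << 1) | c.
   This is NOT a DES S-box; it only illustrates the technique. */
static const uint8_t TOY_TABLE[8] = {0, 1, 1, 0, 1, 0, 1, 1};

/* Conventional form: one table lookup serves one instance. */
unsigned toy_lookup(unsigned a, unsigned b, unsigned c) {
    assert(a <= 1 && b <= 1 && c <= 1);
    return TOY_TABLE[(a << 2) | (b << 1) | c];
}

/* Parallel form: bit j of each argument belongs to instance j, so a
   single call evaluates the table for thirty-two instances. The table
   becomes a sum of products: one AND term per table entry equal to 1. */
uint32_t toy_bitsliced(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t out = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        if (!TOY_TABLE[idx]) {
            continue;  /* entries equal to 0 contribute no term */
        }
        /* Use each input directly or complemented, matching the index bits. */
        uint32_t ta = (idx & 4u) ? a : ~a;
        uint32_t tb = (idx & 2u) ? b : ~b;
        uint32_t tc = (idx & 1u) ? c : ~c;
        out |= ta & tb & tc;
    }
    return out;
}
```

Each AND, OR, and NOT in the parallel form acts on all thirty-two instances in one machine instruction, which is the source of the efficiency; the cost is that every table must first be decomposed into such an expression.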