Compact and high-speed hardware architectures and logic optimization methods for the AES algorithm Rijndael are described.
|Published (Last):||15 October 2013|
The S-box is an important factor affecting the performance of AES. A number of techniques presented in the literature have attempted to improve the performance of the S-box byte substitution. This paper proposes a new S-box architecture that is ultra low power, robustly parallel and highly efficient in terms of area.
The architecture is discussed for both CMOS and FPGA platforms, and a pipelined version of the proposed S-box is presented for further time savings and higher throughput, along with higher utilization of hardware resources. The performance of the proposed architecture is also analyzed and compared with that of existing techniques. The comparison shows that the proposed architecture outperforms them in terms of power, delay and size.
Encryption algorithms are broadly classified as symmetric and asymmetric algorithms based on the type of keys used. Due to the complexity of asymmetric algorithms, symmetric ciphers are generally preferred for their speed and simplicity.
AES (the Rijndael algorithm) is one such symmetric encryption algorithm. It replaced triple-DES and eventually became the number one choice for security algorithms all over the world. Since then, it has been used for countless applications of varying size and scale, such as military, e-banking and data communication purposes.
There have been many novel design techniques for AES that focus on obtaining high throughput or low area usage. The demand for fast and area-efficient AES implementations is rapidly growing and is becoming more crucial than the demand for a smaller active device on the chip. The traditional basic lookup table implementations are relatively fast and can achieve better performance with some modifications.
The use of smaller look-up tables (LUTs) of different sizes, from 16 bytes upward, has become a more reliable way of achieving higher speed.
This paper proposes a LUT of small size, which simplifies the indexing and provides satisfactory results in terms of power, area and speed. A significant portion of the overall silicon area for implementing AES architectures is occupied by the S-box. The size of SubBytes is, in turn, determined by the number of S-boxes and their concrete implementation. Various implementation options for the AES S-box have been investigated in the past [2-12].
In one of our previous works [13], we showed that the speed of the AES processor can be maximized by optimizing the S-box and MixColumn stages.
That work reports high performance in terms of throughput and latency. In recent years, hardware implementations in CMOS technology have been widely preferred due to their good performance. An initial attempt at optimizing the AES S-box was the composite field decomposition of the S-box, in which a multi-stage positive polarity Reed-Muller architecture was introduced [14]. In this S-box, the hazard-transparent XOR gates are located after the other gates, which may block the hazards.
Moreover, many applications are emerging, such as contactless smart cards, wireless sensor networks and small computing devices. This is why a number of research works have been proposed, and further research focusing on low power is still continuing [15]. Using embedded functional blocks instead of general purpose logic elements is a good way to reduce the dynamic power consumption of a design [16].
It is seen that internal routing of embedded system block is more power efficient than the routing used for general purpose logic. Clock gating is another commonly used technique for dynamic power reduction [ 17 ].
A Novel Byte-Substitution Architecture for the AES Cryptosystem
Wong [18] claims to have achieved a high-throughput, compact AES S-box with minimal power consumption. They proposed a novel pipelining arrangement over the compact composite field S-box such that both high throughput and low power are achieved. Some works have reported good results for FPGA implementations too. An optimized implementation based on composite field arithmetic has been introduced to reduce both the static and the dynamic power consumption of the S-box, along with pipelining and dynamic voltage scaling [19].
Besides, minimizing the supply voltage apparently reduces the power dissipation in designs. The T-box AES design is intended to have high throughput and low power usage [ 20 ].
The T-box method has potential for power- and energy-efficient embedded system design, since it relies on embedded RAM blocks rather than general purpose logic. Another technique is to use a narrow data path width for the AES design in order to reduce power consumption [21]. Nowadays, many applications are coming to market that involve an increasing number of battery-powered embedded systems such as PDAs, cell phones, networked sensors, smart cards and RFID devices.
Eventually, this makes security a very important concern. Since these devices are resource constrained and battery powered, low power and small area are some of the primary requirements.
This paper focuses on this particular problem and presents a novel technique for designing a low power, low delay and area efficient S-box for AES.
To support this claim, a fair comparison of area, delay and power estimates is presented based on target delay. Graphs of (i) GE versus the target critical path delay, (ii) total power versus the target critical path delay and (iii) power-area product versus the target critical path delay are provided, which show the novelty of the work.
The remainder of this paper is organized as follows. Furthermore, Section 5 presents the results and performance analysis of the proposed S-box architecture, followed by a comparison with other recent related works in Section 6.
We conclude in Section 7. It is well known that the S-box is the most demanding of the four transformations of the AES algorithm. The byte substitution (S-box) serves the purpose of bringing confusion to the data that is to be encrypted. The S-box is a 16 by 16 matrix containing a total of 256 bytes, written in hexadecimal and indexed by row and column.
The S-boxes used in the SubBytes function are constructed so that they are invertible, which allows inverse S-boxes to be used in the InvSubBytes function. The S-box computation basically involves two operations, the multiplicative inverse and the affine transformation.
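The two operations named above can be sketched in a few lines of Python. This is a minimal software illustration, not the hardware design discussed in this paper: it assumes the standard AES field polynomial x^8 + x^4 + x^3 + x + 1 and affine constant 0x63, and the helper names (`gf_mul`, `gf_inv`, `affine`) are chosen here for illustration only.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0 by AES convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(b):
    """AES affine transformation over GF(2), followed by XOR with 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Forward S-box: multiplicative inverse followed by the affine transformation.
SBOX = [affine(gf_inv(x)) for x in range(256)]

# Because the S-box is a bijection on bytes, the inverse table is its reversal.
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x
```

Since every byte value appears exactly once in `SBOX`, the inverse table used by InvSubBytes can be obtained simply by swapping inputs and outputs, as the last loop does.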
The multiplicative inverse is complex to perform in GF(2^8), so composite field arithmetic is used by some researchers to simplify it. This sort of implementation performs well in terms of area but, due to its large signal activity, consumes more power. In order to achieve high throughput and low power, many works present a hardware look-up table implementation of the S-box.
This paper presents an optimized look-up table implementation of the S-box. Logically, the SubBytes transformation substitutes all 16 bytes of the state independently using the S-box. In software, the S-box is typically realized as a look-up table, since inversion in the Galois field (GF) cannot be calculated efficiently on general-purpose processors.
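As a rough software illustration of the table-based SubBytes described above, the sketch below builds the 256-entry table once and then substitutes each of the 16 state bytes independently. The function names (`build_sbox`, `sub_bytes`) are illustrative assumptions, and the table is generated from the standard AES definition rather than taken from this paper.

```python
def build_sbox():
    """Build the 256-entry AES S-box from its algebraic definition."""
    def mul(a, b):
        # Multiplication in GF(2^8) modulo 0x11B.
        r = 0
        while b:
            if b & 1:
                r ^= a
            a = (a << 1) ^ (0x11B if a & 0x80 else 0)
            b >>= 1
        return r

    def rot(v, n):
        # 8-bit left rotation.
        return ((v << n) | (v >> (8 - n))) & 0xFF

    table = []
    for x in range(256):
        inv = 1
        for _ in range(254):       # x^254 is the inverse; 0 stays 0
            inv = mul(inv, x)
        table.append(inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63)
    return table


SBOX = build_sbox()


def sub_bytes(state):
    """Substitute each of the 16 state bytes independently via table look-up."""
    if len(state) != 16:
        raise ValueError("the AES state is exactly 16 bytes")
    return bytes(SBOX[b] for b in state)
```

Each output byte depends on only one input byte, which is why hardware designs can instantiate several S-boxes side by side and process the state bytes in parallel.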
In hardware, on the other hand, the implementation of the S-box is guided by the desired trade-off among area, delay and power consumption. The most obvious implementation approach for the S-box takes the form of hardware look-up tables. The next couple of sections explain how the hardware look-up table in our proposed design works efficiently.
More sophisticated approaches calculate the S-box function in hardware using its algebraic properties [22]. Composite field based design is a good example of calculating the S-box. The main drawback of the composite field approach is greater power consumption, while the delay is much less compared to other architectures. Another approach used an intermediate one-hot encoding of the input to realize arbitrary logic functions, including cryptographic S-boxes, with minimal power consumption.
Relatively large silicon area is the main drawback of this approach. Tiltech [24] describes a total of eight different implementations of the AES S-box, grouped into three basic categories. On the other hand, implementations which calculate the S-box transformation in hardware were first proposed by Wolkerstorfer et al.
The former approach decomposes the elements of the finite field into polynomials over the subfield and performs inversion there. Canright [27] improved the calculation of the S-box by switching the representation to a normal basis. In the low-power approach of Bertoni et al., the decoder-permute-encoder structure means that there is only very little signal activity within the circuit when the input changes, resulting in low power consumption.
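The decoder-permute-encoder idea can be modelled in software to see why so few signals toggle. The sketch below is only a behavioural model under stated assumptions: the 256 one-hot lines are held as bits of a Python integer, the permutation is the standard AES S-box, and all function names are illustrative rather than taken from Bertoni et al.

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def sbox_entry(x):
    """Standard AES S-box value: inverse (x^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def decode(x):
    """Decoder: raise exactly one of 256 lines (bit x of a 256-bit word)."""
    return 1 << x


def permute(lines):
    """Permutation: pure wiring, input line i drives output line SBOX[i]."""
    out = 0
    for i in range(256):
        if (lines >> i) & 1:
            out |= 1 << SBOX[i]
    return out


def encode(lines):
    """Encoder: convert the single active line back into an 8-bit value."""
    return lines.bit_length() - 1


def sbox_one_hot(x):
    """S-box evaluated through the decoder-permute-encoder chain."""
    return encode(permute(decode(x)))


def toggled_lines(a, b):
    """Number of one-hot lines that change when the input goes from a to b."""
    return bin(decode(a) ^ decode(b)).count("1")
```

In this model, any change of input toggles exactly two of the 256 one-hot lines (one falls, one rises), no matter how many input bits changed, which is the source of the low signal activity mentioned above.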
However, it may be necessary to add a large number of additional flip-flops when the pipeline stage is placed between the decoder and encoder, which results in large power consumption. The proposed algorithm substitutes a byte through small table look-ups without inserting any flip-flops when pipelined. Therefore, a change of a few input bits affects the evaluation of all output bits separately. As some output bits will normally remain unchanged, the signal activity within this particular path is low.
This limits the overall power consumption of the S-box. The second implementation of Bertoni uses a two-stage decoder structure so as to reduce the critical path delay of the circuit. This paper instead takes a single-stage decoder approach, in contrast to Bertoni's two stages. Elazm [28] shows a composite Galois field design of the S-box that reduces the size and the delay of the circuit. Transmission gates are employed to reduce the power consumption of that circuit.
This design suffers from a long critical path delay due to switching and glitches. Fewer switching activities therefore ensure lower power consumption. With simple Boolean implementations, the synthesizer has a much higher degree of freedom for optimizing the circuit, which allows for a shorter critical path at a small expense in silicon area. In a recent paper, Shanthini presents an optimized composite field arithmetic S-box implementation in a pipelined design.
Here the S-box operation is divided into the Galois field multiplication and its inverse operation, and is later illustrated in a step-by-step manner. The main constraint appears when the critical path is considered against the area-power product.
By comparison, the FPGA implementation of our proposed work achieved very good results in terms of area, power and their product. Another design used a polynomial basis with composite field arithmetic and obtained impressive results in both silicon area and power consumption.
Its main drawback, however, is its critical path delay, which is five to six times that of the proposed design. The next Section shows the proposed S-box architecture in detail. In the previous Section, the three general techniques for realizing the S-box have already been discussed; of these, the proposed architecture uses a combination of the hardware and the software techniques.
In one case the multiplicative inverse in GF(2^8) is realized as a look-up table, while the affine transformation is computed as in the hardware techniques [24]. This approach has the benefits of avoiding the complexity of inversion and reducing the LUT space requirement to half that of a LUT used for the whole S-box. The basic idea of this approach is that the original S-box is broken down into a set of smaller, multiplexer-switched truth tables of, say, n-variable functions using the Shannon expansion.
The mapping of the LUTs proceeds as follows. Initially, the single S-box is decomposed into 4 tables of 64 bytes, which are called groups. The decomposition of these tables is similar to the group formation itself. Therefore, a large number of iterations are initiated, which results in longer delay and larger power consumption.
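The first decomposition step, one S-box split into 4 groups of 64 bytes selected by a multiplexer, can be sketched as follows. This is a hedged illustration, not the paper's mapping: the choice of the two most significant input bits as the multiplexer select (a Shannon expansion on those two variables) is an assumption, as are the names `GROUPS` and `sbox_decomposed`.

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def sbox_entry(x):
    """Standard AES S-box value: inverse (x^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]

# Assumed split: group g holds the entries whose two top input bits equal g.
GROUPS = [SBOX[g * 64:(g + 1) * 64] for g in range(4)]


def sbox_decomposed(x):
    """Look up x through four 64-byte tables and a 4:1 multiplexer."""
    select = x >> 6            # two most significant bits drive the multiplexer
    index = x & 0x3F           # six low bits index every small table
    candidates = [group[index] for group in GROUPS]   # tables read in parallel
    return candidates[select]  # multiplexer picks one candidate
```

Each small table only sees the six low input bits, so repeating the same split inside every group would yield still smaller tables at the cost of more multiplexer levels.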