Abstract |
: |
Passive RFID tags are devices with very poor computational capability. How-
ever an increasing number of applications require authentication of the tag. For
this purpose, a simple solution is to use a challenge-response protocol. For exam-
ple, the reader can send a random challenge R and the tag responds H(R + S),
where S is a secret known by the reader and H is a hash function. Then, the
reader can check whether the tag knows S. SQUASH, introduced by Adi Shamir
in February 2008 [1], is a new hash function designed for this task. In this ar-
ticle, we describe an FPGA implementation of this algorithm minimizing the
resources.
SQUASH is based on the one-way function c = m2 mod n coming from the
Rabin cryptosystem [2]. To make it secure, the binary length of n must be at
least 1000 bits long [8]. In [1], the author suggests using a 64-bit non-linear
feedback shift register to generate m, a not yet factorized Mersenne's number
(2x - 1) as modulus n, and to send out the bits of c without storing them.
This process avoids storing three 1000-bit long numbers. The multiplications are
achieved by on-the-
y convolutions, sending each bit as soon as it is computed.
Consequently, the only needed memory aims at storing the carry of the previous
steps in the convolution. For the output, a window of 32 or 64 (or more) bits
in c is used. It yields a hash function with inputs of 64 bits that is scalable in
output.
In this paper, we propose implementations designed in order to minimize the
resources, possibly at the cost of an increased execution time. The target device
is a Xilinx Virtex-4 XC4VLX200-10 FPGA. The algorithm recommended by Adi
Shamir has 64 bits in input and 32 in output, and n = 21277 - 1. To reduce
the hardware cost, we minimized the number of registers in the implementation
data and control part. On the XC4VLX200, the design results require 377 slices.
The full execution time to produce 32 bits is 63,250 clock cycles at 222 MHz,
so we reach a throughput of 112,300 bits per second. We also implemented the
algorithm with other size numbers. For 128 bits in input and 64 in output, we
get 619 slices and 104,114 clock cycles at 206 MHz. In general, the length of the
output in
uences the execution time while the length of the input in
uences the
number of registers and slices. |