IOTA PoW Hardware Accelerator FPGA for Raspberry Pi (and USB)

Updates:

2018-07-12: Demonstration Video online. Also, there is now a CCurl-compatible shared library for the USBDiver (e.g. for Light Wallet or NodeJS)
2018-07-01: In addition to the “Raspberry Pi mode”, the PiDiver can now be used over USB, no Pi needed. (video)
2018-06-25: Roman Semko’s Hercules got FPGA support (First confirmed Transaction, Roman’s Tweet)
2018-06-19: Core extended with CRC32 to recognize SPI transfer errors of Trytes for Mid-State calculation.
2018-06-06: The next prototypes will be assembled by machine. ESD protection & EMI filtering added. It should be CE-compliant once tested.
2018-06-04: A current measurement showed that the PiDiver draws less than 2W and can be powered from a standard USB port.
2018-06-01: Moved the Altera DE1 Proof-Of-Concept to a separate article to clear out obsolete information. (You can find it here)
2018-05-31: Changed License to MIT and added board, schematic, gerber, BOM (with order-numbers), … files to Repository.
2018-05-27: Curl-P81 mid-state now calculated on FPGA; Mid-State takes about 7ms, PoW reaches about 15.8MH/s @ 188MHz. FPGA resource utilization is about 98% …^^
2018-05-17: It’s alive *muahaha* It’s working fine and reaching about 14.6MH/s
2018-05-07: Some optimisations … Reaching 12.9MH/s on Altera DE1
2018-05-04: Installation instructions for Raspberry Pi added
2018-05-03: Major core optimizations – reaching now about 12MH/s.

Introduction

IOTA PoW needs a lot of computational power, which makes sending transactions on smaller microcontrollers (like ARM) very slow. One of the main reasons is that the innermost loop of Curl-P81 can’t be computed very efficiently on general-purpose CPUs. Even modern CPUs with SIMD extensions (like SSE or AVX) are heavily restricted when it comes to truly parallel calculations.

This is a port of IOTA IRI’s Pearl-Diver PoW computation to FPGAs, which speeds up Proof-of-Work by a factor of more than 140 compared to, e.g., a Raspberry Pi.

The core concept is that an FPGA can calculate one round of Curl-P81 in a single clock cycle and one complete hash in about 85 clock cycles (including the test for a valid nonce). The core works 7-fold, meaning that 7 hashes are calculated in parallel every 85 clock cycles; this gives about 15.8MHash/s at a clock frequency of 188MHz. Moreover, the degree of parallelism can easily be increased to run even faster on larger FPGAs.
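The throughput figure follows from these numbers by simple arithmetic. The Python sketch below is only a back-of-the-envelope check; the function name and its parameters are illustrative and not part of the core. Note that “about 85” is a rounded cycle count: with exactly 85 cycles the formula gives roughly 15.5 MH/s, while the quoted 15.8 MH/s corresponds to a little over 83 cycles per hash.

```python
def hash_rate(clock_hz, parallel, cycles_per_hash):
    """Hashes per second when `parallel` hashes complete every `cycles_per_hash` clock cycles."""
    return clock_hz * parallel / cycles_per_hash


CLOCK_HZ = 188e6   # clock frequency from the text
PARALLEL = 7       # 7-fold core from the text

# Throughput with the rounded figure of 85 cycles per hash.
rate_85 = hash_rate(CLOCK_HZ, PARALLEL, 85)

# Cycle count that would be needed to reach the quoted 15.8 MH/s.
cycles_for_quoted = CLOCK_HZ * PARALLEL / 15.8e6

print(f"{rate_85 / 1e6:.2f} MH/s at 85 cycles per hash")
print(f"{cycles_for_quoted:.1f} cycles per hash for 15.8 MH/s")
```

The same function can be used to estimate what a different degree of parallelism would yield, as long as the clock frequency can be held; on a real FPGA a wider core may force a lower clock.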

For instance, finding the nonce of a single transaction takes about 90s on a Raspberry Pi. With hardware acceleration by this core, the time drops to ~300ms.

This core can also be used by IRI, via a modified version that allows dcurl to be used as an external hashing library.

So it is possible to build a full node on a Raspberry Pi with decent hashing power for PoW calculations.

The project aims to be completely open source, including all source code, schematics and layouts.

VHDL-Core

The IOTA PoW Pearl-Diver core was implemented in VHDL.

Except for an Altera PLL, no additional core or unusual VHDL library is used, which makes it very simple to port to other FPGA targets.

Moreover, the core is customizable, so the 7-fold parallelization can be increased or reduced depending on the resources of the FPGA target. Above 8-fold, multiple cores should be instantiated instead, because routing could otherwise fail or lead to very slow clock frequencies.

There is an additional (slower) 1-fold Curl which is used to calculate the mid-state.

The core implements a high-speed SPI interface which can be used directly by the hardware SPI of a Raspberry Pi.

Here is the synthesis report for the Cyclone 10LP (10CL025):

[Image: synthesis report screenshot]

Pearl-Diver Core Repository

The repository contains everything needed to rebuild the PiDiver:

  • KiCad Board & Schematic Files
  • Gerber Files
  • Schematic as PDF
  • Component Placement as PDF with some notes
  • BOM with distributors and order numbers
  • VHDL code and project for (free) Quartus 17.x

Link to Github Repository

Hashing-Library dcurl with FPGA support

dcurl is a very fast Curl-Hashing-Library which not only supports graphics cards (OpenCL) but also provides highly optimized variants for SSE and AVX capable CPUs.

I forked the library and added support for the VHDL Pearl-Diver.

The advantage is that any software that works with the dcurl library can make use of the FPGA version of Curl (on a Raspberry Pi; for other targets the low-level SPI control has to be replaced).

Link to Github Repository

Compiling and Testing with dcurl

1. Download and install BCM2835 library

# download the latest version of the library, say bcm2835-1.xx.tar.gz, then:
tar zxvf bcm2835-1.xx.tar.gz
cd bcm2835-1.xx
./configure
make
sudo make check
sudo make install

2. Run raspi-config and enable “SPI” under “Interfacing Options”.

sudo raspi-config

3. Load kernel module with modprobe

sudo modprobe spi_bcm2835

4. Check Permissions

pi@raspi:~ $ ls /dev/spidev0.0 -al
crw-rw---- 1 root spi 153, 0 May 3 15:17 /dev/spidev0.0

pi@raspi:~ $ groups
pi adm dialout cdrom sudo audio video plugdev games users input netdev gpio i2c spi

5. Clone and Compile dcurl library

git clone https://github.com/shufps/dcurl
cd dcurl
make BUILD_FPGA=1

6. Test the library (for a reason I still don’t know, SPI access is only possible with “sudo”).

cd build
sudo ./test-pow_fpga

Found nonce: 0004f6ee (mask: 00000008) 
Mid-State Time:   11ms 
PoW Time:        144ms  -  MH/s: 15.816

7. If there is no error then everything worked 🙂
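The hash rate in the sample output above can be cross-checked against the other values it prints. The sketch below rests on an assumption that is inferred from the numbers and not confirmed by the library’s source: that the printed nonce is the counter value at which a solution was found, so the number of hashes tried is the nonce multiplied by the 7-fold parallelism of the core.

```python
# Values copied from the sample output of test-pow_fpga above.
nonce = 0x0004F6EE   # "Found nonce: 0004f6ee"
pow_time_s = 0.144   # "PoW Time: 144ms"

# Assumption: each counter step tests 7 candidates in parallel (7-fold core).
PARALLEL = 7

hashes_tried = nonce * PARALLEL
rate = hashes_tried / pow_time_s

print(f"{hashes_tried} hashes tried, {rate / 1e6:.3f} MH/s")
```

Under that assumption the result reproduces the 15.816 MH/s reported by the test program.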

Performance

Finding a valid nonce is completely random, so it can take up to several seconds, but it can also take almost no time at all.

The following histogram shows the PoW time of 10,000 transactions.

The x-axis shows the PoW time and the y-axis shows the number of transactions that needed this time.

[Image: histogram of PoW times]

25% of all nonces are found within 87ms
50% of all nonces are found within 200ms
75% of all nonces are found within 433ms

It’s quite surprising that the PoW time is not normally distributed (which is what I had actually expected).

The PiDiver needed 50 minutes for 10,000 transactions, which works out to about 3.33 PoW per second, or an average of 300ms per PoW.
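The skewed shape is what one would expect if every hash attempt is an independent trial with the same small chance of success: the number of attempts until the first valid nonce is then geometrically distributed, and the waiting time is approximately exponential, not normal. The sketch below is a rough plausibility check and not part of the measurement. It takes only the measured 300ms average and computes the quartiles that an exponential distribution with this mean would have; the function name is illustrative.

```python
import math


def exponential_quantile(p, mean):
    """Time by which a fraction p of exponentially distributed waits has finished."""
    return -mean * math.log(1.0 - p)


mean_ms = 300.0                                  # measured average PoW time
measured_ms = {0.25: 87, 0.50: 200, 0.75: 433}   # measured quartiles from above

predicted_ms = {p: exponential_quantile(p, mean_ms) for p in measured_ms}

for p in sorted(measured_ms):
    print(f"{int(p * 100)}%: predicted {predicted_ms[p]:.0f}ms, measured {measured_ms[p]}ms")
```

The predicted quartiles of roughly 86ms, 208ms and 416ms are close to the measured 87ms, 200ms and 433ms, which supports the exponential model.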

Support

If you like my work, please consider donating some MIOTAs to:

LLEYMHRKXWSPMGCMZFPKKTHSEMYJTNAZXSAYZGQUEXLXEEWPXUNWBFDWESOJVLHQHXOPQEYXGIRBYTLRWHMJAOSHUY

Discord: pmaxuw#8292

Licence

This project is licensed under the MIT License.