All Projects → pedrorivera → SiaFpgaMiner

pedrorivera / SiaFpgaMiner

Licence: MIT license
VHDL FPGA design of an optimized Blake2b pipeline to mine Siacoin

Programming Languages

VHDL
269 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to SiaFpgaMiner

NoDevFee
Claymore.s.Dual.Ethereum.Miner Developer 1%~2% charges redirected to your wallet.
Stars: ✭ 24 (-58.62%)
Mutual labels:  siacoin
bittube-miner
BitTube Miner
Stars: ✭ 39 (-32.76%)
Mutual labels:  mining
MM.Hash
Profit Switching Miner For HiveOS/Linux- OLD VERSION: Project Moved To SWARM! https://github.com/MaynardMiner/SWARM
Stars: ✭ 17 (-70.69%)
Mutual labels:  mining
captouch
👇 Add capacitive touch buttons to any FPGA!
Stars: ✭ 96 (+65.52%)
Mutual labels:  vhdl
leptin
🔗 Leptin is a PoW blockchain completely built in Nodejs.
Stars: ✭ 57 (-1.72%)
Mutual labels:  mining
BenEaterVHDL
VHDL project to run a simple 8-bit computer very similar to the one built by Ben Eater (see https://eater.net)
Stars: ✭ 30 (-48.28%)
Mutual labels:  vhdl
Learning
@elahe-dastan / @1995parham training and testing repository 📚 🤓
Stars: ✭ 53 (-8.62%)
Mutual labels:  vhdl
CoinHive
A nice friendly simple and easly customizable GUI for coinhives javascript miner to embed onto websites so users of your site can interact with features of the miner on every single page this javascript miner is to help those who have problems with advertisements/advertising/ads popups banners mobile redirects malvertising/malware etc and provid…
Stars: ✭ 58 (+0%)
Mutual labels:  mining
carrot-pool
Mining Pools Made Easy. ⛏ 📦 🏆
Stars: ✭ 53 (-8.62%)
Mutual labels:  mining
py-cryptonight
Python Cryptonight binding / extension. Monero hash function, proof-of-work, cn_slow_hash()
Stars: ✭ 20 (-65.52%)
Mutual labels:  mining
xeda
Cross EDA Abstraction and Automation
Stars: ✭ 25 (-56.9%)
Mutual labels:  vhdl
verilog-vcd-parser
A parser for Value Change Dump (VCD) files as specified in the IEEE System Verilog 1800-2012 standard.
Stars: ✭ 46 (-20.69%)
Mutual labels:  vhdl
p2pool
Decentralized pool for Monero mining
Stars: ✭ 635 (+994.83%)
Mutual labels:  mining
VHDLFormatter
VHDL formatter web online written in typescript
Stars: ✭ 44 (-24.14%)
Mutual labels:  vhdl
crypto
Is Bitcoin cloud mining profitable? Check the notebook to find out! (Not Clickbait)
Stars: ✭ 23 (-60.34%)
Mutual labels:  mining
argon2
Implementation of argon2 (i, d, id) algorithms with CPU dispatching
Stars: ✭ 73 (+25.86%)
Mutual labels:  blake2b
ROIcoin
ROI Coin "ROI Coin Is Different"
Stars: ✭ 16 (-72.41%)
Mutual labels:  mining
MVis-tokenminer
Token miner for 0xBitcoin
Stars: ✭ 18 (-68.97%)
Mutual labels:  mining
vhdl-hdmi-out
HDMI Out VHDL code for 7-series Xilinx FPGAs
Stars: ✭ 36 (-37.93%)
Mutual labels:  vhdl
Crypto-Webminer
Use Crypto Webminer JavaScript miner on various Cryptonight | CN-Lite | CN-Fast | CN-Fast2 | CN-Pico | CN-RWZ | CN-UPX2 | CN-Half | CN-Heavy | CN-Saber (BitTube) | Argon2id - Chukwa Stratum Pools
Stars: ✭ 166 (+186.21%)
Mutual labels:  mining

Build Status

SiaFpgaMiner

This project is a VHDL FPGA core that implements an optimized Blake2b pipeline to mine Siacoin.

Motivation

When CPU mining got crowded in the earlier years of cryptocurrencies, many started mining Bitcoin with FPGAs. The time arrived when it made sense to invest millions in ASIC development, which outperformed FPGAs by several orders of magnitude, kicking them out of the game. The complexity and cost of developing ASICs monopolized Bitcoin mining, leading to relatively dangerous mining centralization. Therefore, emerging altcoins decided to base their PoW puzzle on other algorithms that wouldn't give ASICs an unfair advantage (i.e. ASIC-resistant). The most popular mechanism has been designing the algorithm to be memory-hard (i.e. dependent on memory accesses), which makes memory bandwith the computing bottleneck. This gives GPUs an edge over ASICS, effectively democratizing access to mining hardware since GPUs are consumer electronics. Ethereum is a clear example of it with its Ethash PoW algorithm.

Siacoin is an example of a coin without a memory-hard PoW algorith and no ASIC miners some ASIC miners are being rolled out (see Obelisk and Antminer A3). So was a perfect candidate for FPGA mining! (more for fun than profit)

Design theory

To yield the highest posible hash rate, a fully unrolled pipeline was implemented with resources dedicated to every operation of every round of the Blake2b hash computation. It takes 96 clock cycles to fill the pipeline and start getting valid results (4 clocks per 'G' x 2 'G' per round x 12 rounds).

  • MixG.vhd implements the basic 'G' function in 4 steps. Eight and two-step variations were explored but four steps gave the best balance between resource usage and timing.
  • QuadG.vhd is just a wrapper that instantiates 4 MixG to process the full 16-word vectors and make the higher level files easier to understand.
  • Blake2bMinerCore.vhd instantiates the MixG components for all rounds and wires their inputs and outputs appropiately. Nonce generation and distribution logic also lives in this file.
  • /Example contains an example instantiation of Blake2bMinerCore interfacing a host via UART. It includes a very minimalist Python script to interface the FPGA to a Sia node for mining.

MixG

The diagram below shows the pipeline structure of a single MixG. Four of these are instantiated in parall to constitute QuadGs, which are chained in series to form rounds.

MixG logic

The gray A, B, C, D boxes contain combinatorial operations to add and rotate bits according to the G function specification. The white two-cell boxes represent two 64-bit pipelining registers to store results from the combinatorial logic that are used later on the process.

Nonce Generation and Distribution

Pipelining the hash vector throughout the chain implies heavy register usage and there is no way around it. Fortunately the X/Y message feeds aren't as resource-demanding because the work header can remain constant for a given block period, with the exception of the nonce field, which must obviously be changing all the time to yield unique hashes. Therefore, the nonce field must be tracked or kept in memory for when a given step in the mixing logic requires it. The most simplistic approach would be to make a huge N-bit wide shift register to "drag" the nonce corresponding to each clock cycle across the pipeline. This is not an ideal solution, for we would require N flip-flops (e.g. 48-bit counter) times the number of clock cycles it takes to cross the pipeline (48 x 96 = 4608 FF!)

Luckily, the nonce field is only used once per round (12 times total). This allows hooking up 12 counters statically to the X or Y input where the nonce part of the message is fed in each round. To make the counter output the value of the nonce corresponding to a given cycle, the counters' initial values are offset by the amount of clock cycles between them. The following diagram illustrates the point:

Nonce counters

In this case the offsets show that the nonce used in round zero will be consumed by round one 8 clock cycles after, by round two 20 cycles after, and so on. (The distance in clock cycles between counters is defined by the Blake2b message schedule)

Implementation results

It is evident that a single core is too big to fit in a regular affordable FPGA device. A ballpark estimate of the flip-flop resources a single core could use:

  • 64-bits per word x 16 word registers per MixG x 4 MixG per QuadG x 2 QuadG per round x 12 rounds = 98,308 registers (not considering nonce counters and other pieces of logic).

The design won't fit in your regular Spartan 6 dev board, which is why I built it for a Kintex 7 410 FPGA. Here are some of my compile tests:

Cores Clock Hashrate Mix steps Strategy Utilization Worst Setup Slack Worst Hold Slack Failures Notes
1 200 200 4 Default 18.00% 0.168 0
2 200 400 4 Default 38.00% 0
3 200 600 4 Default 56.00% -0.246 602 failing endpoints
3 200 600 4 Explore 56.00% -0.246 0.011 602 failing endpoints
3 166.67 500.01 4 Default 56.00% 0.132 0.02 0
4 166.67 666.68 4 Default 75.00% 0.051 0.009 0
5 166 830 4 Explore Placing error
4 173.33 693.32 4 Explore 75.00% 0.039 0 0
4 173.33 693.32 4 Explore 75.00% 0.17 0.022 0 1 BUFGs per core

As seen in the table, the highest number of cores I was able to instantiate was 4 and the highest clock flequency that met timing was 173.33 MHz.

~700 MH/s is no better than a mediocre GPU, but power draw is way less! (hey, I did say it was for fun)

Further work

  • Investigate BRAM as alternative to flip-flops (unlikely to fit the needs of this application).
  • Fine-tune a higher clock frequency to squeeze out a few more MH/s.
  • Porting to Blake-256 for Decred mining. That variant adds two rounds but words are half as wide, so fitting ~2x the number of cores sounds possible.
  • Do more in-depth tests with different number of steps in the G function (timing-resources tradeoff).
  • Play more with custom implementation strategies.

Resources

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].