Autolykos Tuning
=======================================
This document provides some quick pointers on how to tune the
autolykos2 algo used by ERGO.

General background
==================
Autolykos2 is a memory-intensive low/medium power algo. However, with
the small memory accesses involved, the algo behaves more like
verthash than ethash. Performance is tied to the core clk, and for max
speed (especially on Vegas) the core clk needs to be higher than for
ethash to drive the mem controller on the gpu(s).

This algo accesses mem in 32 byte chunks, so RDNA generation gpus
(Navi, Big Navi) will not perform well. Their 128 byte cacheline size
means that 128 bytes are read for every 32 byte request, so only a
quarter of each fetch is useful, versus half on GCN (which uses 64
byte cachelines). This effectively halves the available memory
bandwidth compared to GCN.

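The cacheline argument above is simple arithmetic. The sketch below computes the fraction of fetched bytes that a 32 byte request actually uses, assuming (as the text does) that every request pulls in exactly one full cacheline and that nothing else in that line is reused. Real memory controllers and caches are more complicated, so treat this as an illustration of the ratio, not a performance model.

```python
REQUEST_BYTES = 32  # autolykos2 accesses mem in 32 byte chunks


def useful_fraction(cacheline_bytes: int, request_bytes: int = REQUEST_BYTES) -> float:
    """Fraction of fetched bytes the algo actually uses.

    Simplifying assumption: each request fetches one full cacheline and
    the rest of that line is never reused.
    """
    return request_bytes / cacheline_bytes


gcn = useful_fraction(64)    # GCN: 64 byte cachelines
rdna = useful_fraction(128)  # RDNA: 128 byte cachelines

print(f"GCN useful fraction:  {gcn:.2f}")          # 0.50
print(f"RDNA useful fraction: {rdna:.2f}")         # 0.25
print(f"RDNA relative to GCN: {rdna / gcn:.2f}")   # 0.50, i.e. halved
```
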
Polaris Tuning
==============
Polaris gpus are simple to tune for autolykos2. We have not spent a lot
of time on tuning, so the pointers below should be seen as a starting
point; there might be better combinations of core clk, mem clk and mem
straps to find.

- In our Polaris tests, a Nitro 470 4GB (Elpida), Nitro+ 570 8GB
  (Samsung) and Nitro+ 580 8GB (Samsung) all displayed identical
  hashrates for the same core clk as long as memory bandwidth was
  sufficient. The 580 rebuilds the table slightly faster though, and
  therefore produces a slightly better average hashrate over time.
- Custom mem timings should be applied; ethash timings of some sort
  are a good choice. Timings for Equihash, Cuckoo or CN can produce
  good results as well.
- Mem clock does not need to be high unless you're aiming for the
  highest hashrates.

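Why a faster table rebuild helps the average can be sketched with a simple duty-cycle model: if the gpu produces no hashes while it rebuilds the table, the average over one table period is the peak hashrate scaled by the fraction of time spent hashing. The function name, the normalized peak of 1.0, the rebuild times and the one hour period below are all hypothetical placeholders, not measurements; how often the table is actually rebuilt is not covered by this guide.

```python
def avg_hashrate(peak_mhs: float, rebuild_s: float, period_s: float) -> float:
    """Average hashrate over one table period (simplified duty-cycle model).

    Assumes no hashing at all during the table rebuild and full peak
    hashrate for the rest of the period.
    """
    if period_s <= 0 or not 0 <= rebuild_s <= period_s:
        raise ValueError("need period_s > 0 and 0 <= rebuild_s <= period_s")
    return peak_mhs * (period_s - rebuild_s) / period_s


# Hypothetical inputs, not measurements: identical peak hashrate
# (normalized to 1.0), two different rebuild times, an arbitrary period.
PERIOD_S = 3600.0
slower = avg_hashrate(1.0, rebuild_s=4.0, period_s=PERIOD_S)
faster = avg_hashrate(1.0, rebuild_s=3.0, period_s=PERIOD_S)
print(f"slower rebuild: {slower:.5f}, faster rebuild: {faster:.5f}")
```

With equal peak hashrates, the card with the shorter rebuild always ends up with the higher average, which matches the 570 vs 580 observation above.
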
Vega Tuning
===========

- The most efficient setups keep the soc clk at a lower level and
  maximize the mem clk for that level, i.e. set it equal to the soc
  clk frequency.
- Your *effective* core clk will decide your hashrate. Vegas are
  notorious for not running at the configured frequency when AVFS
  p-states are used.

1. Set ethash mem timings (see our ethash guide for examples).
2. Set core clk to 1225 MHz.
3. Start with mem clk at 960 MHz (Vega 64) or 847 MHz (Vega 56).
4. Set voltage to 875 mV.
5. Run the miner and check the hashrate.
6. Increase core clk until you hit 165 MH/s. If you hit a bottleneck
   where increased core clk doesn't boost the hashrate, increase mem
   clk a little. Repeat from 4.
7. If you crash, bump voltage a little. Repeat from 4.
8. If you run stable for a while, lower voltage.

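The steps above form a small feedback loop, sketched below in Python. It assumes a hypothetical `measure(core, mem, mv)` callback that applies the clocks and voltage, mines for a while and returns the observed hashrate in MH/s, or `None` if the gpu crashed; the miner does not provide such a callback, so in practice you apply the values with your own tuning tool and read the hashrate from the miner output. The step sizes, the plateau threshold and `mem_cap` (standing in for the current soc clk level) are assumptions, and `toy_measure` is a made-up model used only to exercise the loop.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callback: apply (core MHz, mem MHz, voltage mV), mine for a
# while and return the observed hashrate in MH/s, or None if the gpu crashed.
Measure = Callable[[int, int, int], Optional[float]]


def tune(measure: Measure, target_mhs: float, core: int, mem: int, mv: int,
         mem_cap: int, core_step: int = 25, mem_step: int = 10,
         mv_step: int = 6, plateau_mhs: float = 0.5,
         max_runs: int = 200) -> Tuple[int, int, int, float]:
    """Steps 5-7 of the guide as a loop (step sizes are assumptions).

    mem_cap stands in for the current soc clk level, since RX Vegas can't
    run mem clk above soc clk.
    """
    last = 0.0
    for _ in range(max_runs):
        rate = measure(core, mem, mv)           # step 5: run and check
        if rate is None:                        # step 7: crash, more voltage
            mv += mv_step
            continue
        if rate >= target_mhs:                  # step 6: target reached
            return core, mem, mv, rate
        if rate - last < plateau_mhs and mem + mem_step <= mem_cap:
            mem += mem_step                     # core clk no longer helps
        else:
            core += core_step                   # keep raising core clk
        last = rate
    raise RuntimeError("target not reached within max_runs")


def toy_measure(core: int, mem: int, mv: int) -> Optional[float]:
    """Made-up gpu model for illustration only, not real Vega behaviour."""
    if mv < 850 + (core - 1200) * 0.2:          # pretend stability limit
        return None
    return min(core * 0.13, mem * 0.18)         # core- or mem-bound


print(tune(toy_measure, 165.0, core=1225, mem=900, mv=875, mem_cap=960))
```

Step 8 (lowering voltage once stable) is left out on purpose: stability can only be judged over a longer run, which a quick loop like this cannot capture.
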
Note 1: Vega 56 Hynix can follow the same guide, but ended up
slightly below 160 MH/s at 847 MHz soc/mem clk for us. You can then
switch up to the 960 MHz soc clk level and follow the Vega 64 values
instead. You can still keep the mem clk lower than 960 MHz, depending
on what hashrate you'd like to target.

Note 2: if none of the above makes sense to you, the critical piece
of information here is that RX Vegas can't use a mem clk higher than
the current soc clk. However, a higher soc clk means a more power
hungry gpu, so we can't lower voltage as much as we'd like without
the gpu crashing. Finding the sweet spot soc clk level, and making
full use of it by setting mem clk equal to soc clk, is important when
optimizing for efficiency.

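The soc clk rule from Note 2 can be expressed as a small helper: pick the lowest soc clk level that still allows the mem clk you need, then set mem clk equal to that level. Only the 847 MHz and 960 MHz levels mentioned in this guide are listed; your card's soc clk table may contain other levels, so the list and the function names are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Tuple

# Placeholder table: only the two soc clk levels (MHz) named in this guide.
# Your card's table may contain other levels; read it from your tuning tool.
SOC_LEVELS_MHZ: List[int] = [847, 960]


def pick_soc_level(mem_mhz: int, levels: List[int] = SOC_LEVELS_MHZ) -> int:
    """Lowest soc clk level that still allows the requested mem clk.

    RX Vegas can't run mem clk above the current soc clk, and a higher soc
    clk costs power, so the cheapest level that fits is preferred.
    """
    ordered = sorted(levels)
    i = bisect_left(ordered, mem_mhz)  # index of first level >= mem_mhz
    if i == len(ordered):
        raise ValueError(f"no soc clk level supports {mem_mhz} MHz mem clk")
    return ordered[i]


def efficient_setting(mem_mhz: int) -> Tuple[int, int]:
    """(soc clk, mem clk): pick the level, then set mem clk equal to it."""
    soc = pick_soc_level(mem_mhz)
    return soc, soc


print(efficient_setting(800))  # (847, 847)
print(efficient_setting(900))  # (960, 960)
```
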
For Vega 56 with Samsung mem, if you have already applied timings that
reach 53-54 MH/s on ethash, keep them.

Note: for Vega 56 Hynix, the guide below can still be followed, but
the target hashrate for us had to be lowered to 185 MH/s.

NOTE: if your gpu can't take the high mem clk values suggested
above, set it to the level at which you can mine ethash.

Radeon VII Tuning
=================

- Mem timings are NOT important. VIIs will very much be bottlenecked
  on core clock, so memory tuning does not need to be pushed.
- Mem clock can be significantly lowered to save power and keep the
  HBM2 cool. Even at high hashrates, mem clk can usually be dropped
  to around 750 MHz.
- The limiting factor for hashrate will be core clk, which in turn is
  limited by the cooling of the card.
- The TRM 'VII Boost' enabled by the ethash C mode procedure described
  above will increase hashrate by around 10% at the same core clock.

Navi GPUs
=========
As stated above, Navis simply won't do that well on autolykos2: their
128 byte cachelines don't work well with the small 32 byte mem
accesses. Therefore, we don't expect RDNA gpus to be a common choice
for this algo.

Example tunings::

    Type    GPU  CUs  CoreMHz  SocMHz  MemMHz  TEdge  TMem  VDDC    Power
    5700XT    0   40     1100    1085     912    41C   70C  787 mV   84 W
    5600XT    1   36      950    1266     910    40C   70C  800 mV   93 W
    ------------------------ GPU Status ---------------------------
    GPU 0 [41C, fan 0%] autolykos2: 108.8Mh/s
    GPU 1 [40C, fan 49%] autolykos2: 82.12Mh/s

    Type    GPU  CUs  CoreMHz  SocMHz  MemMHz  TEdge  TMem  VDDC    Power
    RX6800    0   60     1075     685    1049    52C   76C  787 mV  116 W  (voltage not tuned)
    ------------------------ GPU Status ---------------------------
    GPU 0 [52C, fan 28%] autolykos2: 118.8Mh/s

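To compare the examples above, hashrate per watt can be computed directly from the logged values. The numbers are copied from the example output; the power figures are what the miner reported, not measurements at the wall, so this is a rough gpu-side comparison only. The dictionary and function names are illustrative.

```python
# Values copied from the example tunings above (hashrate in MH/s, power in W).
EXAMPLES = {
    "5700XT": (108.8, 84.0),
    "5600XT": (82.12, 93.0),
    "RX6800": (118.8, 116.0),  # voltage not tuned
}


def efficiency(mhs: float, watts: float) -> float:
    """MH/s per W, which is the same as MH per joule."""
    return mhs / watts


for gpu, (mhs, watts) in EXAMPLES.items():
    print(f"{gpu}: {efficiency(mhs, watts):.3f} MH/J")
```
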