Exetools  

  #1  
Old 03-07-2026, 04:58
DARKER DARKER is offline
VIP
 
Join Date: Jul 2004
Location: Somewhere Over the Rainbow
Posts: 541
Rept. Given: 16
Rept. Rcvd 123 Times in 54 Posts
Thanks Given: 21
Thanks Rcvd at 1,038 Times in 262 Posts
DARKER Reputation: 100-199 DARKER Reputation: 100-199
Something wrong? 109.3% ETA: --
  #2  
Old 03-07-2026, 05:19
cjack's Avatar
cjack cjack is offline
Family
 
Join Date: Jan 2002
Posts: 170
Rept. Given: 196
Rept. Rcvd 176 Times in 34 Posts
Thanks Given: 332
Thanks Rcvd at 219 Times in 64 Posts
cjack Reputation: 100-199 cjack Reputation: 100-199
Quote:
Originally Posted by DARKER View Post
Something wrong? 109.3% ETA: --
Hi Darker! We're past the expected iteration count (111% now), but that's completely normal with Pollard's Rho, which is a probabilistic algorithm. The expected count is a mean, not a deadline: even at 100% there is only a ~54% chance of having found the collision. At 111%, the cumulative probability is only about 62%, so there's still a ~38% chance of having no collision yet, which is exactly where we are.

Think of it like flipping a coin: just because you "should" get heads by flip #10 doesn't mean it can't take 15 or 20 flips. The ETA shows "--" because we're past the expected-time estimate, but the math is solid and all 30 agents are grinding at 92 G/s. The collision can hit any moment now.

TL;DR: Perfectly normal statistical variance. We keep running.
The Following User Says Thank You to cjack For This Useful Post:
niculaita (03-07-2026)
  #3  
Old 03-07-2026, 14:50
cjack cjack is offline
Status Update:

Progress: 156% of expected mean (85% cumulative probability)
Unique DPs collected: 1,093,619
Active agents: 29 (24× RTX 5090, 1× RTX 5070 Ti, 1× RTX 4070 Ti Super, 1× RTX 3090, 1× RTX 4070, 1× RTX 3060 Ti)
Fleet speed: 90.76 G/s
Efficiency: 78%
Collisions: 0 (still waiting)
Uptime: ~38 hours continuous
Why no collision yet?
Pollard's Rho is probabilistic: the "expected" iteration count is a mean, not a guarantee. Being at 156% means we're in the statistical tail, but this is perfectly normal. With x as the fraction of the expected count, the CDF is P(x) = 1 − e^(−π/4 · x²), so at x = 1.56 there's still a ~15% chance of not having found it yet. Nothing is wrong: all agents are healthy and producing DPs at the correct rate.
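
As a quick sanity check, the CDF above can be evaluated directly for the progress figures quoted in this thread. This is a minimal sketch: the function names are illustrative, and x is assumed to be the iteration count divided by the expected (mean) count.

```python
import math

def collision_cdf(x):
    """P(collision found by x), where x = iterations / expected (mean) iterations."""
    return 1.0 - math.exp(-math.pi / 4.0 * x * x)

def quantile(p):
    """Inverse of collision_cdf: the x at which the cumulative probability reaches p."""
    return math.sqrt(-4.0 * math.log(1.0 - p) / math.pi)

# Progress values mentioned in the thread: 100%, 111%, 156% and 233%.
for x in (1.00, 1.11, 1.56, 2.33):
    found = collision_cdf(x)
    print(f"x = {x:.2f}  P(found) = {found:.3f}  P(not yet) = {1.0 - found:.4f}")

print(f"50% point at x = {quantile(0.5):.3f}")
```

Under this model the 50% point sits at roughly 94% of the mean, and reaching 233% without a collision has a probability of about 1.4%.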

ETA from current position:

90th percentile: ~7 hours
95th percentile: ~21 hours
99th percentile: ~52 hours


The 5090 fleet is available for another ~33 hours, which covers us up to the 95th percentile. Statistically, the collision is very likely to happen within the next day.

Stay tuned and join the battle!
  #4  
Old 03-08-2026, 03:55
cjack cjack is offline
Critical Bug Found & Fixed (v1.5.0)

Hey all,

Wanted to share a hard lesson learned with the THUNDERSTRIKE distributed Pollard's Rho solver targeting Armadillo ECDSA-113 (binary Koblitz curve over GF(2^113)).

The Problem

We've been running ~30 agents (mostly RTX 5090s) at a combined ~92 G/s for a while now. We reached 233% of the expected (mean) iteration count with absolutely zero collisions. The probability of that happening with a correctly functioning Rho walk is roughly 1.4%, which is suspicious enough to warrant a deep investigation.

Root Cause

The walk partition function p = X.hi & 31 was using the projective X coordinate instead of the affine x coordinate.

In Lopez-Dahab projective coordinates, X_proj = x_affine × Z. After the very first ld_madd step, Z diverges from 1. So two walks arriving at the same affine point but carrying different Z values would compute different partition indices, select different walk table entries, and diverge. Walks never merge. Pollard's Rho degenerates into pure random distinguished point sampling — you'd need ~10^12 DPs for a birthday collision among them. We had collected ~1.8 million. At that rate: roughly 223 years. Not ideal.

The bug was subtle because every individual component (EC arithmetic, DP detection, server collection) was working correctly in isolation. The partition function just happened to operate on the wrong representation of the point.

The Fix (v1.5.0)

We switched to per-step affine normalization using Itoh-Tsujii inversion, ensuring Z = 1 at every step. This means the partition function now sees the true affine x coordinate and walks sharing the same point will always take the same step — as Pollard intended.
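
The failure mode and the fix can be reproduced in a few lines. The sketch below is a toy model, not the THUNDERSTRIKE kernel. It assumes the reduction trinomial x^113 + x^9 + 1 for GF(2^113), reads `X.hi` as the bits above a 64-bit low limb, and uses slow Fermat inversion as a stand-in for Itoh-Tsujii. All function names and sample values are arbitrary.

```python
M = 113
POLY = (1 << 113) | (1 << 9) | 1   # assumed reduction polynomial x^113 + x^9 + 1

def gf_mul(a, b):
    """Multiply two GF(2^113) elements (shift-and-add, reducing as we go)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 113: subtract (XOR) the modulus
            a ^= POLY
    return r

def gf_inv(a):
    """Inverse via Fermat, a^(2^113 - 2). A simple stand-in for Itoh-Tsujii."""
    result, base, e = 1, a, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def partition(v):
    """Assumed reading of `X.hi & 31`: low 5 bits of the limb above bit 64."""
    return (v >> 64) & 31

x_affine = 0x0001E94EA3FD44712959B08F0FC663E5   # arbitrary sample element
z_a = 0x0000CB8869ACD4E1                         # two arbitrary nonzero Z values
z_b = 0x92598B7E

# The same affine point, carried by two walks with different Z (X = x * Z).
X_a = gf_mul(x_affine, z_a)
X_b = gf_mul(x_affine, z_b)
print("partition on projective X:", partition(X_a), partition(X_b))

# Normalising back to Z = 1 recovers one shared x, hence one shared partition.
x_a = gf_mul(X_a, gf_inv(z_a))
x_b = gf_mul(X_b, gf_inv(z_b))
print("partition on affine x:   ", partition(x_a), partition(x_b))
```

The two projective X values differ even though they represent the same point, so nothing forces their partition indices to agree. After normalisation both walks see the same x and therefore take the same step.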

With Z guaranteed to be 1 on input, we wrote an optimized ld_madd_z1 routine (5M+3S vs the previous 8M+5S). The compiled kernel hits 96 registers, 0 spills. Throughput on a single RTX 5090 is ~975 M/s — about 3.5x slower per step than before, but the algorithm now actually converges.

Verification

We wrote formal proofs for the partition invariant and ran a 5-test verification suite — all passing, confirming both that the old code was broken and that the new code preserves walk mergeability. Test runs show hash table duplicates growing at the expected rate, which is exactly what you want to see.

What's Next

Operations are temporarily suspended while we do final verification across the fleet. Once we restart with v1.5.0, the estimated time to solve one certificate is in the range of 32-50 hours with the full agent fleet at reduced per-step throughput. A very different story from "223 years."

Sometimes the most dangerous bugs are the ones where everything looks like it's working perfectly. 92 G/s of beautifully fast, completely useless computation.
  #5  
Old 03-08-2026, 06:07
aliali aliali is offline
Friend
 
Join Date: Jan 2002
Posts: 61
Rept. Given: 4
Rept. Rcvd 8 Times in 4 Posts
Thanks Given: 3
Thanks Rcvd at 15 Times in 8 Posts
aliali Reputation: 8
I cannot connect to the server. I am waiting for the new fix (v1.5.0) to be released with its source code.
  #6  
Old 03-08-2026, 07:00
cjack cjack is offline
Hey aliali,

v1.5.0 is now live on the server — full source code included in the download package.

Download the new package from https://ecdlp.protect.cx/download/ArmadilloSolver.zip

Old agents (< v1.5.0) are automatically rejected by the server
The fleet is already running with 26 workers at ~21 G/s on the new target. ETA ~3 days.
We also switched to a new certificate target (codename ENDGAME). Your agent will pick up the new parameters automatically from the server when it connects — no manual config needed.
The Following User Says Thank You to cjack For This Useful Post:
wx69wx2023 (03-10-2026)
  #7  
Old 03-08-2026, 10:05
WhoCares's Avatar
WhoCares WhoCares is offline
who cares
 
Join Date: Jan 2002
Location: Here
Posts: 468
Rept. Given: 11
Rept. Rcvd 32 Times in 25 Posts
Thanks Given: 69
Thanks Rcvd at 247 Times in 94 Posts
WhoCares Reputation: 32
@cjack

There is a small printf bug. The console output mixes text from the if and else branches:

[ENDGAME][ 315s] 129.85 M iter/s | 4.090e+10 iters | DP sent:1 NE - retrying...

The if (hb_failed) branch should end with a '\n'.

I also don't know why the heartbeat fails so frequently; my network connection is quite stable.
Code:
                if (hb_failed)
                    printf("\r[%s][%7.0fs] %.2f %s | %.3e iters | SERVER OFFLINE - retrying...   ",
                           job_codename ? job_codename : "?",
                           elapsed, dspeed, unit, (double)agent_iters);
                else
                    printf("\r[%s][%7.0fs] %.2f %s | %.3e iters | DP sent:%u   ",
                           job_codename ? job_codename : "?",
                           elapsed, dspeed, unit, (double)agent_iters, dp_found);
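
To see where the stray "NE - retrying..." comes from, the snippet below emulates how a terminal handles '\r' on a single line. It is an illustrative Python model of the two printf calls above, not the agent's code. `render` and the sample strings are made up for the demonstration. Padding both status lines to the same width is shown as one possible remedy, as an alternative to ending the line with '\n'.

```python
def render(chunks):
    """Emulate one terminal line: '\\r' returns to column 0, other chars overwrite."""
    line, col = [], 0
    for chunk in chunks:
        for ch in chunk:
            if ch == "\r":
                col = 0
                continue
            if col < len(line):
                line[col] = ch      # overwrite what is already on screen
            else:
                line.append(ch)     # extend the line
            col += 1
    return "".join(line)

prefix = "\r[ENDGAME][    315s] 129.85 M iter/s | 4.090e+10 iters | "
offline = prefix + "SERVER OFFLINE - retrying...   "
normal = prefix + "DP sent:1   "

# The shorter line only overwrites the start of the longer one,
# so the tail of the OFFLINE message stays visible.
mixed = render([offline, normal])
print(repr(mixed))

# Possible remedy: pad every status line to the same width before printing.
width = len(offline)
fixed = render([offline, normal.ljust(width)])
print(repr(fixed))
```

"SERVER OFFLI" and "DP sent:1" plus its three trailing spaces are both 12 characters long, which is why exactly "NE - retrying..." is left over.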
Quote:
Waiting for server at ecdlp.protect.cx ... Registered as worker: b2b7f6fc
Project: ENDGAME
Project ID: 97c327d7
G.x: 0x02909A5FDD46C946F29ED931C083F
G.y: 0x0167549B3D78A6930526E91FF0E8C
G on curve: YES
Q.x: 0x138EAD61AE6D9E60A6515D34FC371
Q.y: 0x004D1DB747FC9B632A25C2D12E515
Q on curve: YES
DP bits: 25

Downloading walk table from server...
Walk table loaded (2048 bytes)
Precomputing 65536 subset sums per half...
Precomputation done: 2 x 65536 subset sums
Initializing 47104 worms with unique starting points...
session_seed=0x0000CB8869ACD4E1 h_salt=0x92598B7E
worm[0]: h=0x0FA019BA maskA=6586 maskB=4000 x=0001E94EA3FD44712959B08F0FC663E5
worm[1]: h=0xBECEF163 maskA=61795 maskB=48846 x=000095148466920574C43576AE63578B
worm[2]: h=0x4683B07B maskA=45179 maskB=18051 x=00002F0E9F90C29DED5E5DC8F04583ED
worm[3]: h=0xAEA8D40A maskA=54282 maskB=44712 x=0001E64EA0141008A6655EAE5D3061C5
worm[4]: h=0x1E6D8088 maskA=32904 maskB=7789 x=00003562A9560C293490CBD417D52FA7
Worm init complete.
Session seed: 0x0000CB8869ACD4E1

Starting distributed Pollard's Rho...

First launch: 1 DPs found
DP[0] verify: OK
[ENDGAME][ 48s] 142.69 M iter/s | 6.849e+09 iters | DP sent:1
[DP] 200 sent, 200 unique (0.0% dup rate)
[ENDGAME][ 93s] 140.04 M iter/s | 1.302e+10 iters | DP sent:4
[DP] 403 sent, 403 unique (0.0% dup rate)
[ENDGAME][ 146s] 141.40 M iter/s | 2.064e+10 iters | DP sent:2
[DP] 603 sent, 603 unique (0.0% dup rate)
[ENDGAME][ 193s] 140.45 M iter/s | 2.711e+10 iters | DP sent:5
[DP] 801 sent, 801 unique (0.0% dup rate)
[ENDGAME][ 217s] 140.92 M iter/s | 3.058e+10 iters | DP sent:2
[DP] 895 sent, 894 unique (0.1% dup rate)
[ENDGAME][ 243s] 140.54 M iter/s | 3.415e+10 iters | DP sent:3
[DP] 1003 sent, 1002 unique (0.1% dup rate)
[ENDGAME][ 315s] 129.85 M iter/s | 4.090e+10 iters | DP sent:1 NE - retrying...
[DP] 1200 sent, 1199 unique (0.1% dup rate)
[ENDGAME][ 364s] 131.19 M iter/s | 4.775e+10 iters | DP sent:3
[DP] 1400 sent, 1399 unique (0.1% dup rate)
[ENDGAME][ 414s] 132.12 M iter/s | 5.470e+10 iters | DP sent:2
[DP] 1600 sent, 1599 unique (0.1% dup rate)
[ENDGAME][ 459s] 132.83 M iter/s | 6.097e+10 iters | DP sent:4
[DP] 1800 sent, 1799 unique (0.1% dup rate)
[ENDGAME][ 501s] 133.44 M iter/s | 6.685e+10 iters | DP sent:4
[DP] 2002 sent, 2001 unique (0.0% dup rate)
[ENDGAME][ 548s] 133.97 M iter/s | 7.341e+10 iters | DP sent:1
[DP] 2203 sent, 2202 unique (0.0% dup rate)
[ENDGAME][ 618s] 129.72 M iter/s | 8.017e+10 iters | DP sent:4 NE - retrying...
[DP] 2401 sent, 2400 unique (0.0% dup rate)
[ENDGAME][ 662s] 130.57 M iter/s | 8.644e+10 iters | DP sent:4
[DP] 2600 sent, 2599 unique (0.0% dup rate)
[ENDGAME][ 706s] 131.31 M iter/s | 9.271e+10 iters | DP sent:1
[DP] 2800 sent, 2799 unique (0.0% dup rate)
[ENDGAME][ 760s] 132.26 M iter/s | 1.005e+11 iters | DP sent:3
[DP] 3001 sent, 3000 unique (0.0% dup rate)
[ENDGAME][ 807s] 132.81 M iter/s | 1.072e+11 iters | DP sent:3
[DP] 3201 sent, 3200 unique (0.0% dup rate)
[ENDGAME][ 852s] 133.15 M iter/s | 1.134e+11 iters | DP sent:1
[DP] 3401 sent, 3400 unique (0.0% dup rate)
[ENDGAME][ 921s] 130.51 M iter/s | 1.202e+11 iters | DP sent:0 NE - retrying...
[DP] 3601 sent, 3600 unique (0.0% dup rate)
[ENDGAME][ 973s] 131.07 M iter/s | 1.275e+11 iters | DP sent:2
[DP] 3801 sent, 3800 unique (0.0% dup rate)
[ENDGAME][ 1024s] 131.61 M iter/s | 1.348e+11 iters | DP sent:1
[DP] 4003 sent, 4002 unique (0.0% dup rate)
[ENDGAME][ 1072s] 132.19 M iter/s | 1.417e+11 iters | DP sent:2
[DP] 4200 sent, 4199 unique (0.0% dup rate)
[ENDGAME][ 1116s] 132.60 M iter/s | 1.480e+11 iters | DP sent:5
[DP] 4402 sent, 4401 unique (0.0% dup rate)
[ENDGAME][ 1162s] 133.00 M iter/s | 1.545e+11 iters | DP sent:5
[DP] 4600 sent, 4599 unique (0.0% dup rate)
[ENDGAME][ 1233s] 130.74 M iter/s | 1.612e+11 iters | DP sent:3 NE - retrying...
[DP] 4800 sent, 4799 unique (0.0% dup rate)
[ENDGAME][ 1291s] 130.92 M iter/s | 1.690e+11 iters | DP sent:5
[DP] 5000 sent, 4999 unique (0.0% dup rate)
[ENDGAME][ 1345s] 130.97 M iter/s | 1.762e+11 iters | DP sent:2
[DP] 5202 sent, 5201 unique (0.0% dup rate)
[ENDGAME][ 1391s] 131.08 M iter/s | 1.823e+11 iters | DP sent:4
[DP] 5400 sent, 5399 unique (0.0% dup rate)
[ENDGAME][ 1440s] 131.17 M iter/s | 1.889e+11 iters | DP sent:4
[DP] 5601 sent, 5600 unique (0.0% dup rate)
[ENDGAME][ 1486s] 131.20 M iter/s | 1.950e+11 iters | DP sent:2
[DP] 5803 sent, 5802 unique (0.0% dup rate)
[ENDGAME][ 1560s] 128.75 M iter/s | 2.008e+11 iters | DP sent:3 NE - retrying...
[DP] 6001 sent, 6000 unique (0.0% dup rate)
[ENDGAME][ 1609s] 128.85 M iter/s | 2.073e+11 iters | DP sent:3
[DP] 6202 sent, 6201 unique (0.0% dup rate)
[ENDGAME][ 1655s] 128.94 M iter/s | 2.134e+11 iters | DP sent:2
[DP] 6401 sent, 6400 unique (0.0% dup rate)
[ENDGAME][ 1707s] 128.96 M iter/s | 2.201e+11 iters | DP sent:6
[DP] 6602 sent, 6601 unique (0.0% dup rate)
[ENDGAME][ 1755s] 129.12 M iter/s | 2.266e+11 iters | DP sent:4
[DP] 6800 sent, 6799 unique (0.0% dup rate)
[ENDGAME][ 1836s] 127.47 M iter/s | 2.340e+11 iters | DP sent:3
[DP] 7000 sent, 6999 unique (0.0% dup rate)
[ENDGAME][ 1890s] 127.60 M iter/s | 2.412e+11 iters | DP sent:1
[DP] 7201 sent, 7200 unique (0.0% dup rate)
[ENDGAME][ 1940s] 127.70 M iter/s | 2.477e+11 iters | DP sent:0
[DP] 7404 sent, 7403 unique (0.0% dup rate)
[ENDGAME][ 1990s] 127.79 M iter/s | 2.543e+11 iters | DP sent:5
[DP] 7604 sent, 7603 unique (0.0% dup rate)
[ENDGAME][ 2039s] 127.88 M iter/s | 2.608e+11 iters | DP sent:2
[DP] 7801 sent, 7800 unique (0.0% dup rate)
[ENDGAME][ 2090s] 127.90 M iter/s | 2.673e+11 iters | DP sent:3
[DP] 8001 sent, 8000 unique (0.0% dup rate)
[ENDGAME][ 2164s] 127.01 M iter/s | 2.748e+11 iters | DP sent:4
[DP] 8200 sent, 8199 unique (0.0% dup rate)
[ENDGAME][ 2219s] 126.99 M iter/s | 2.818e+11 iters | DP sent:6
[DP] 8401 sent, 8400 unique (0.0% dup rate)
[ENDGAME][ 2249s] 126.71 M iter/s | 2.850e+11 iters | DP sent:3
__________________
AKA Solomon/blowfish.

Last edited by WhoCares; 03-08-2026 at 11:55.
The Following User Gave Reputation+1 to WhoCares For This Useful Post:
Jupiter (03-11-2026)
  #8  
Old 03-08-2026, 14:04
cjack cjack is offline
Hi WhoCares!
Thank you so much for the detailed bug report! Both issues were spot-on.

v1.5.1 is now available from the dashboard download link with the following fixes:

1) Printf mixing (agent): The hb_failed status line was using \r without a terminating \n, causing it to overwrite normal output. Fixed — now uses \n delimiters so the "SERVER OFFLINE" message prints cleanly on its own line.

2) Heartbeat timeouts every ~300s (server): This was the more critical one. save_state() was holding the global lock during the entire disk write — serializing millions of DPs with struct.pack in a loop while every API endpoint waited. As the DP table grows, save time grows, and at ~15M+ DPs it was blocking long enough to trigger agent heartbeat timeouts.

Fix: new save_state_background() takes a fast snapshot of all data structures under lock (milliseconds), then releases the lock and writes to disk outside it. Agents no longer see any interruption during auto-save.
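
The pattern described here, copying under the lock and writing outside it, might look like the following. This is a sketch under assumptions: the `DPStore` class, the 16-byte record layout and the temp-file handling are hypothetical. Only the name save_state_background and the use of struct.pack come from the post.

```python
import os
import struct
import threading

class DPStore:
    """Hypothetical in-memory DP table guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._dps = {}                      # assumed layout: DP key -> walk seed

    def add(self, key, seed):
        with self._lock:
            self._dps[key] = seed

    def save_state_background(self, path):
        # Hold the lock only long enough to take a shallow snapshot.
        with self._lock:
            snapshot = list(self._dps.items())

        def write():
            # Slow part: serialise and write with the lock released.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                for key, seed in snapshot:
                    f.write(struct.pack("<QQ", key, seed))
            os.replace(tmp, path)           # swap in the finished file

        worker = threading.Thread(target=write, daemon=True)
        worker.start()
        return worker
```

API handlers that call `add` while the writer thread is running only contend for the lock during the snapshot copy, not during the disk write.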

Fleet is currently at 28 workers / ~21 G/s, 9.3% progress, efficiency 99.9%. No more periodic disconnections.

Thanks again for catching these — the heartbeat timeout one in particular would have become worse as the DP table keeps growing toward collision.
The Following User Says Thank You to cjack For This Useful Post:
Jupiter (03-11-2026)
  #9  
Old 03-11-2026, 10:00
WhoCares WhoCares is offline
@cjack

The server is unstable now, and the calculation speed is very slow.
  #10  
Old 03-11-2026, 20:07
cjack cjack is offline
Hi WhoCares! Thanks for the report and sorry about the disruption at 3 AM.

Here's what happened: we switched the attack target from Cert #11 to Cert #6 (the correct eval certificate — we discovered the old target had a wrong base point). The server was restarted with a fresh project, and then we upgraded the entire fleet to agent v1.6.0.

Sorry for the inconvenience — this is very much a work-in-progress experiment and things like these can happen. We really appreciate everyone who's contributing despite the bumps along the road. Your help means a lot!

Everything is stable now — 39 workers running at 34+ G/s, server healthy with 14.5M+ unique DPs and growing.

Important: Please download the latest agent (v1.6.0) from the dashboard at ecdlp.protect.cx. This version has the async DP sender pipeline which eliminates idle time between GPU kernel launches — you should see a nice speed boost (up to 25% faster). Your current v1.5.1 still works but it's leaving performance on the table.
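
The idea behind an async sender pipeline can be sketched as a producer/consumer pair. This is an illustration under assumptions, not the v1.6.0 agent: `launch_kernel` and `send_batch` are hypothetical callables standing in for the GPU kernel launch and the HTTP upload.

```python
import queue
import threading

def run_agent(num_launches, launch_kernel, send_batch):
    """Run kernel launches back to back while a background thread uploads DPs."""
    pending = queue.Queue()
    sent_batches = 0

    def sender():
        nonlocal sent_batches
        while True:
            batch = pending.get()
            if batch is None:           # sentinel: no more work
                return
            send_batch(batch)           # slow network call, off the main thread
            sent_batches += 1

    worker = threading.Thread(target=sender)
    worker.start()
    for i in range(num_launches):
        dps = launch_kernel(i)          # stand-in for kernel launch + result copy
        if dps:
            pending.put(dps)            # hand off and relaunch immediately
    pending.put(None)
    worker.join()
    return sent_batches

uploaded = []
count = run_agent(4, lambda i: list(range(i)), uploaded.append)
print(count, uploaded)
```

The main loop never waits on `send_batch`, so upload latency no longer sits between consecutive launches.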

Thanks for contributing to the attack!
  #11  
Old 03-12-2026, 09:06
WhoCares WhoCares is offline
@cjack

Performance optimizations suggested by the AI agent (Claude Opus 4.6 Thinking):

Core conclusion: The two highest-priority optimizations together can provide roughly a 3–4× speedup:

Montgomery batch inversion (P0): A prototype already exists in pollard_rho.cuh, but it uses an old data structure. It needs to be ported to the fe_t architecture used in solver_fast.cu. This can reduce the per-step cost from 15M + 120S to approximately 6M + 6S.

CUDA Stream double buffering + pinned memory (P0): The current workflow (kernel → sync → D2H → CPU processing) is strictly serial. Using dual streams with a ping-pong buffer allows GPU computation to fully overlap with CPU/network processing.

I also asked the AI to write a Python script that runs and upgrades "solver_fast.exe" automatically:
https://github.com/z16166/PySolverLauncher/
Attached Files
File Type: rar performance_analysis_en.rar (6.7 KB, 6 views)
Last edited by WhoCares; 03-12-2026 at 15:03.
  #12  
Old 03-12-2026, 16:13
cjack cjack is offline
@WhoCares

Thanks for the deep analysis and the PySolverLauncher — really appreciate you putting time into this!

Let me give some context on the current state, since the AI reviewed v1.3.0 but we're now on v1.6.0 with several things already addressed:

CUDA Streams / GPU overlap — Already solved in v1.6.0. We implemented an async DP sender pipeline (background thread handles all HTTP while the main thread immediately relaunches the kernel). Measured GPU utilization: 100%, power draw 502W/575W on the RTX 5090. The double-buffering approach from the analysis would add <0.5% on top of what we already have.

Montgomery batch inversion — This is the one genuinely interesting suggestion. The per-step Itoh-Tsujii inversion IS the main cost (8M+116S out of 15M+120S per step). Batch inversion could amortize it across 128 threads. However, it requires 14x __syncthreads() per step (currently we have ZERO sync across 2048 steps), plus shared memory for the product tree, plus extra registers. Our realistic estimate is 1.5-2x speedup, not 3x. Worth exploring after the current run.

Comb w=5/6 for fe_mul — We actually tested this. Wider comb = more registers = less occupancy. Our history: comb table fe_mul gave 198 registers (16.7% occupancy), while the current table-free approach uses 80 registers (50% occupancy) and was 2.4x faster in practice. Occupancy wins over per-operation speed on GPU.

Important note about per-step normalization: the analysis calls it "the biggest bottleneck" — true, but it's mathematically required. Without it, walks never merge (we learned this the hard way — our old 3.5 G/s benchmark was invalid because of this). It can only be amortized (batch inversion), not removed.
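
For readers unfamiliar with the amortization mentioned here, Montgomery's trick replaces n inversions with one inversion plus roughly 3n multiplications. The sketch below uses integers modulo the prime 2^61 − 1 as a stand-in for GF(2^113), and Python's built-in modular inverse in place of Itoh-Tsujii. It shows the arithmetic only and says nothing about the GPU synchronization cost discussed above.

```python
P = (1 << 61) - 1   # stand-in prime field; the solver works in GF(2^113)

def batch_inverse(values, p=P):
    """Invert every element of `values` (all nonzero mod p) with one inversion."""
    n = len(values)
    # prefix[i] = values[0] * ... * values[i-1]
    prefix = [1] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p

    running_inv = pow(prefix[n], -1, p)     # the single expensive inversion
    inverses = [0] * n
    for i in range(n - 1, -1, -1):
        # running_inv is the inverse of values[0] * ... * values[i]
        inverses[i] = running_inv * prefix[i] % p
        running_inv = running_inv * values[i] % p
    return inverses

values = [3, 5, 7, 123456789, P - 1]
inverses = batch_inverse(values)
print(all(v * inv % P == 1 for v, inv in zip(values, inverses)))
```

Each thread in a batch would contribute one value. The running products are what need to be shared between threads, which is where the shared memory and synchronization points come from.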

Your PySolverLauncher: we've bundled it into the official ArmadilloSolver.zip on the dashboard! The /api/download-info endpoint was already there, so it works out of the box. Credit in the changelog. Thanks for the contribution!

Current status: 42 G/s fleet, 101M DPs, 22% probability, all verified end-to-end. Just waiting for the birthday paradox to do its thing!
  #13  
Old 03-12-2026, 16:52
WhoCares WhoCares is offline
@cjack

The AI's analysis is based on the 1.6.0 code. Each time, I extract the latest exe and source code into the same directory, whose name still contains 1.3.0, overwriting the old version.

The ZIP SHA1 returned by the download API was not updated: the latest is B5A021ADE2C88548EB511120C17470F0D00FBB5C, not 0306102b37f2a102d4d8376c3dc806ce059c9597.
The Python script uses this SHA1 as the version number. The real version number "v1.6.0" is hardcoded in the exe and is not easy to extract.
Last edited by WhoCares; 03-12-2026 at 17:03.
  #14  
Old 03-12-2026, 17:36
cjack cjack is offline
@WhoCares

Good catch on the stale SHA1! It was cached at server startup and never refreshed after we updated the ZIP (we added your launcher script + updated changelog).

Fixed. The API now returns the correct hash. I also implemented automatic mtime-based detection: whenever the ZIP file changes on disk, /api/download-info recomputes SHA1 and size on the fly. No more stale hashes, no server restart needed.
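
A change-detecting cache of this kind could be sketched as follows. The `DownloadInfo` class is hypothetical and not taken from the server. It keys the cache on modification time and file size, and builds a dictionary with fields similar to the ones the API is described as returning.

```python
import hashlib
import os

class DownloadInfo:
    """Recompute SHA1 and size only when the file on disk has changed."""

    def __init__(self, path):
        self.path = path
        self._key = None        # (mtime_ns, size) seen at the last hash
        self._info = None

    def get(self):
        st = os.stat(self.path)
        key = (st.st_mtime_ns, st.st_size)
        if key != self._key:
            digest = hashlib.sha1()
            with open(self.path, "rb") as f:
                # Hash in 1 MiB blocks so large ZIPs are not read in one go.
                for block in iter(lambda: f.read(1 << 20), b""):
                    digest.update(block)
            self._info = {
                "available": True,
                "sha1": digest.hexdigest(),
                "size": st.st_size,
                "filename": os.path.basename(self.path),
            }
            self._key = key
        return self._info
```

Each request then costs one `os.stat` call. The file is re-hashed only after it has been replaced, so no server restart is needed to pick up a new ZIP.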

The API response now also includes a version field:

{
  "available": true,
  "sha1": "c7c267adca36c7e2ddac0f4b3bf37100f88ef033",
  "size": 1018086,
  "filename": "ArmadilloSolver.zip",
  "version": "1.6.0"
}

This is fully backward compatible — your existing launcher works without any changes. The version field is just extra info you can optionally use if you want to display a human-readable version instead of the SHA1.

We also added VERSION.txt and a README.txt with quick start instructions inside the ZIP, so anyone extracting it knows exactly which version they have.

Thanks for keeping an eye on things — your feedback makes the project better!

Flash status update:

42 active workers, fleet speed 41.7 G/s
104M distinguished points collected
Current probability: 23.4%
Median ETA: ~17 hours (Mar 13, ~03:00 UTC+1)
0 collisions so far — the hunt continues!
The Following User Says Thank You to cjack For This Useful Post:
WhoCares (03-13-2026)
  #15  
Old 03-13-2026, 10:44
WhoCares WhoCares is offline
@cjack

There are too many "SERVER OFFLINE" messages now:


[ENDGAME][ 1240s] 200.10 M iter/s | 2.481e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1249s] 200.12 M iter/s | 2.500e+11 iters | DP sent:3
[ENDGAME][ 1250s] 200.04 M iter/s | 2.500e+11 iters | SERVER OFFLINE - retrying...

[DP] 7602 sent, 7602 unique (0.0% dup rate)
[ENDGAME][ 1259s] 200.14 M iter/s | 2.520e+11 iters | DP sent:0
[ENDGAME][ 1260s] 200.06 M iter/s | 2.521e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1269s] 200.16 M iter/s | 2.540e+11 iters | DP sent:4
[ENDGAME][ 1270s] 200.08 M iter/s | 2.541e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1288s] 200.13 M iter/s | 2.578e+11 iters | DP sent:4
[DP] 7801 sent, 7801 unique (0.0% dup rate)
[ENDGAME][ 1321s] 200.02 M iter/s | 2.642e+11 iters | DP sent:3
[DP] 8000 sent, 8000 unique (0.0% dup rate)
[ENDGAME][ 1355s] 199.63 M iter/s | 2.705e+11 iters | DP sent:4
[DP] 8202 sent, 8202 unique (0.0% dup rate)
[ENDGAME][ 1359s] 199.68 M iter/s | 2.714e+11 iters | DP sent:1
[ENDGAME][ 1360s] 199.61 M iter/s | 2.715e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1369s] 199.63 M iter/s | 2.733e+11 iters | DP sent:2
[ENDGAME][ 1370s] 199.56 M iter/s | 2.734e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1379s] 199.65 M iter/s | 2.753e+11 iters | DP sent:4
[ENDGAME][ 1380s] 199.58 M iter/s | 2.754e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1388s] 199.54 M iter/s | 2.770e+11 iters | DP sent:2
[DP] 8401 sent, 8401 unique (0.0% dup rate)
[ENDGAME][ 1389s] 199.61 M iter/s | 2.773e+11 iters | DP sent:3
[ENDGAME][ 1390s] 199.53 M iter/s | 2.773e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1419s] 199.46 M iter/s | 2.830e+11 iters | DP sent:1
[DP] 8601 sent, 8601 unique (0.0% dup rate)
[ENDGAME][ 1459s] 199.42 M iter/s | 2.910e+11 iters | DP sent:2
[DP] 8802 sent, 8802 unique (0.0% dup rate)
[ENDGAME][ 1489s] 199.48 M iter/s | 2.970e+11 iters | DP sent:3
[ENDGAME][ 1490s] 199.41 M iter/s | 2.971e+11 iters | SERVER OFFLINE - retrying...
[ENDGAME][ 1492s] 199.40 M iter/s | 2.975e+11 iters | DP sent:6
[DP] 9006 sent, 9006 unique (0.0% dup rate)
[ENDGAME][ 1499s] 199.50 M iter/s | 2.991e+11 iters | DP sent:3
[ENDGAME][ 1500s] 199.43 M iter/s | 2.992e+11 iters | SERVER OFFLINE - retrying...
Tags
bolero, ecdlp



All times are GMT +8. The time now is 05:45.

