#19
@cjack
Performance optimizations suggested by the AI agent (Claude Opus 4.6 Thinking):

Core conclusion: the two highest-priority optimizations together can provide roughly a 3–4× speedup:

1. Montgomery batch inversion (P0): A prototype already exists in pollard_rho.cuh, but it uses an old data structure. It needs to be ported to the fe_t architecture used in solver_fast.cu. This can reduce the per-step cost from 15M + 120S to approximately 6M + 6S.

2. CUDA stream double buffering + pinned memory (P0): The current workflow (kernel → sync → D2H → CPU processing) is strictly serial. Using dual streams with a ping-pong buffer allows GPU computation to overlap with CPU/network processing.

I also asked the AI to write a Python script that runs and upgrades "solver_fast.exe" automatically: https://github.com/z16166/PySolverLauncher/
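For anyone unfamiliar with the batch inversion in item 1, here is a minimal host-side sketch of Montgomery's trick. It is not the code from pollard_rho.cuh or solver_fast.cu: the toy prime `P` and the helpers `mul_mod`, `inv_mod` and `batch_invert` are names I made up for illustration, and plain `uint64_t` values stand in for the multi-limb fe_t. The idea is to invert n field elements with a single real inversion plus about 3(n-1) multiplications, instead of n separate inversions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime modulus (assumption for illustration; the real solver uses
// multi-limb field elements, not a 30-bit prime).
static const uint64_t P = 1000000007ULL;

// Modular multiplication. Inputs must already be reduced (< P), so the
// product stays below 2^60 and cannot overflow uint64_t.
static uint64_t mul_mod(uint64_t a, uint64_t b) { return (a * b) % P; }

// One "real" inversion via Fermat's little theorem: a^(P-2) mod P.
// This is the expensive operation that batching amortizes.
static uint64_t inv_mod(uint64_t a) {
    uint64_t result = 1, base = a % P, e = P - 2;
    while (e) {
        if (e & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        e >>= 1;
    }
    return result;
}

// Montgomery's trick: replace every element of v by its inverse, in place.
// All elements must be nonzero and reduced mod P.
static void batch_invert(std::vector<uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;

    // prefix[i] = v[0] * v[1] * ... * v[i]
    std::vector<uint64_t> prefix(n);
    prefix[0] = v[0];
    for (std::size_t i = 1; i < n; ++i)
        prefix[i] = mul_mod(prefix[i - 1], v[i]);

    // Single inversion of the product of all elements.
    uint64_t acc = inv_mod(prefix[n - 1]);

    // Walk backwards, peeling off one element per iteration.
    for (std::size_t i = n - 1; i > 0; --i) {
        uint64_t inv_i = mul_mod(acc, prefix[i - 1]); // equals 1 / v[i]
        acc = mul_mod(acc, v[i]);                     // now 1 / (v[0]*...*v[i-1])
        v[i] = inv_i;
    }
    v[0] = acc;
}
```

To try it, fill a `std::vector<uint64_t>` with nonzero values below `P`, call `batch_invert`, and check that each result times the original value is 1 mod `P`. On the GPU the same pattern would presumably be applied per thread over a batch of walk points, which is where the saving per step comes from.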
__________________
AKA Solomon/blowfish.
Last edited by WhoCares; 03-12-2026 at 15:03.
Tags: bolero, ecdlp