Exetools  

Go Back   Exetools > General > General Discussion

Notices

 
 
Thread Tools Display Modes
Prev Previous Post   Next Post Next
  #19  
Old 03-12-2026, 09:06
WhoCares's Avatar
WhoCares WhoCares is offline
who cares
 
Join Date: Jan 2002
Location: Here
Posts: 468
Rept. Given: 11
Rept. Rcvd 32 Times in 25 Posts
Thanks Given: 69
Thanks Rcvd at 247 Times in 94 Posts
WhoCares Reputation: 32
@cjack

Performance optimizations suggested by the AI agent (Claude Opus 4.6 Thinking):

Core conclusion: The two highest-priority optimizations together can provide roughly a 3–4× speedup:

Montgomery batch inversion (P0) — A prototype already exists in pollard_rho.cuh, but it uses an old data structure. It needs to be ported to the fe_t architecture used in solver_fast.cu. This can reduce the per-step cost from 15M + 120S to approximately 6M + 6S.

CUDA Stream double buffering + pinned memory (P0) — The current workflow (kernel → sync → D2H → CPU processing) is strictly serial. Using dual streams with a ping-pong buffer allows GPU computation to fully overlap with CPU/network processing. ������

And I ask AI to code a Python script to run and upgrade "slover_fast.exe" automatically:
https://github.com/z16166/PySolverLauncher/
Attached Files
File Type: rar performance_analysis_en.rar (6.7 KB, 6 views)
__________________
AKA Solomon/blowfish.

Last edited by WhoCares; 03-12-2026 at 15:03.
Reply With Quote
 

Tags
bolero, ecdlp


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Replacing ECDSA in Target (arma) Mynotos General Discussion 3 11-22-2019 00:49


All times are GMT +8. The time now is 22:42.


Always Your Best Friend: Aaron, JMI, ahmadmansoor, ZeNiX, chessgod101
( Since 1998 )