#36
Perhaps we'd better upgrade the CUDA Toolkit from 12.x to 13.1.
For learning purposes, I asked an AI to optimize the GPU kernel function pollard_kernel(), mainly targeting the NVIDIA GeForce RTX 5090. The goal was to cut register usage from 96 to 64 registers per thread. Lower register pressure raises SM occupancy: the number of blocks that can be resident on a single SM at the same time goes from 5 to 8. That yields a theoretical performance improvement of around one third: 8 resident blocks instead of 5 is 1.6x the parallelism, and if throughput scaled linearly with occupancy, runtime would drop by up to 37.5%. The actual gain has to be evaluated with NVIDIA Nsight Compute together with real benchmark data. The 64-register target is reachable thanks to the SMRS (shared memory register spilling) compiler feature introduced in CUDA Toolkit 13.0: registers that would otherwise spill to local memory are spilled to shared memory instead.
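The 5-to-8 figures follow from register-file arithmetic. Below is a minimal sketch of that calculation, assuming 65,536 32-bit registers per SM, registers allocated per warp in chunks of 256, a limit of 48 resident warps per SM, and 128 threads per block. These limits are my assumptions for the RTX 5090 and should be checked against the device properties; the block size of pollard_kernel() isn't stated, and 128 is simply the value that reproduces 5 and 8. The helper names are hypothetical.

```python
# Hypothetical per-SM limits assumed for the RTX 5090 (compute capability 12.0);
# verify them against the CUDA Programming Guide or cudaGetDeviceProperties().
REGS_PER_SM = 65536       # 32-bit registers per SM
REG_ALLOC_UNIT = 256      # registers are allocated per warp in chunks of 256
MAX_WARPS_PER_SM = 48     # 1536 resident threads per SM
MAX_BLOCKS_PER_SM = 32    # hard cap on resident blocks (assumed; not binding here)
WARP_SIZE = 32


def round_up(x: int, unit: int) -> int:
    """Round x up to the next multiple of unit."""
    return -(-x // unit) * unit


def blocks_per_sm(regs_per_thread: int, threads_per_block: int) -> int:
    """Resident blocks per SM as limited by registers, warps and the block cap.

    Shared memory is ignored here, although SMRS spills may also consume it.
    """
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    regs_per_warp = round_up(regs_per_thread * WARP_SIZE, REG_ALLOC_UNIT)
    warps_by_regs = REGS_PER_SM // regs_per_warp  # warps the register file can hold
    by_regs = warps_by_regs // warps_per_block    # only whole blocks can be resident
    by_warps = MAX_WARPS_PER_SM // warps_per_block
    return min(by_regs, by_warps, MAX_BLOCKS_PER_SM)


THREADS = 128  # assumed block size; reproduces the 5 -> 8 figures
before = blocks_per_sm(96, THREADS)
after = blocks_per_sm(64, THREADS)
warps_per_block = THREADS // WARP_SIZE
print(f"96 regs: {before} blocks/SM, 64 regs: {after} blocks/SM")
print(f"occupancy: {before * warps_per_block / MAX_WARPS_PER_SM:.0%} -> "
      f"{after * warps_per_block / MAX_WARPS_PER_SM:.0%}")
print(f"ideal speedup {after / before:.2f}x, "
      f"ideal runtime reduction {1 - before / after:.1%}")
```

Under these assumptions it prints 5 and 8 blocks per SM, 42% and 67% theoretical occupancy, an ideal 1.60x speedup and a 37.5% ideal runtime reduction. It deliberately ignores shared memory: since SMRS spills go to shared memory, a kernel that already uses a lot of it could end up limited by shared memory instead, and Nsight Compute's Occupancy section shows which limit actually applies. For reference, a cap like this is usually requested in CUDA with __launch_bounds__(128, 8) or nvcc's -maxrregcount=64; which one the AI-generated version uses isn't stated.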
__________________
AKA Solomon/blowfish.

Last edited by WhoCares; 03-18-2026 at 16:12.
The following user gave Reputation+1 to WhoCares for this useful post:
cjack (03-18-2026)

The following 4 users say thank you to WhoCares for this useful post:
cjack (03-18-2026), niculaita (03-19-2026), nulli (03-20-2026), wx69wx2023 (03-18-2026)
Tags: bolero, ecdlp