Quote:
Originally Posted by WhoCares
@cjack
To be honest, I didn't catch this manually. It was actually flagged by Gemini within Google Antigravity. I simply audited the code referenced in the report to verify it wasn't a false positive. I ran out of tokens for Gemini Pro, so I was using Gemini Flash yesterday—even the free tier provided surprisingly solid analysis. My takeaway is that it’s definitely worth running project code through different AI agents for peer reviews.
I've already gone through the v2.1.0 client implementation and found no further issues (apart from some possible perf optimizations, such as warp-level Montgomery batch inversion, or an early exit or bit-filtering step to eliminate candidates before the full squaring in ec_canon_x()). If you'd like me to take a look at the server-side logic, you can reach me at "bugtraq at 163 dot com". Alternatively, you might want to perform a local audit using an AI agent for a quick sanity check.
Thanks.
@WhoCares That's a great approach. Using multiple AI agents as independent reviewers is something we've fully embraced in this project since the latest critical bug. The Frobenius invariance bug you flagged was the single most impactful finding: it saved us from burning weeks of GPU time on completely useless DPs. We rebuilt everything from scratch after that (v2.0.0), and then the independent audits caught another subtle bug in the collision resolution logic (missing negation case for d=0). None of these would have been easy to spot manually.
Honestly, we should have started doing independent reviews much earlier in the project — lesson learned. From now on it's standard procedure: every significant change gets reviewed by at least one external AI agent before deployment.
Regarding the perf optimizations you mentioned — we evaluated warp-level batch inversion but decided the risk/complexity wasn't worth it for the marginal gain. The early-exit in ec_canon_x() is interesting but the current 112-squaring loop is already branch-free and register-friendly on sm_120, so we kept it simple. We did implement two other optimizations in v2.1.0 (reusing canon_x from canonicalization + hardware-accelerated scalar multiply with Barrett reduction) which brought us from 377 to 535 M/s per GPU — a clean 1.42x speedup with zero register increase.
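For anyone following along who hasn't met the batch inversion trick mentioned above, here is a rough host-side sketch in Python. It is not the project's kernel code and says nothing about how a warp-level version would be laid out. The modulus `P` and the function name `batch_inverse` are placeholders picked for illustration, and the sketch assumes a prime field, which may not match the field the client actually uses.

```python
# Toy sketch of Montgomery's batch-inversion trick.
# P is an arbitrary prime (2**61 - 1) chosen for illustration only;
# it is NOT claimed to be the project's field modulus.
P = 2**61 - 1

def batch_inverse(values, p=P):
    """Return the modular inverse of every element of `values` mod p.

    Uses one modular inversion plus 3*(n-1) multiplications instead of
    n separate inversions. Every input must be nonzero mod p.
    """
    n = len(values)
    if n == 0:
        return []

    # prefix[i] = values[0] * values[1] * ... * values[i]  (mod p)
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        acc = acc * v % p
        prefix[i] = acc

    # The single expensive step: invert the product of all inputs.
    inv_acc = pow(acc, -1, p)  # requires Python 3.8+

    # Walk backwards, peeling one factor off the running inverse each step.
    out = [0] * n
    for i in range(n - 1, 0, -1):
        out[i] = inv_acc * prefix[i - 1] % p  # inverse of values[i]
        inv_acc = inv_acc * values[i] % p     # now inverse of prefix[i-1]
    out[0] = inv_acc
    return out
```

The catch, and part of why it adds complexity, is that one zero input poisons the whole batch (here `pow` raises `ValueError`), so a real implementation has to deal with that case separately.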
Fleet is currently running at 32.5 G/s with 58 workers, ETA ~2 days for the collision. Fingers crossed.
I've sent you the server-side sources via PM — looking forward to your analysis. When this is over, everything goes open source on GitHub.