Publications

NPHardEval: Benchmarking Reasoning Ability of Large Language Models via Complexity Classes (2023).
