We evaluate DeepCode on the PaperBench benchmark (released by OpenAI), a rigorous testbed requiring AI agents to independently reproduce 20 ICML 2024 papers from scratch. The benchmark comprises 8,316 ...
5.7 update: Updated the code to 5.7. I will have to update the readme over time.

5.6 update: Updated the code to 5.6 with many small improvements and fixes. Apologies for the lack of response to PRs ...