Model                                              Error Rate  Accuracy  Samples
amazon/nova-micro-v1                               0.0         100%      10
anthropic/claude-3.5-haiku                         0.0         97%       10
anthropic/claude-sonnet-4                          0.0         100%      10
deepseek/deepseek-r1-0528                          0.0         100%      10
google/gemini-2.5-flash-preview-05-20              0.0         100%      10
google/gemma-3-4b-it                               0.0         100%      10
inception/mercury-coder-small-beta                 0.0         97%       10
liquid/lfm-3b                                      0.0         63%       10
meta-llama/llama-3.2-1b-instruct                   0.0         67%       10
meta-llama/llama-4-maverick                        0.0         100%      10
meta-llama/llama-4-scout                           0.0         100%      10
microsoft/phi-4-multimodal-instruct                0.0         100%      10
mistral/ministral-8b                               0.0         100%      10
mistralai/ministral-3b                             0.0         100%      10
mistralai/mistral-tiny                             0.0         17%       10
openai/gpt-4.1-nano                                0.0         97%       10
openai/gpt-4o                                      0.0         100%      10
openai/gpt-4o-mini                                 0.0         100%      10
qwen/qwen-turbo                                    0.0         100%      10
qwen/qwen3-8b                                      0.0         100%      10
sentientagi/dobby-mini-unhinged-plus-llama-3.1-8b  0.0         100%      10
x-ai/grok-3-beta                                   0.0         100%      10