Ömer Yüksel - benchmarking tag

Articles tagged with benchmarking

An experiment testing whether LLMs can reproduce Moby Dick verbatim, with mixed results.

A simple yet effective test that uses eye and hair color descriptions to weed out unreliable LLMs.