Ömer Yüksel


Articles tagged with benchmarking

The whale in the machine: reconstructing Moby-Dick with large language models

An experiment testing whether LLMs can reproduce Moby-Dick verbatim, with mixed results.

Published:



The color memory test: a quick filter for language model quality

A simple yet effective test that uses descriptions of eye and hair color to weed out unreliable LLMs.

Published: