An experiment testing whether LLMs can reproduce Moby Dick verbatim, with mixed results.
Articles tagged with benchmarking
A simple yet effective test for weeding out unreliable LLMs using eye and hair color descriptions.