Researcher, Ph.D.
LLMs seem to struggle when creating AMRs; I find this interesting
(s / struggle-01 :ARG0 (t / thing :mod (l / large) :ARG0-of (m / model-01 :ARG1 (l2 / language))) :ARG1 (c / create-01 :ARG0 t :ARG1 (t2 / thing :ARG0-of (r / represent-01 :ARG1 (m2 / meaning)) :mod (a / abstract))))
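As a quick sanity check, a PENMAN-encoded AMR like the one above must have balanced parentheses and a unique variable for each concept. Here is a minimal, hypothetical checker (stdlib only, not a full PENMAN parser) that catches exactly those two issues:

```python
import re

def check_penman(s: str) -> list[str]:
    """Report two common well-formedness issues in a PENMAN string:
    unbalanced parentheses and variables defined more than once."""
    problems = []
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("unbalanced ')'")
                break
    if depth > 0:
        problems.append("unclosed '('")
    # Variable definitions look like "(v / concept".
    variables = re.findall(r"\(\s*([A-Za-z][\w-]*)\s*/", s)
    seen = set()
    for v in variables:
        if v in seen:
            problems.append(f"variable '{v}' defined twice")
        seen.add(v)
    return problems

amr = ("(s / struggle-01 :ARG0 (t / thing :mod (l / large) "
       ":ARG0-of (m / model-01 :ARG1 (l2 / language))) "
       ":ARG1 (c / create-01 :ARG0 t :ARG1 (t2 / thing "
       ":ARG0-of (r / represent-01 :ARG1 (m2 / meaning)) "
       ":mod (a / abstract))))")

print(check_penman(amr))                           # no problems found
print(check_penman("(l / large :mod (l / little))"))  # reused variable
```

For real work, a proper PENMAN parser (which also validates role syntax and re-entrancies) is the better tool; this sketch only illustrates the two pitfalls shown in the graph above.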
Evaluation Quirks, Pitfalls and Some Recommendations: A Short Survey
A collection of funny evaluation quirks and some general guidance for classification evaluation.
An ACL best paper seems to have “disproven” Chomsky’s claim that LLMs can model all languages with “equal facility”. I argue that the story is more nuanced.
What’s in a %&!$# vector? Explaining semantic similarity
We check out two interesting methods for interpretability in semantic search.
How to hack an AMR Parsing evaluation – and what to do about it
We score 100 points on a popular NLP parsing benchmark with a simple hack! We also see how to evaluate such parsers more properly and safely.