Juri Opitz

LLMs can’t parse (yet)

Parsing a text into a formal meaning representation like Abstract Meaning Representation (AMR) requires high precision and strict adherence to annotation guidelines. Interestingly, recent research [1, 2, 3] seems to indicate that LLMs are rather bad at creating AMRs!
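To make that strictness concrete, here is a minimal sketch of what an AMR graph looks like in Penman notation and how unforgiving the format is. It assumes the open-source penman Python package and uses the classic guideline example for “The boy wants to go.”; it is purely illustrative and not taken from the papers below.

```python
# Minimal illustration (assumes: pip install penman) of the Penman notation
# that an AMR parser must produce character-perfectly.
import penman
from penman.exceptions import DecodeError

# Classic guideline example: "The boy wants to go."
amr_string = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""

graph = penman.decode(amr_string)
print(graph.instances())  # concept nodes: want-01, boy, go-02
print(graph.edges())      # relations: :ARG0, :ARG1, incl. the re-entrant use of b

# A single missing parenthesis already makes the output unusable:
try:
    penman.decode("(w / want-01 :ARG0 (b / boy)")
except DecodeError as err:
    print("malformed AMR:", err)
```

A dedicated parser is built to always emit well-formed graphs of this kind; an LLM generating tokens freely has to get the bracketing, variable bookkeeping, and role inventory exactly right without any such guarantee.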

What does this mean? To me it shows that LLMs have difficulty following strict annotation plans, even when fine-tuned. I find this somewhat intuitive, since they excel at pragmatics and also seem reasonably creative. Constraining their “wild nature” into a formal corset might even limit their capabilities in other important NLP tasks, like dialogue or summarization, where no strict human guidelines can be defined.

Still, it suggests that in tasks where utmost strictness and rule-following are required, LLMs may not always be the best tool available.

But is this inability to parse AMR really a problem, or just evidence that LLMs aren’t the right tool for this particular job? Maybe it’s less a red flag and more a reminder that non-LLM systems and LLMs can complement one another.

References

[1] “You Are An Expert Linguistic Annotator”: Limits of LLMs as Analyzers of Abstract Meaning Representation (Ettinger et al., Findings of EMNLP 2023)

[2] GPT makes a poor AMR parser (Li & Fowlie, JLCL 2025)

[3] Evaluation of Finetuned LLMs in AMR Parsing (Ho Shu Han, arXiv 2025)