PhD student at Heidelberg University
I’m Juri Opitz, a final-year PhD student at Heidelberg University. I’m interested in machine learning, with a focus on NLP (Natural Language Processing).
How can we capture who does what to whom in a text? A meaning representation tries to express the answer to this question in a structured and explicit format, such as a graph.
One of my interests is the design and application of metrics between such representations. An interesting potential of such metrics is that they can explain to us why (and in which aspects) two texts are similar, or dissimilar.
There’s also an issue that I believe has prevented a more widespread use of meaning representations: generating them is often slow and complex, which limits their practical value. Therefore, another motivation of mine is to make meaning representations more useful. For instance, in this paper we refine sentence BERT embeddings with meaning representations to make them more explainable, without requiring a system that generates the representations. That means we can keep all the efficiency and power of the neural sentence embeddings while getting some of that cool explainability of meaning representations! Check out this repository for the code.
Even though classification is a task that seems straightforward, our choice of evaluation metric for testing the performance of a classifier can differ from case to case. Here are some classic questions:
I’ve written two notes related to these topics. One is an analysis of two false friends: Macro F1 and macro F1 (no typo!); here’s the paper. The other is a set of refined teaching notes (pdf) that I wrote while teaching, triggered by recurring discussions with students in reading groups.
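To make the "false friends" concrete, here is a minimal sketch in Python. It assumes the two variants in question are (a) the arithmetic mean of the per-class F1 scores and (b) the harmonic mean of macro-averaged precision and macro-averaged recall. The function names and the toy labels are made up for illustration and are not taken from the paper.

```python
from collections import Counter


def per_class_precision_recall(gold, pred):
    """Return per-class precision and recall lists (classes in sorted order)."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted, but wrongly
            fn[g] += 1  # g was missed
    precisions, recalls = [], []
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        # Assumption: an undefined ratio (zero denominator) counts as 0.0.
        precisions.append(tp[c] / p_den if p_den else 0.0)
        recalls.append(tp[c] / r_den if r_den else 0.0)
    return precisions, recalls


def f1(p, r):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_f1_mean_of_f1s(gold, pred):
    """Variant (a): average the per-class F1 scores."""
    ps, rs = per_class_precision_recall(gold, pred)
    return sum(f1(p, r) for p, r in zip(ps, rs)) / len(ps)


def macro_f1_of_means(gold, pred):
    """Variant (b): F1 of macro-averaged precision and macro-averaged recall."""
    ps, rs = per_class_precision_recall(gold, pred)
    return f1(sum(ps) / len(ps), sum(rs) / len(rs))


# Hypothetical toy data: three items of class "a", one of class "b".
gold = ["a", "a", "a", "b"]
pred = ["a", "a", "b", "b"]
print(macro_f1_mean_of_f1s(gold, pred))
print(macro_f1_of_means(gold, pred))
```

On this toy example the two scores differ, so it is worth stating which formula was used whenever a "macro F1" number is reported.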
Finally, a passion of mine is digging into historical data sets. In particular, I’d like to test large-scale statistics in collections of historical texts, exploring the European Middle Ages along temporal and spatial axes.
For instance, we have automatically reconstructed coordinates and movement patterns for thousands of medieval entities, from the time of the Carolingian dynasty (~750 CE) to Maximilian I (~1500 CE). Of course, “automatic” also means that there’s much room for future work to reduce the error of the reconstructions.
All code for the experiments and the data is available at this repository.
See Google Scholar