After finishing my MSc in Computational Linguistics at the University of Zurich in 2016, I worked as a researcher, programmer and lecturer in Switzerland and Japan 👨‍🎓 Alongside this, I have worked steadily as a bartender, creating award-winning alcoholic and non-alcoholic drinks and developing a fascination for all things olfactory 👃
Scientific Programmer
💊 💽 In the SwissMADE project, I am responsible for automatically processing electronic patient reports to detect adverse drug events, while in the BERGAMOS project, I contribute to making annotations of the biomedical literature more interchangeable.
Lecturer In Computer Science
🕵️‍♀️ 🛁 Teaching a variety of courses to young adults aged 14 to 20, such as advanced networking, introduction to programming in C#, web application development and, particularly engaging, an Arduino tinkering class. Furthermore, I am responsible for developing new teaching materials for a machine learning course.
Visiting Lecturer
🐍 👅 Developing and teaching an introductory Python course aimed at linguists, with a focus on NLP.
Scientific Programmer
💊 📈 As part of a project on evaluating patient data involving several Swiss hospitals (CHUV, KSB, HUG, USZ) and universities (UZH, UniL), I work to discover adverse drug reactions using NLP techniques.
Teaching Assistant
🎩 👨‍🏫 Running exercises, tutoring sessions and the occasional lecture for XML, Software Engineering, Introduction to Programming and, most recently, Advanced Text Mining Techniques.
Research Assistant
🩺 🏔 Several projects related to biomedical text mining in the OntoGene group (ontogene.org) at the Department of Computational Linguistics, working with natural language processing technologies, Python and machine learning, and co-authoring several papers.
Visiting Lecturer
🐠 👨‍🎓 Teaching in a course on XML as part of a continuing education programme for employees of the Zentralbibliothek.
Visiting Researcher
🇯🇵 🌳 Building a RESTful API in Ruby for several dependency parsers and pubannotation.org, an online repository of biomedical annotations.
Nightliner
☎️ 🌙 During my Bachelor studies I volunteered at Nightline Zürich, which is a service that students can call if they are overwhelmed by their situation and need a listener and advice.
Martial Arts Instructor
💪 🥋 I have been practising karate for over 20 years and have attained my dan (black belt). Training a children's class was particularly rewarding. I have since taken up Wing Chun, which I teach to students at the university.
ICPR 2021: Pattern Recognition. ICPR International Workshops and Challenges
The Swiss Monitoring of Adverse Drug Events (SwissMADE) project is part of the SNSF-funded Smarter Health Care initiative, which aims at improving health services for the public. Its goal is to use text mining on electronic patient reports to automatically detect adverse drug events in hospitalised elderly patients who received anti-thrombotic drugs. The project is the first of its kind in Switzerland: the data is provided by four hospitals from both the German- and French-speaking parts of Switzerland, none of which had previously released electronic patient records for research, making extraction and anonymisation of records one of the major challenges of the project.
In this paper, we describe the part of the project concerned with the de-identification and annotation of German data obtained from one of the hospitals in the form of patient reports.
All of these reports are automatically de-identified using a dictionary-based approach augmented with manually created rules, and then automatically annotated. To detect drug, disease and symptom mentions in these reports, we employ our entity recognition pipeline OGER (OntoGene Entity Recognizer), which is also dictionary-based, augmented by an adapted transformer model to obtain state-of-the-art performance. Furthermore, a subset of reports is manually annotated for drugs and diagnoses by a medical expert, serving as a validation set for the automatic annotations.
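To illustrate the general idea of dictionary-based de-identification augmented with hand-written rules, here is a minimal sketch in Python. It is not the project's actual implementation: the name list, the regular expressions and the placeholder labels are all assumptions made for this example.

```python
import re

# Hypothetical name dictionary; a real system would load a far larger list.
NAME_DICTIONARY = {"Anna Muster", "Hans Beispiel"}

# Hypothetical hand-written rules as (placeholder, pattern) pairs.
RULES = [
    ("[DATE]", re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")),
    ("[PHONE]", re.compile(r"\b0\d{2} \d{3} \d{2} \d{2}\b")),
]


def deidentify(text: str) -> str:
    """Replace dictionary names and rule matches with placeholders."""
    # Dictionary pass: longest names first so that longer entries win.
    for name in sorted(NAME_DICTIONARY, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    # Rule pass: apply each regular expression in order.
    for placeholder, pattern in RULES:
        text = pattern.sub(placeholder, text)
    return text


report = "Frau Anna Muster wurde am 03.02.2019 aufgenommen, Tel. 044 123 45 67."
print(deidentify(report))
```

In this sketch the dictionary handles known names, while the rules cover patterns such as dates and phone numbers that cannot be enumerated in a list.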
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We are presenting a publicly available pipeline to perform named entity recognition and normalisation in parallel to help find relevant publications and to aid in downstream NLP tasks such as text summarisation. In our approach, we are using a dictionary-based system for its high recall in conjunction with two models based on BioBERT for their accuracy. Their outputs are combined according to different strategies depending on the entity type. In addition, we are using a manually crafted dictionary to increase performance for new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and make the output of our pipeline available on two visualisation platforms.
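Combining the outputs of a dictionary-based system and a model according to the entity type could look roughly like the following Python sketch. The strategies shown (union and intersection), the entity types, the concept identifiers and the `Annotation` structure are assumptions for illustration, not the strategies used in the paper.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int
    end: int
    entity_type: str
    concept_id: str


# Hypothetical per-type strategies, chosen only for this example.
STRATEGY = {"chemical": "union", "disease": "intersection"}


def merge(dictionary_hits, model_hits, default="union"):
    """Merge two annotation lists using one strategy per entity type."""
    dict_set, model_set = set(dictionary_hits), set(model_hits)
    merged = set()
    for entity_type in {a.entity_type for a in dict_set | model_set}:
        d = {a for a in dict_set if a.entity_type == entity_type}
        m = {a for a in model_set if a.entity_type == entity_type}
        if STRATEGY.get(entity_type, default) == "intersection":
            merged |= d & m  # keep only annotations both systems agree on
        else:
            merged |= d | m  # keep everything either system found
    return sorted(merged)


dictionary_hits = [
    Annotation(0, 7, "chemical", "CHEM:1"),
    Annotation(20, 28, "disease", "DIS:1"),
    Annotation(40, 45, "disease", "DIS:2"),
]
model_hits = [Annotation(20, 28, "disease", "DIS:1")]
print(merge(dictionary_hits, model_hits))
```

A union favours recall, while an intersection favours precision, so choosing per entity type lets each type trade the two off differently.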
Proceedings of the 4th Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.
Genomics Inform.
Dependency parsing is a common component of text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.
Journal of Cheminformatics
We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.
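The idea of a dictionary look-up combined with normalisation for spelling variants can be sketched in Python as follows. This is an illustrative simplification and not OGER++'s code: the terminology entries, the identifiers and the normalisation rule (lowercasing and dropping non-alphanumeric ASCII characters) are assumptions, and the neural disambiguation postfilter is not shown.

```python
import re


def normalise(term: str) -> str:
    """Lowercase and drop non-alphanumerics so spelling variants collide."""
    return re.sub(r"[^a-z0-9]+", "", term.lower())


# Hypothetical terminology with made-up identifiers.
TERMINOLOGY = {"Interleukin-2": "ID:001", "acetylsalicylic acid": "ID:002"}
INDEX = {normalise(term): cid for term, cid in TERMINOLOGY.items()}
MAX_TOKENS = max(len(re.findall(r"\w+", term)) for term in TERMINOLOGY)


def annotate(text: str):
    """Return (start, end, matched text, concept id) for dictionary hits."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalise(text[start:end])
            if key in INDEX:
                hits.append((start, end, text[start:end], INDEX[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return hits


hits = annotate("Patients received interleukin 2 and Acetylsalicylic-acid.")
print(hits)
```

Because both the terminology and the text spans pass through the same `normalise` function, variants such as "interleukin 2" and "Interleukin-2" map to the same index key.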
SMM4H: The 3rd Social Media Mining for Health Applications Workshop and Shared Task
Our team at the University of Zurich participated in the first 3 of the 4 sub-tasks at the Social Media Mining for Health Applications (SMM4H) shared task. We experimented with different approaches for text classification, namely traditional feature-based classifiers (Logistic Regression and Support Vector Machines), shallow neural networks, RCNNs, and CNNs. This system description paper provides details regarding the different system architectures and the achieved results.
7th International Symposium on Semantic Mining in Biomedicine
This paper presents an approach towards high performance extraction of biomedical entities from the literature, which is built by combining a high recall dictionary-based technique with a high-precision machine learning filtering step. The technique is then evaluated on the CRAFT corpus. We present the performance we obtained, analyze the errors and propose a possible follow-up of this work.