Nico Colic

🧑‍🏫🍸 Teaching and NLP research by day, gastronomy by night.

Summary

Having finished my MSc in Computational Linguistics at the University of Zurich in 2016, I worked as a researcher, programmer and lecturer in Switzerland and Japan 👨‍🎓 Meanwhile, I have tended bar constantly, creating award-winning alcoholic and non-alcoholic drinks and developing a fascination for all things olfaction 👃

Experience

IDSIA

04/2021 - 12/2021

Scientific Programmer

💊 💽 In the SwissMADE project, I am responsible for the automatic processing of electronic patient reports to discover adverse drug events, while in the BERGAMOS project, I am contributing to making annotations on the biomedical literature more interchangeable.

Berufsfachschule BBB

08/2020 - Present

Lecturer In Computer Science

🕵️‍♀️ 🛁 Teaching a variety of courses to young adults aged 14 to 20, such as advanced networking, introduction to programming in C#, web application development and, particularly engaging, an Arduino tinkering class. Furthermore, I am responsible for the development of new teaching materials for a machine learning course.

Visiting Lecturer

🐍 👅 Developing and teaching an introductory Python course aimed at linguists, with a focus on NLP.

Universitätsspital Zürich

12/2018 - 12/2021

Scientific Programmer

💊 📈 As part of a project on evaluating patient data involving several Swiss hospitals (CHUV, KSB, HUG, USZ) and universities (UZH, UniL), I work to discover adverse drug reactions using NLP techniques.

University of Zurich

12/2014 - 08/2020

Teaching Assistant

🎩 👨‍🏫 Running exercises, tutoring sessions and the occasional lecture for XML, Software Engineering, Introduction to Programming and, most recently, Advanced Text Mining Techniques.

University of Zurich

12/2016 - 12/2020

Research Assistant

🩺 🏔 Several projects related to Biomedical Text Mining at the OntoGene group (ontogene.org) at the Department of Computational Linguistics, working with natural language processing technologies, Python and machine learning, and co-authoring several papers.

Zentralbibliothek Zürich

12/2019 - 12/2019

Visiting Lecturer

🐠 👨‍🎓 Teaching in a course about XML for a continuing education programme for the employees of the Zentralbibliothek.

Visiting Researcher

🇯🇵 🌳 Building a RESTful API in Ruby for several dependency parsers and pubannotation.org, an online repository of biomedical annotations.

Volunteer

Nightliner

☎️ 🌙 During my Bachelor studies I volunteered at Nightline Zürich, which is a service that students can call if they are overwhelmed by their situation and need a listener and advice.

Martial Arts Instructor

💪 🥋 I have been practising karate for over 20 years, and have attained my dan (black belt). Particularly rewarding was my job as a trainer of a children's class. Since then I have started practising Wing Chun, and I teach students at the university.

Education

University of Zurich

12/2014 - 12/2016

Master’s Degree Computational Linguistics

summa cum laude

University of Zurich

12/2009 - 12/2013

Bachelor’s Degree Neuroinformatics

Awards

JSPS Fostering Joint International Research

Japan Society for the Promotion of Science

summa cum laude

University of Zurich

Publications

ICPR 2021: Pattern Recognition. ICPR International Workshops and Challenges

The Swiss Monitoring of Adverse Drug Events (SwissMADE) project is part of the SNSF-funded Smarter Health Care initiative, which aims at improving health services for the public. Its goal is to use text mining on electronic patient reports to automatically detect adverse drug events in hospitalised elderly patients who received anti-thrombotic drugs. The project is the first of its kind in Switzerland: the data is provided by four hospitals from both the German- and French-speaking parts of Switzerland, none of which had previously released electronic patient records for research, making extraction and anonymisation of records one of the major challenges of the project. In this paper, we describe the part of the project concerned with the de-identification and annotation of German data obtained from one of the hospitals in the form of patient reports. All of these reports are automatically de-identified using a dictionary-based approach augmented with manually created rules, and then automatically annotated. For this, we employ our entity recognition pipeline called OGER (OntoGene Entity Recognizer), also a dictionary-based approach, augmented by an adapted transformer model to obtain state-of-the-art performance, to detect drug, disease and symptom mentions in these reports. Furthermore, a subset of reports is manually annotated for drugs and diagnoses by a medical expert, serving as a validation set for the automatic annotations.

Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We are presenting a publicly available pipeline to perform named entity recognition and normalisation in parallel to help find relevant publications and to aid in downstream NLP tasks such as text summarisation. In our approach, we are using a dictionary-based system for its high recall in conjunction with two models based on BioBERT for their accuracy. Their outputs are combined according to different strategies depending on the entity type. In addition, we are using a manually crafted dictionary to increase performance for new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and make the output of our pipeline available on two visualisation platforms.

Proceedings of the 4th Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.

Genomics Inform.

Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.

Journal of Cheminformatics

We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.

SMM4H: The 3rd Social Media Mining for Health Applications Workshop and Shared Task

Our team at the University of Zurich participated in the first 3 of the 4 sub-tasks at the Social Media Mining for Health Applications (SMM4H) shared task. We experimented with different approaches for text classification, namely traditional feature-based classifiers (Logistic Regression and Support Vector Machines), shallow neural networks, RCNNs, and CNNs. This system description paper provides details regarding the different system architectures and the achieved results.

7th International Symposium on Semantic Mining in Biomedicine

This paper presents an approach towards high performance extraction of biomedical entities from the literature, which is built by combining a high recall dictionary-based technique with a high-precision machine learning filtering step. The technique is then evaluated on the CRAFT corpus. We present the performance we obtained, analyze the errors and propose a possible follow-up of this work.