Resume of Marius van Niekerk

Marius van Niekerk
Full stack data architect

About

As both a software developer and a statistician, I have keen insight into how data scientists want to write code and how to deploy that code to production environments. On the spectrum of data science, I lean much more heavily towards the software engineering side. I believe in getting working solutions off the ground as quickly as possible with rapid iteration cycles. I enjoy picking up new tools, languages, and codebases as projects require them. I'm an active proponent of and contributor to various open source initiatives.

Work Experience

Flatiron Health
An Oncology Health-Tech company
March 2017 – Present
Staff Software Engineer
Build, design and maintain tools and architecture to aid in the processing of electronic health record based datasets.
Highlights
  • Architect for the core platform organization (40+ engineers)
  • Migrated a large legacy ETL framework written in Bash to a modern Python equivalent using Dask whilst maintaining full backwards compatibility
  • Deployed Snowflake at enterprise scale
  • Developed a simple S3-based data catalog system for archival datasets with atomicity guarantees and versioning
  • Migrated petabytes of datasets from CSV to Parquet
Maxpoint Interactive
Ad-tech focusing on geospatial targeting
April 2016 – February 2017
Senior Computational Engineer
Worked as a tool builder and facilitator between the data science team and the data engineering team. Particular focus was on scaling solutions to work on massive datasets (10 billion+ rows). Lots of big data work using Hive, Impala, Spark, and MapReduce.
Highlights
  • A cross-language framework for building data pipelines using Apache Spark. This allows data scientists to write code in Python and easily move it to a production process written in Scala.
  • Large-scale geospatial analysis model that matches billions of GPS points against millions of addresses with Apache Spark to compute approximate home locations from noisy GPS observations.
Maxpoint Interactive
Ad-tech focusing on geospatial targeting
April 2012 – April 2016
Senior Data Scientist
Primary support for a team of 23 data scientists, 17 engineers and 21 analysts for moving scientific applications to production. Team lead for a team of computational engineers. Built various smaller ad-hoc solutions for business problems on very short time scales. Drove key technology decisions working with the CTO directly around which big data technologies to focus our efforts on.
Highlights
  • A dynamic ad serving product based on real-time weather conditions.
  • An in-house monitoring and alerting web application used by all account managers and analysts to manage active advertising campaigns (~$150M in spend).
  • A prediction algorithm for detecting non-human web traffic using boosted trees.
CIMSO
June 2007 – March 2012
Software Engineer
Design, enhancement, optimization and maintenance of a large Delphi-based, hospitality-focused desktop software suite deployed throughout Africa and Southeast Asia.
Highlights
  • Implemented an online hotel reservation portal in web2py with live synchronization to the rest of the software suite.
  • Embedded LibreOffice inside the hotel management application to allow unified storage and management of documents.
  • Developed a generic database abstraction layer. The database abstraction layer provides statically typed wrappers around any given database table with flexible code generation in order to support multiple database architectures.
  • Developed a generic reporting engine that supports arbitrary grouping of data to provide customizable reports for hotel and financial management.

Volunteer

conda-forge
March 2018 – Present
Core contributor
Core maintainer for conda-forge
Highlights
  • Developed several parts of conda-smithy, a tool to dynamically generate build recipe repositories for conda packaging
  • conda-lock, a lightweight locking mechanism to aid reproducibility of conda environments

Education

  • 2008 – 2012

    Stellenbosch University

    Master of Science

    Mathematical Statistics

  • 2007

    University of Stellenbosch

    Bachelor of Science (Honours)

    Mathematical Statistics

  • 2004 – 2006

    University of Stellenbosch

    Bachelor of Commerce

    Actuarial Science

Skills

Programming languages
Python Scala Go Delphi R
Notable software libraries with extensive experience
Conda Dask Apache Spark Jupyter stack Apache Airflow
DevOps tools
Terraform Ansible Chef
Cloud Experience
AWS GCP

Interests

Wine
Cooking