haiderstats

Data Scientist at C.H. Robinson


Contributor to CRAN, researcher in machine learning, data scientist.

Contact me

About me

Humza Haider

I'm a machine learning engineer at C.H. Robinson, a third-party logistics company, where I work on a broad spectrum of modeling, engineering, and analytical problems. Before C.H. Robinson, I completed my Master's in Statistical Machine Learning at the University of Alberta, focusing on survival prediction tasks, e.g. predicting a person's life expectancy after being diagnosed with a terminal illness. Recently, I've been most interested in playing with databases (primarily Postgres), optimizing query performance, learning about machine learning pipelines, and generally productionizing machine learning models. Of course, we can't forget that I'm also an avid cat dad to my fur daughter, Zoe.

Work Experience

Machine Learning Engineer (August 2019 - Present)

Currently, I'm a Machine Learning Engineer at C.H. Robinson, a provider of multimodal transportation services and third-party logistics. I was previously a data scientist but made the switch in February 2021, when I realized I preferred the engineering side of data science over the modeling side. At C.H. Robinson, I work with a wide assortment of technologies, including Python, Kafka, PostgreSQL, PostGIS, Hive, Snowflake, and others. The focus of my work has been operational efficiency, i.e. building the tools that help people do their jobs better.

For example, as a data scientist I improved our estimated time of arrival (ETA) model, which gives our customers and internal representatives the information they need to make sure freight gets where it needs to be, when it needs to be there. By testing a wide array of machine learning models, including XGBoost, NGBoost, neural networks, and others, I was able to improve our predictions by more than a factor of 4. This lets us alert customers days ahead of time that freight will arrive late, so they can adjust any downstream supply chains.

As a machine learning engineer I deployed this model in real time so we could alert on a late shipment as soon as the information became available. This entailed using Kafka for stream processing, making the predictions in our consumer framework and then publishing any identified late shipments to our different customer and internal platforms for consumption.

While I love working with machine learning, one of my favorite parts of the job is tuning SQL queries and creating the right indexes to get the performance we expect. Discovering database indexes and the EXPLAIN ANALYZE command in Postgres is what swung my interest from "big data" to fast data!

Graduate Research Assistant (April 2018 - August 2019)

Thesis: Individual Survival Distributions: A More Effective Tool for Survival Prediction – PDF

As a graduate research assistant under the supervision of Dr. Russell Greiner, I researched novel methods to both model and evaluate patient-specific survival curves. We produced a paper on arXiv (under review at the Journal of Machine Learning Research) evaluating Multi-Task Logistic Regression, Random Survival Forests, Cox Proportional-Hazards, and Accelerated Failure Time survival models across a variety of evaluation metrics on eight different time-to-event data sets. In addition to this paper, I created an R package implementing Multi-Task Logistic Regression; the source code is available on GitHub.

Statistician Internship (June 2017 - August 2017)

I was a statistician intern at the Naval Medical Research Unit in San Antonio, Texas, as part of the Naval Research Enterprise Internship Program. As the only statistician in the department, I consulted on a wide variety of projects, including identifying statistically relevant features of third molar (wisdom teeth) dental emergencies among deployed Sailors and Marines and analyzing the usage of dental technology in Naval and Marine Corps dental treatment facilities. As part of this internship, we published some of our findings in Military Medicine (a peer-reviewed journal).

Undergraduate Researcher (June 2015 - June 2017)

I worked at Truckers & Turnover, a research group at the University of Minnesota Morris during my undergraduate degree. We studied the effects of Obstructive Sleep Apnea (OSA) on the crash risk, crash cost, and medical costs of commercial truck drivers using proprietary trucking operational data as well as health insurance claims data. Currently we have a paper on the medical cost implications under review at Sleep (peer-reviewed journal). I additionally led a research project examining the effectiveness (or rather ineffectiveness) of the Commercial Driver Medical Exam (CDME) to identify drivers with potential OSA (this work is still unpublished).

REU Student Researcher (May 2016 - August 2016)

I was part of the Research Experiences for Undergraduates (REU) program at the University of Wisconsin-La Crosse, which focused on mathematical ecology. Specifically, I modeled the survivability of the endangered Indiana bat under different conditions using simulations, matrix models, branching processes, and other population growth models. My work on this was published in Natural Resource Modeling, and the code used for the analysis is publicly available on my GitHub account.


Education

Master of Science in Computer Science
Specialization in Statistical Machine Learning

Relevant Coursework and Course Project

Bachelor of Arts

Majors in:
  • Statistics
  • Mathematics
  • Computer Science

Publications

Select a publication to go to its abstract and journal site.

Using survival prediction techniques to learn consumer-specific reservation price distributions

Ping Jin, Humza Haider, Russell Greiner, Sarah Wei, Gerald Häubl
2021 https://doi.org/10.1371/journal.pone.0249182

Effective Ways to Build and Evaluate Individual Survival Distributions.

Haider, H., Hoehn, B., Davis, S. and Greiner, R.
2020 http://www.jmlr.org/papers/v21/18-772.html

The Pre-Registry Commercial Driver Medical Examination: Screening Sensitivity and Certification Lengths for Two Safety-Related Medical Conditions.

Burks SV, Anderson JE, Panda B, Haider HS, Haider R, Shi D, Li Y, Cagle M, Ostroushko D, Sun Z, Zaharick J, Hickman J, Mabry E, Berger M, Czeisler C, Kales SN.
2020 https://doi.org/10.1097/JOM.0000000000001816

Employer-mandated obstructive sleep apnea treatment and healthcare cost savings among truckers.

Burks SV, Anderson JE, Panda B, Haider R, Ginader T, Sandback N, Pokutnaya D, Toso D, Hughes N, Haider HS, Brockman R, Toll A, Solberg N, Eklund J, Cagle M, Hickman JS, Mabry E, Berger M, Czeisler CA, Kales SN
2019 https://doi.org/10.1093/sleep/zsz262

Simultaneous Prediction Intervals for Patient-Specific Survival Curves.

Sokota S, D'Orazio R, Javed K, Haider H, Greiner R
2019 https://arxiv.org/abs/1906.10780

Longitudinal Analysis of CAD/CAM Restoration Incorporation Rates into Navy Dentistry. Military Medicine.

Dickens, N., Haider, H., Lien, W., Simecek, J. and Stahl, J.
2018 https://doi.org/10.1093/milmed/usy260

Incorporating Allee effects into the potential biological removal level. Natural Resource Modeling, 30(3), p.e12133.

Haider, H.S., Oldfield, S.C., Tu, T., Moreno, R.K., Diffendorfer, J.E., Eager, E.A. and Erickson, R.A.
2017 https://doi.org/10.1111/nrm.12133

Notable Work

R Package: MTLR

An implementation of Multi-Task Logistic Regression (MTLR) for R

As part of my master’s research, I created an R package for Multi-Task Logistic Regression, a tool for creating Individual Survival Distributions in patient-specific survival prediction.

Functionality includes training an MTLR model, predicting survival curves for new observations, and plotting both these survival curves and the feature weights estimated by MTLR.

Refer to the GitHub repository for example usage.

Installing MTLR
# CRAN:
install.packages("MTLR")
# Or, install directly from GitHub:
# install.packages("devtools")
devtools::install_github("haiderstats/MTLR")

GitHub Repo CRAN Page

PyPI Package: survival-evaluation

A couple of survival evaluation metrics.

A Python package implementing the survival evaluation metrics found in the paper Effective Ways to Build and Evaluate Individual Survival Distributions. Currently, the package supports only the L1-Hinge, L1-Margin, One-Calibration, and D-Calibration evaluation metrics. Note that this package is only for evaluation; all models and predictions must be made before using the functions found here.
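To give a feel for what one of these metrics measures, here is a minimal pure-Python sketch of an L1-Hinge style loss. It does not use the survival-evaluation package, and the function name `l1_hinge_loss`, its arguments, and the choice to average over all subjects are assumptions made for illustration only. The idea sketched: an uncensored subject contributes the absolute error between the true and predicted event time, while a censored subject is only penalized when the prediction falls before the censoring time.

```python
from typing import Sequence


def l1_hinge_loss(
    event_times: Sequence[float],
    event_observed: Sequence[bool],
    predicted_times: Sequence[float],
) -> float:
    """Mean L1-Hinge style loss (illustrative sketch, not the package API).

    event_times: observed time for each subject (event or censoring time).
    event_observed: True if the event was observed, False if censored.
    predicted_times: the model's predicted event time for each subject.
    """
    if not (len(event_times) == len(event_observed) == len(predicted_times)):
        raise ValueError("all inputs must have the same length")
    if len(event_times) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for time, observed, predicted in zip(
        event_times, event_observed, predicted_times
    ):
        if observed:
            # Uncensored: ordinary absolute error.
            total += abs(time - predicted)
        else:
            # Censored: the true event time is at least `time`, so only
            # predictions earlier than the censoring time are penalized.
            total += max(0.0, time - predicted)

    # Assumption: average over every subject, censored or not.
    return total / len(event_times)


if __name__ == "__main__":
    # Three hypothetical subjects: one observed event, two censored.
    loss = l1_hinge_loss(
        event_times=[10.0, 5.0, 8.0],
        event_observed=[True, False, False],
        predicted_times=[7.0, 9.0, 6.0],
    )
    print(loss)
```

In the example, the second subject contributes nothing because the prediction (9) is later than the censoring time (5), which is consistent with what was actually observed.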

Refer to the GitHub repository for example usage.

Installing survival-evaluation
# Install via pip:
pip install survival-evaluation

GitHub Repo PyPI Page



The best way to contact me is by email. I aim to respond within 24 hours.

© haiderstats | Humza Haider