Data Scientist at C.H. Robinson
Contributor to CRAN, researcher in machine learning, data scientist.
I'm a machine learning engineer working at C.H. Robinson, a third-party supply chain logistics company. There I work on a broad spectrum of problems across modeling, engineering, and analytical tasks. Before C.H. Robinson, I completed my Master's at the University of Alberta in Statistical Machine Learning, focusing on survival prediction tasks, e.g. predicting a person's life expectancy after being diagnosed with a terminal illness. Recently, I've been most interested in playing with databases (primarily Postgres), optimizing query performance, learning about machine learning pipelines, and generally productionalizing machine learning models. Of course, we can't forget I'm also an avid cat dad to my fur daughter Zoe.
Currently I'm a Machine Learning Engineer at C.H. Robinson, a provider of multimodal transportation services and
third-party logistics. Previously, I was a data scientist but made the switch in February 2021 when I realized I preferred the engineering aspects over the modeling part of data science.
At C.H. Robinson, I work with a wide assortment of technologies including Python, Kafka, PostgreSQL, PostGIS, Hive, Snowflake and others.
The focus of my work has been on operational efficiency, i.e. allowing people to perform their jobs better and supplying the tools to help them do just that.
For example, when I was a data scientist, I improved our estimated time of arrival (ETA) model, which gives our customers and internal representatives the information they need
to make sure freight gets to where it needs to be when it needs to be there. By testing a wide array of machine learning models, including xgboost, ngboost, neural
networks, and others, I was able to improve our predictions by more than a factor of 4. This allows us to alert customers days ahead of time that freight is going to show up late
so they can adjust any downstream supply chains.
As a machine learning engineer I deployed this model in real time so we could alert on a late shipment as soon as the information became available.
This entailed using Kafka for stream processing, making the predictions in our consumer framework and then publishing any identified late shipments to our different customer and internal platforms for consumption.
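The consume, predict, and publish flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `ShipmentEvent` fields, the `find_late_shipments` function, the toy ETA model, and the alert payload are all hypothetical, and the Kafka consumer and producer are replaced by an ordinary iterable and a callback so the example runs without a broker.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ShipmentEvent:
    """Hypothetical message shape; the real Kafka payload is not described here."""
    shipment_id: str
    scheduled_arrival_hours: float  # hours from now until the promised arrival
    features: dict                  # model inputs carried on the message


def find_late_shipments(
    events: Iterable[ShipmentEvent],
    predict_eta_hours: Callable[[dict], float],
    publish: Callable[[dict], None],
    tolerance_hours: float = 0.0,
) -> int:
    """Score each incoming event and publish an alert whenever the predicted
    ETA exceeds the scheduled arrival by more than `tolerance_hours`.
    Returns the number of alerts published."""
    alerts = 0
    for event in events:  # stands in for polling a Kafka consumer
        eta = predict_eta_hours(event.features)
        delay = eta - event.scheduled_arrival_hours
        if delay > tolerance_hours:
            # stands in for producing to a downstream "late shipments" topic
            publish({"shipment_id": event.shipment_id,
                     "predicted_delay_hours": delay})
            alerts += 1
    return alerts


# Toy usage: a made-up ETA model that assumes a constant 50 mph.
events = [
    ShipmentEvent("A", 10.0, {"distance_miles": 400.0}),
    ShipmentEvent("B", 10.0, {"distance_miles": 600.0}),
]
published = []
alert_count = find_late_shipments(
    events,
    predict_eta_hours=lambda f: f["distance_miles"] / 50.0,
    publish=published.append,
)
```

Keeping the scoring logic independent of the message broker, as in this sketch, makes it possible to test the alerting rule without running Kafka.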
While I love working with machine learning, one of my favorite parts of the job is tuning SQL queries and creating appropriate indexes to support the performance we expect. Discovering database indexes and the
"EXPLAIN ANALYZE" command in Postgres is what swung my interest from "big data" to fast data!
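As a small illustration of how an index changes a query plan, the sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for Postgres's `EXPLAIN ANALYZE`, so that it runs without a database server. The `shipments` table, its columns, and the index name are hypothetical, and the plan text SQLite prints differs from Postgres output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for `sql` as a single string.
    Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (id INTEGER PRIMARY KEY, customer_id INTEGER, eta TEXT)"
)
conn.executemany(
    "INSERT INTO shipments (customer_id, eta) VALUES (?, ?)",
    [(i % 100, "2021-02-01") for i in range(1000)],
)

sql = "SELECT * FROM shipments WHERE customer_id = ?"

# Without an index, the filter requires scanning the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filtered column, the planner can search the index instead.
conn.execute("CREATE INDEX idx_shipments_customer ON shipments (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

In Postgres, the analogous check is to prefix the query with `EXPLAIN ANALYZE` and compare the plan and timings before and after creating the index.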
As a graduate research assistant under the supervision of Dr. Russell Greiner, my research entailed creating novel methods to both model and evaluate patient-specific survival curves. We produced a paper on arXiv (under review at the Journal of Machine Learning Research) evaluating Multi-Task Logistic Regression, Random Survival Forests, Cox Proportional-Hazards, and Accelerated Failure Time survival models across a variety of evaluation metrics for eight different time-to-event datasets. In addition to this paper, I created an R package implementing Multi-Task Logistic Regression; the source code is also available on GitHub.
I was a statistician intern at the Naval Medical Research Unit in San Antonio, Texas as part of the Naval Research Enterprise Internship Program. Here I was the only statistician in the department so I consulted on a wide variety of projects including the identification of statistically relevant features of third molar (wisdom teeth) dental emergencies of deployed Sailors and Marines and the usage of dental technology in Naval and Marine Corps dental treatment facilities. As part of this internship we were able to publish some of our findings in Military Medicine (peer-reviewed journal).
I worked at Truckers & Turnover, a research group at the University of Minnesota Morris during my undergraduate degree. We studied the effects of Obstructive Sleep Apnea (OSA) on the crash risk, crash cost, and medical costs of commercial truck drivers using proprietary trucking operational data as well as health insurance claims data. Currently we have a paper on the medical cost implications under review at Sleep (peer-reviewed journal). I additionally led a research project examining the effectiveness (or rather ineffectiveness) of the Commercial Driver Medical Exam (CDME) to identify drivers with potential OSA (this work is still unpublished).
I was part of the Research Experiences for Undergraduates (REU) program at the University of Wisconsin-La Crosse, where the program focused on mathematical ecology. Specifically, I modelled the survivability of the endangered Indiana Bat species under different conditions using simulations, matrix models, branching processes, and other population growth models. My work on this was published in Natural Resource Modeling, and the code used for the analysis is publicly available on my GitHub account.
Using survival prediction techniques to learn consumer-specific reservation price distributions
Ping Jin, Humza Haider, Russell Greiner, Sarah Wei, Gerald Häubl
Effective Ways to Build and Evaluate Individual Survival Distributions.
Haider, H., Hoehn, B., Davis, S. and Greiner, R.
The Pre-Registry Commercial Driver Medical Examination: Screening Sensitivity and Certification Lengths for Two Safety-Related Medical Conditions.
Burks SV, Anderson JE, Panda B, Haider HS, Haider R, Shi D, Li Y, Cagle M, Ostroushko D, Sun Z, Zaharick J, Hickman J, Mabry E, Berger M, Czeisler C, Kales SN.
Employer-mandated obstructive sleep apnea treatment and healthcare cost savings among truckers.
Burks SV, Anderson JE, Panda B, Haider R, Ginader T, Sandback N, Pokutnaya D, Toso D, Hughes N, Haider HS, Brockman R, Toll A, Solberg N, Eklund J, Cagle M, Hickman JS, Mabry E, Berger M, Czeisler CA, Kales SN
Simultaneous Prediction Intervals for Patient-Specific Survival Curves.
Sokota S, D'Orazio R, Javed K, Haider H, Greiner R
Longitudinal Analysis of CAD/CAM Restoration Incorporation Rates into Navy Dentistry. Military Medicine.
Dickens, N., Haider, H., Lien, W., Simecek, J. and Stahl, J.
Incorporating Allee effects into the potential biological removal level. Natural Resource Modeling, 30(3), p.e12133.
Haider, H.S., Oldfield, S.C., Tu, T., Moreno, R.K., Diffendorfer, J.E., Eager, E.A. and Erickson, R.A.
As part of my master’s research, I created an R package for Multi-Task Logistic Regression - a tool used for creating Individual Survival Distribution in patient specific survival prediction.
Functionality includes training an MTLR model, predicting survival curves for new observations, and plotting these survival curves and the feature weights estimated by MTLR.
Refer to the GitHub repository for example usage.
# CRAN:
install.packages("MTLR")
# Or, install direct from GitHub:
# install.packages("devtools")
devtools::install_github("haiderstats/MTLR")
A Python package implementing the survival evaluation functions found in the paper Effective Ways to Build and Evaluate Individual Survival Distributions. Currently the package supports only the L1-Hinge, L1-Margin, One-Calibration, and D-Calibration evaluation metrics. Note that this package is for evaluation only; all models and predictions must be made before using the functions found here.
Refer to the GitHub repository for example usage.
# Install via pip:
pip install survival-evaluation
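To give a feel for what one of these metrics measures, here is a from-scratch sketch of one common formulation of the L1-Hinge idea: an observed event contributes the absolute error of the predicted survival time, while a censored observation contributes a penalty only when the prediction falls before the censoring time. It does not use or reproduce the survival-evaluation package's API; the `l1_hinge` name, its signature, and the averaging over all observations are assumptions made for illustration.

```python
from typing import Sequence


def l1_hinge(
    times: Sequence[float],
    event_observed: Sequence[bool],
    predicted_times: Sequence[float],
) -> float:
    """Illustrative L1-Hinge loss (hypothetical helper, not the package API).

    times:           event time if observed, otherwise censoring time
    event_observed:  True if the event was observed, False if censored
    predicted_times: predicted survival time for each observation
    """
    if not (len(times) == len(event_observed) == len(predicted_times)):
        raise ValueError("all inputs must have the same length")
    if len(times) == 0:
        raise ValueError("at least one observation is required")

    total = 0.0
    for time, observed, predicted in zip(times, event_observed, predicted_times):
        if observed:
            # Event seen: ordinary absolute error.
            total += abs(time - predicted)
        else:
            # Censored: the true time is at least `time`, so only
            # predictions earlier than the censoring time are penalized.
            total += max(0.0, time - predicted)
    return total / len(times)


# Toy data: one observed event and two censored observations.
loss = l1_hinge(
    times=[5.0, 10.0, 8.0],
    event_observed=[True, False, False],
    predicted_times=[7.0, 12.0, 6.0],
)
```

In this toy data, the second observation adds nothing to the loss because its predicted time falls after the censoring time, which is consistent with what was actually observed.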