Matthew J DeVay

I am a former chemist looking to leverage my vast knowledge of statistics and experimental design into a career with more growth opportunities while avoiding risks to my long-term health.
Connect with me on LinkedIn.

Recent Work

Visualizing FBI Crime Reporting Data with Dash/Plotly: FBI crime statistics are published in unfriendly, inconsistently formatted files. This project cleaned and aggregated data from summary statistics, then augmented it with policy information. Dash/Plotly was used to build a dashboard to visualize the effects of policy changes. This project is ongoing, and the current iteration uses both NCR and Victim Survey raw data. Currently, I am working on an engine to parse the proprietary FBI flat files used to store the raw data.

Improving Slum Detection via Machine Vision and Low-res Satellite imagery: Low-res satellite imagery is inexpensive or free to obtain; however, its low resolution limits its practical application. This project aimed to generate a feature that would improve MV slum detection projects. K-means clustering was performed on Facebook’s high-res population density maps. Real estate listings (both rentals and sales) were scraped with Selenium and geotagged via Google Cloud API. The geotagged listings were then used to rank the rate of advertising for each cluster, and the ranked clusters were then used as inputs to the MV model using GPS coordinates as labels. This project is ongoing, and I am currently using the scraping bot to collect more data, as well as building a new MV model to better take advantage of the generated feature.

Sub-reddit analysis – Is machine learning significantly different from data science?: In order to settle a debate, a hypothesis was generated and tested: Is machine learning a subset of, or included in, the field of data science? Posts from sub-reddits dealing specifically with machine learning and data science were collected via the Reddit API and vectorized with several techniques. Several classification models were created, the best of which could correctly assign a post only 80% of the time. This shows that while some distinction in language use exists between data science and machine learning, prediction accuracy fell short of the 95% threshold for full separation defined by the bet.


General Assembly Data Science Immersive Fellowship (2020): A twelve-week immersive program focused on the most common and useful statistical and machine learning packages for Python and Scala.

GX Sciences – Method Development (2018-2019): TL;DR: I used data and statistics to help build a new business.

I created a biomedical lab from the ground up. Literally. Supply chains, infrastructure, instrumentation, hiring, installation, the whole nine yards. Once that was moving, I then designed and executed experiments to create new, accessible, state-of-the-art, and inexpensive point-of-care medical testing. I successfully created two marketable medical tests: Vitamin D in whole blood and Creatinine in urine. I programmed robots to automate lab processes, created databases to manage supply chains, and helped develop tools to automate the QA program. I finally had to leave after I realized that I had worked for 22 days straight for at least 16 hours a day.

Trace Analytics – QA/QC (2016-2018): TL;DR: I managed and analyzed a vast amount of data in order to keep our customers happy and our business profitable.

I was responsible for the QA/QC program. This included compliance management, internal and external auditing, performance monitoring, trend analysis, change control, continual improvement, CAPA/RCA, and both employee and customer training programs. I worked directly with the IT director to develop and track KPIs to monitor all facets of the QA program. These improvements led to a 90% decrease in audit findings and increased revenue by streamlining processes, increasing efficiency, nearly doubling sample throughput, and improving customer satisfaction.