J.D. Fagan

Staff Machine Learning Engineer at Doma

San Francisco, California

Work Experience

  • Staff Machine Learning Engineer

    2021 - Current

    Helped the Data Science team overcome segmentation faults in production by revamping the CI/CD process: developed a Docker image that is identical across environments (dev > test > prod) and rewrote the driving go Bash script for their monorepo. Led a PoC project exploring different ways for the Data Science team to manage the ML/DL model pipeline and lifecycle as models move from development, experimentation, and training to production inference and prediction. Consulted regularly on Airflow data pipelines supporting ML training pipelines. Redesigned the build system, including splitting out Docker images for the various Data Science teams and automating those builds with CircleCI. Helped implement Python FastAPI RESTful apps that wrap our data science models, built from our Docker images, for easy deployment to Kubernetes clusters running in Azure. Updated Docker images and a large body of Python code, and introduced Pydantic for simpler validation of asynchronous REST request/response payloads carried over Azure Service Bus. Updated CI/CD build, test, and deployment automation in our declarative CircleCI configuration. Simplified our Docker builds and developer-experience interactions with intuitive go Bash scripting. Validated the correctness of a cloud migration project with smoke, spike, resource, and stress tests built on Locust and DeepDiff Python tooling, which queried our Snowflake backend views of past production request/response payloads. Helped solve a challenging multi-threaded race condition caused by thread-unsafe code that surfaced after moving from a single-threaded Flask app to a more modern multi-threaded FastAPI app.

  • Staff Data Engineer

    2020 - 2023

    Wore many hats, from Data Engineer to DBA to DevOps to Data Architect, in a short period at this fun and productive startup disrupting the real estate title insurance industry. Supported the Business Intelligence group's need for a scalable Snowflake JavaScript function that computes the difference between two timestamps counting only working hours. Scripted automated translation of SQL Server DDL into Snowflake DDL as the basis for landing source data via ETL, supporting our data warehouse build-out. Built many DAG-based data pipelines in Airflow, such as hourly pulls of Azure Cosmos DB data into Azure Blob Storage for copying into the Snowflake data warehouse.

  • Staff Data Engineer

    2019 - 2020

    Worked in the Data Science and Engineering group for Dropbox, an exabyte-scale, data-intensive client. Built scalable data pipelines, writing lots of distributed Hive and Snowflake SQL as well as Python Airflow DAGs and plugins for managing and validating the data flow into a dimensional data warehouse. Supported the Business Intelligence group's aggressive deadlines, delivering 3 projects in as many months. Helped with DevOps work to align Python and pip versions in production and to simplify Python packaging requirements for our Git repo. Strongly advocated for a star-schema-oriented data warehouse design and the ETL changes needed to support it.

  • Staff Data Engineer

    2018 - 2019

    Biotech data analytics automation. Helped introduce Gitflow and encourage its adoption across the engineering organization to better manage how we developed and released software in two-week sprint cycles. Optimized slow-running MySQL queries by evaluating query plans and determining the minimum set of indexes needed for fast, efficient queries. Served on the production support rotation, maintaining production systems for over a dozen client databases and their data pipelines (both Airflow- and cron-based). Helped the team identify its critical technical debt, such as the need to port the legacy code base from Python 2 to Python 3. Pushed the team away from tightly coupled, brittle designs toward more flexible, loosely coupled ones. Encouraged wider use of test-driven development and the introduction of CI/CD tools to automate unit and integration testing. Worked with another engineer to migrate our cron-based data pipelines to Python-based Airflow. Used SQL, Bash scripting, Airflow, and Python heavily on a daily basis.

  • Lead Data Engineer

    2017 - 2018

    Real estate data analytics automation. Very hands-on lead data engineer, re-architecting and rewriting an aging data backend that relied on many manual behind-the-scenes processes. Designed and coded an object-oriented Python ETL framework that uses various RESTful APIs to improve how we extracted clients' accounting data. Evaluated Snowflake over Redshift as our cloud data warehouse, leveraging its advantages to keep our data team lean and focused on the data tasks that create value for Waypoint in the marketplace. Designed the data model for the data warehouse's logical and physical design using an enhanced star schema approach, as advocated by Lawrence Corr, a former colleague of Ralph Kimball. Used Airflow as the ETL workflow management tool to streamline automation of data pipeline jobs and processes. Helped recruit other data engineers to Waypoint as it grew market share in the commercial real estate industry. Gave third-party data providers bug reports and improvement ideas for the data APIs we were ingesting.

  • Data/Software Architect

    2016 - 2016

    Designed data pipelines and led a talented team developing machine/deep learning code in the cloud for a sports-based social network uniting athletes, teams, and their fans. The AWS full-stack cloud backend included Redshift, Aurora, DynamoDB, S3, Lambda, SNS, and API Gateway. Automated Lambda and API Gateway deployments (DevOps) with Serverless, Salt, AMIs, CloudFormation, Boto, and the AWS CLI. Worked on an ELT data migration from SQL Server to Aurora.

  • Systems/Software Architect

    2014 - 2016

    Systems and Software Architect ensuring that new projects introduced to Wells Fargo production data centers could be handled from a performance, capacity, and security design perspective. Automated ETL processing in Python using libraries such as requests, csvkit, and SQLAlchemy. Worked with the business to understand new projects and their impact on the backend infrastructure supporting their code. Once impacts were determined, designed solutions to handle each project's anticipated load, sized appropriately for existing and forecasted capacity and performance 2 to 4 years out. Designed UML deployment diagrams in the EA CASE tool.

  • Data/Software Architect

    2011 - 2014

    C#, Big Data, Data Mining, Business Intelligence, Machine Learning. Provided data analytics expertise. Wrote a Sqoop ETL query to pull OLTP SQL Server data into a large Hadoop grid. Wrote Hive aggregation queries across multiple dimensions and pulled the results into Tableau, where pivot tables and charts let business leaders quickly slice, dice, and visualize business data. Worked with a team of seven engineers supporting a backend telephony web service application written in C#/.NET.

  • Software Architect/Engineer

    2005 - 2011

    Consulted for various clients on work ranging from backtesting, designing, and writing fully automated algorithmic trading software systems to designing backend relational database systems that supported new software product development. Also performed low-level network diagnostics engineering with my own Fluke Networks toolset, from layer 1 physical (wired and Wi-Fi) diagnostic testing to designing, configuring, and optimizing layer 2 and 3 networking hardware (switches and routers). Performed some penetration testing for Cisco, which was acquiring another company (Meraki).

  • Performance Architect/Engineer

    2002 - 2006

    Led performance and infrastructure engineers who verified code through performance stress testing before deployment to our production server farm. Led many platform infrastructure projects to upgrade JVMs, WebLogic application servers, web servers, and new tools being introduced into the production environment. Worked with project managers to assemble detailed project plans as Gantt charts, helping them understand the interdependencies between technical tasks. Wrote install scripts for rolling out these projects after they had been tested in our simulated production environments. Resolved all outstanding code and content issues to make a particularly painful Java VM upgrade turnkey for both the developers and the content authors of wellsfargo.com. Rewrote the build system in Ant, replacing an aging Unix shell build system.

  • Software Architect

    2001 - 2002

    Designed a backend shipping system that supports all of their customer-facing and internal front-end clients, including Web Shipping and PC-based client software. The shipping system replaced an existing C++-based system and used the latest J2EE technology of the time, specifically EJB 2.0 beans. The application was implemented on top of WebLogic 6.1. Designed Session, Message-Driven, and Entity Beans and their interactions, mapping CMP-based Entity Beans to the RDBMS via XML descriptor files. Upgraded their build system to use Ant and EJBGen (a bean-generating tool).

Relevant Websites