About

I am an AI, Data, and Software Engineer with experience building data platforms, machine learning systems, and software at Amazon, Oracle, and Tesla. I identify business problems and opportunities and build solutions to address them. I use AI to build at high velocity, shipping fast without compromising quality.


Technical Tools

Python · SQL · TypeScript · Spark · Iceberg · Pandas · NumPy · PyArrow · Distributed Computing · AI · Machine Learning · Data Visualization · Claude API · Claude Code · Kiro · AWS (S3, Glue, Bedrock, EMR) · Cloud Cost Optimization · CI/CD · Infrastructure-as-Code · Git


Experience & Education

2022 – present
Data Engineer II
Amazon Music
  • Ensured the quality of all user behavior data for 100 million+ users globally, covering 12 billion+ events per day
  • Built the data quality (DQ) control platform adopted company-wide for DQ checks; for example, it caught issues that could have led to $$$ in royalty overpayments
    • Reduced the platform's cost by minimizing data movement and tuning compute clusters for full utilization
  • Built an ML-based monitoring system for app version dial-ups that tracks events per user to detect anomalies during biweekly version rollouts on iOS and Android, preventing 2 major iOS app version issues before global dial-up reached 5%
  • Built a HyperLogLog sketch dataset of user events, compressing 12 billion events per day into 200 million sketch records and improving query performance by more than 10x
    • Built a Spark library for HyperLogLog sketch aggregation that is compatible with the Athena (Trino) HyperLogLog implementation, used across the organization to create other HyperLogLog sketch datasets
  • Built an LLM-optimized dataset with a semantic layer for an internal Text-to-SQL product
  • Created an AI/LLM Operational Excellence tool used by all Amazon Music employees across 60 teams
  • Led the adoption of Iceberg tables at Amazon Music: built the first Iceberg tables and drove adoption to 10+ datasets across the data infrastructure by migrating teams from Hive to Iceberg, with adoption still expanding
2017 – 2022
Sr. Data Scientist
Oracle Cloud
  • Developed and operated a machine learning service for Oracle's Audience ranking products, improving algorithm performance by an average of 13% across multiple quarters, which led to better benchmark results and more revenue
  • Built the feature engineering system for ad targeting across 200 million+ individual US profiles, processing over 1 billion credit card transactions daily
  • Saved the company ~$400K by migrating the demographics data system used for machine learning to a system built on open-source libraries
  • Collaborated with cross-functional teams to integrate measurement products, driving data-driven decision-making
2016
R&D Intern
Tesla
  • Worked on vehicle diagnostics for Model S and Model X service
2015 – 2017
Master of Science
University of Washington
Big Data, Machine Learning, Data Science
2011 – 2015
Bachelor of Engineering
BMS College of Engineering
Signal Processing, Computer Networking, Electronics

Live Demos

shannongenerator.com

Synthetic data generator with scenario simulations across business models.

letsfindparking.com

AI agent for finding street parking in SF using real-time data.