Nazarii Melnychuk

Lead Data & Applied AI Engineer

Updated June 2026

  • Lead Data Engineer and Applied AI Engineer specializing in data-platform modernization, lakehouse architecture, and production agentic (LLM) systems
  • 10 years of experience across distributed systems, large-scale data pipelines, and applied AI
  • Data Lead for 6 data engineers; grew the team from 2 to 6 through hands-on hiring, onboarding, and development
  • Also lead a 9-person cross-functional delivery team, keeping technical direction aligned with business goals
  • Generalist engineer: primary language Python; reads Scala, Go, and TypeScript
  • Driven by architecture, performance, business correctness, and design elegance
  • Data-platform modernization and lakehouse architecture (Databricks, Delta Lake, medallion pipelines)
  • Large-scale batch and streaming data processing with Apache Spark
  • Production applied-AI and agentic systems: LLM services, agent frameworks, MCP tools, embeddings, and vector search
  • Scalable pipelines and services in Python on AWS, with a focus on performance and business correctness
  • Serve as Data Lead for 6 data engineers, growing the team from 2 to 6 across sourcing, interviewing, technical assessment, hiring, onboarding, and development
  • Lead a 9-person cross-functional delivery team (including 4 data engineers), coordinating technical direction, priorities, dependencies, and execution
  • Own a multi-quarter data-platform roadmap, translating business goals into architecture decisions, implementation plans, and launch criteria
  • Navigate complex organizational change and shifting priorities while preserving delivery reliability and measurable business outcomes
  • Technical interviewer; previously ran Scala bootcamp lectures and internal workshops to upskill mid-to-senior engineers
  • Cloud communications (WhatsApp, RCS, SMS)
  • Marketing/Adtech (user engagement data, ad analytics)
  • Healthcare data processing and HIPAA compliance
  • IoT monitoring for food/beverage companies
  • Energy sector digital transformation

Applied AI & automation

  • Built and launched an LLM-powered natural-language data discovery service from zero to production (FastAPI, agent frameworks, MCP, embeddings, and vector search).
  • Designed and productionized a Claude-powered autonomous incident-triage agent with permission-gated, read-only investigations and an auditable activity trail.
  • Automated data-team and sprint/planning processes as reusable agent skills, codifying runbooks into workflows with confirmation steps and invariant checks.

Communications project

  • Developed in-house services replacing costly external APIs, saving the customer $100k–$250k
  • Optimized the message routing and billing pipeline during a Kinesis-to-Kafka migration
  • Raised throughput to 830+ MPS by eliminating redundant processing
  • Implemented comprehensive cross-stream validation to ensure data consistency between source systems during this critical transition

Healthcare project

  • Transformed legacy Python batch jobs into scalable Apache Spark workflows, successfully migrating complex healthcare data processing logic
  • Contributed to system architecture decisions across multiple projects, building production-grade ETL pipelines that handle diverse healthcare data formats

Languages

Python (primary), Scala, Go, TypeScript

Data & Lakehouse

Databricks, Delta Lake, Apache Spark, Apache Kafka, Airflow

Applied AI

LLMs, Claude / Claude Code, MCP, FastMCP, Embeddings, Vector search, FastAPI

Agent SDKs

OpenAI Agents SDK, Claude Agent SDK, Inngest AgentKit

Data Warehousing & Analytics

Amazon Redshift, BigQuery, Amazon Athena, Elasticsearch, pandas

Databases

PostgreSQL, Amazon RDS, ElastiCache: Redis / Valkey

Cloud & Infrastructure

AWS (EMR, Lambda, SQS, Aurora), Docker, K8s, Datadog

Data Platform Lead, Lakehouse & AI

Jan 2025 – Present

Customer: US adtech company

Lead the data platform and applied-AI initiatives for a US adtech company, owning lakehouse modernization, production reliability, and agentic AI systems while leading data engineering delivery.

Responsibilities:

  • Spearheaded the end-to-end migration of a business-critical analytics platform from a legacy cloud warehouse and third-party analytical database to a Databricks Lakehouse, establishing a unified serving layer and removing vendor dependencies from the critical data path.
  • Architected medallion-style data pipelines using Spark and Delta Lake: incremental ingestion, snapshot-consistent merges, granular fact models, pre-aggregations, and probabilistic counting for large-scale analytical workloads.
  • Led controlled cutovers across data pipelines, backend APIs, frontend clients, and downstream services using automated source-versus-target parity testing and stable interface contracts; retired obsolete infrastructure and removed thousands of lines of legacy code.
  • Delivered substantial efficiency improvements: 4–5× faster production processing, 100×+ acceleration in validated analytical prototypes, order-of-magnitude corrections to sizing logic, and recurring cloud-cost reductions.
  • Restored correctness in complex measurement and attribution pipelines by resolving temporal-window, slowly-changing-dimension, canonicalization, deduplication, status-management, and aggregation defects.
  • Led root-cause analysis and remediation of high-impact production incidents spanning schema inference, Unicode normalization, database connection lifecycle, distributed storage configuration, cache semantics, and accidental Cartesian joins.
  • Built and launched an LLM-powered natural-language data discovery service from zero to production using FastAPI, agent frameworks, MCP tools, embeddings, and vector search over tens of thousands of domain attributes.
  • Designed and productionized a Claude-powered autonomous incident-triage agent that performs permission-gated, read-only investigations, maintains an auditable activity trail, posts structured incident summaries, and opens draft documentation pull requests.
  • Led the architecture (not implementation) of a resilient, self-learning agent-based LLM parser for adtech postlogs: scoped the stack, intermediate-result storage, off-the-shelf vs hosted OCR / bounding-box models (e.g., Datalab), clear responsibility separation from other agents in the flow, and an assessment of current SOTA approaches.
  • Hardened agentic systems for production with structured-output validation, permission boundaries, bounded concurrency, token controls, observability, secrets management, container orchestration, and Git-based episodic memory.
  • Applied Claude and Claude Code as engineering force multipliers for repository-scale analysis, migration planning, debugging, implementation, and documentation, backed by automated tests, data-parity checks, and human review.
  • Automated several data-team and sprint/planning processes as reusable agent skills, converting runbooks into codified workflows with built-in confirmation steps and invariant checks.

Messaging Project

Oct 2021 – Dec 2024

Customer: US SaaS company

Developed core components of an omnichannel messaging platform, enabling enterprise customers to reach users across WhatsApp Business, RCS, and SMS channels through unified APIs.

Responsibilities:

  • Implemented key parts of a sender registration system supporting multiple messaging channels (WhatsApp, RCS, and potentially others), handling provider-specific requirements and compliance rules.
  • Optimized message delivery latency and reliability while maintaining complex business rules across different OTT providers.
  • Resolved critical customer escalations through deep technical investigation and edge case analysis.
  • Updated billing logic on the critical path, ensuring accurate and timely customer billing.
  • Contributed to cross-team Scala, Go, and Java projects.

Marketing Project

Mar 2021 – Oct 2021

Customer: US technology company

Developed and optimized Apache Spark pipelines processing cross-service engagement data with strict privacy preservation requirements across multiple digital entertainment and subscription platforms.

Responsibilities:

  • Engineered high-performance Spark jobs processing TB-scale user engagement data.
  • Built privacy-preserving data aggregation pipelines enabling anonymous cross-service analytics.
  • Optimized data processing pipelines, reducing job completion times while adhering to data-minimization principles.
  • Documented dataset lineage and data flow for compliance and reproducibility.

Healthcare Project

May 2020 – Mar 2021

Customer: US healthcare technology company

Played a key role in modernizing a healthcare data processing platform, enabling efficient transformation of diverse medical records into analytics-ready formats for business intelligence.

Responsibilities:

  • Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data processing logic.
  • Designed and implemented production-grade ETL pipelines handling diverse healthcare data formats from multiple source systems.
  • Optimized large-scale data reprocessing jobs, reducing execution time while maintaining HIPAA compliance.
  • Developed an automated AWS S3-to-Redshift data pipeline using PySpark and boto3, enabling real-time BI reporting.
  • Provided significant technical input on architecture decisions affecting multiple project initiatives.

Adtech Project

Oct 2019 – May 2020

Customer: Israeli adtech company

Developed a real-time ad analytics platform processing high-volume impression data (670+ MPS) to enable automated bidding decisions and campaign optimization.

Responsibilities:

  • Implemented Spark Structured Streaming jobs for real-time ad performance analysis.
  • Optimized the data processing architecture, reducing operational costs while maintaining system reliability.
  • Modernized legacy Python ETL pipelines to improve maintainability and processing efficiency.

Communications Project

Feb 2018 – Oct 2019

Customer: US SaaS company

Developed high-throughput streaming applications for SMS data processing, implementing complex rate calculation logic and generating business intelligence insights.

Responsibilities:

  • Optimized messaging metadata enrichment pipeline during Kinesis-to-Kafka migration, achieving 830+ MPS through elimination of processing redundancies.
  • Implemented comprehensive cross-stream validation to ensure data consistency between source systems during the transition.
  • Built cost-effective internal services replacing external API dependencies, saving the customer $100k–$250k.
  • Led Scala knowledge-sharing initiatives, including mentoring sessions and technical workshops.

Digital Transformation Project

Jun 2017 – Dec 2017

Customer: US energy company

Contributed to an enterprise-wide data modernization initiative, migrating traditional database workloads to a cloud-based big data processing platform.

Responsibilities:

  • Replaced legacy Oracle and MySQL batch processes with scalable Apache Spark pipelines
  • Built new data processing workflows using Spark SQL and Hive, eliminating dependency on legacy SQL jobs

IoT Monitoring Platform

Nov 2015 – Jun 2017

Customers: European food and beverage companies

Developed a real-time monitoring system processing data from IoT sensors across warehouse and retail locations, enabling predictive maintenance and inventory optimization.

Responsibilities:

  • Implemented scalable data processing pipelines using Apache Spark and Akka Streams to handle real-time sensor data
  • Built diagnostic tools enabling QA and hardware teams to validate device performance and data accuracy
  • Developed an anomaly detection system for early identification of hardware issues

MSc Degree, Computer Science, Lviv Polytechnic National University, Ukraine

2016 – 2017

Natural Language Processing & AI

2023
  • Contributed Ukrainian localization and dataset improvements to the OpenAssistant/oasst1 open-source project
  • Built practical applications using transformer models and HuggingFace libraries
  • Explored large language models and their enterprise applications

Deep Learning & Time Series Analysis

2022
  • Completed fast.ai's Practical Deep Learning for Coders course
  • Implemented time-series forecasting solutions using Facebook's Prophet
  • Applied deep learning techniques to real-world data problems

I enjoy designing practical, custom-built apps with Claude and OpenAI Codex:

  • A real-time FTMS logging and workout tracker for my air bike, with heart-rate and cadence support and data export.
  • A high-performance 180° and 360° panorama and stereo-image viewer for the Meta Quest VR headset, with mipmap support and low latency.
  • Automation of personal workflows with a Nous Hermes agent, through custom-built skills and purpose-built mini-apps.

English - Advanced