Open to new opportunities

Vinh X. Nguyen

AI Data Engineering Lead
📧 nxv.can@gmail.com 📞 (+1) 548.994.4264 📍 Waterloo, Ontario, Canada

I build big data platforms, from infrastructure to semantic layers, and have spent the last few years exploring how LLM-powered multi-agent systems can make analytics faster and more honest. I work mostly with AWS, Snowflake, Databricks, and Python.

~9 yrs · 7 sectors · 200B+ events · 99.97% reconciliation accuracy
🏦 Banking 💰 Insurance 🚗 Automotive 📊 Analytics / KPMG 🔐 PKI · Security 🛍 Retail / PoS 🎓 Education
$1.5B
transactions audited by 6-agent LangGraph (MADA)
200B+
events / 30TB text processed on AWS · Snowflake · Spark
1M+
mobile-banking users protected by Cert Pinning
99.97%
financial reconciliation accuracy (from 60%)

About Me

Modeling: OLTP, OLAP, Star schema, Schema-on-read, Delta Lake, SCD, Data Vault 1.0 & 2.0, Medallion.
Engineering: Led big-data batch, streaming, and ETL pipelines processing 200B+ events and 30TB of text on AWS, Snowflake, Spark, and Databricks.
Agentic LLM: Accelerated time-to-insight with LLM-powered multi-agent systems built on MCP, LangGraph, LangChain, and RAG, analyzing $1.5B in financial transactions.
Architecture: Architect Community Lead at TymeBank (GOTyme – 1M users); defined integration & data patterns across 5 engineering teams.
Cloud-First: Full-stack AWS (security, network, compute, messaging, analytics); Databricks; Snowflake.
Financial Impact: Detected multi-million-dollar revenue leakage. Improved financial reconciliation accuracy to 99.97%.
Mentorship: University lecturer & engineering mentor — ML, algorithms, blockchain, performance & cost optimization.
Sectors: Banking (GOTyme, UBS), Insurance (Manulife, Prudential), Automotive (Cox), Analytics (KPMG, Ryte).

Skills & Expertise

🤖 AI Agentic LLM

  • Multi-agent workflows (LangGraph, LangChain)
  • Model Context Protocol (MCP)
  • GPT-4, Claude, Qwen; RAG pipelines
  • Vector Databases, Dynamic Tool Calls
  • Financial Anomaly Detection Agents

🗄️ Data Modeling

  • OLTP / OLAP at 200B+ records
  • Star Schema, SCD Types
  • Data Vault 2.0
  • Medallion Architecture (500M+ events)
  • Schema-on-read, Delta Lake

⚙️ Data Engineering

  • Spark (EMR/Glue), Databricks, Delta Lake
  • Kafka, Kinesis, SQS/SNS, DynamoDB
  • Batch, micro-batch & streaming pipelines
  • S3 + Glue Catalog, Snowflake, MySQL
  • AWS Lambda, SageMaker, Jupyter

☁️ Cloud & Infrastructure

  • AWS: ECS, Kinesis, VPC, WAF, Route 53
  • API Gateway, CloudFront, ELB, NAT
  • Snowflake: Snowpipe, Materialized Views
  • Databricks: Delta, Cost-optimization
  • IaC: Terraform, CloudFormation

📊 Analytics & Visualization

  • Tableau, Power BI, QuickSight
  • Grafana, Datadog, CloudWatch, ELK
  • NLP: TF-IDF, Lemmatization, POS tagging
  • 30TB text processing, 30K req/day
  • Google Analytics (2M/mo)

🔐 Security & Systems

  • PKI, Certificate Pinning (1M+ users)
  • Blockchain, Cryptography
  • Event-driven, Backpressure, Async
  • QoS, MPLS, Load Balancing
  • WAF, VPC Peering, IAM

Projects at a Glance

Twelve flagship projects across five companies — dive into a focused write-up per project under Recent Work or read the long-form story in Posts.

mindmap
  root((12 Flagship Projects))
    FPT Canada
      P1 Click-stream Web Analytics
        200B+ Omni-Channel Ingestion
        10M-Entity MDS
        Self-Serve Data Platform
        Omni-Channel Dashboard Rebuild
      P2 Revenue-Leakage Reporting
        Receivables Payables Reconciliation
        Leakage Detection Alerting
        Executive Reporting Audit Trail
      P3 MADA Multi-Agent Analytics
        Self-Serve LLM Reporting
        Always-On Data Quality
        Self-Learning Rules Memory
        Harness Engineering
    Tyme GOTyme Bank
      P1 Banking Business Products
        Personal Lending
        VAS
        GoalSave Kicker
        ID Payment
      P2 Infrastructure
        Cert Pinning 1M users
        Kiosk Network Optimization
        AWS Infra POC
      P3 Analytics
        500M Tx Reporting Pipeline
        Multi-Source Analytics Tooling
    NFQ Asia
      P1 SEO Keyword Analytics
        NLP at 30TB
        Serverless 50K req day
        Datadog Cost Audit
      P2 PoS Serverless API
        PoS Device API Layer
        Event-Driven Pipeline
        Auto-Scaling Cost Control
    NashTech Global
      P1 Banking Insurance KYC
      P2 Process and People
    FPT Software
      P1 DirecTV Multi-Vendor Delivery
      P2 Cebu Dev Center and Training
      

Recent Work

Selected projects from the last few years. Each card opens a deep-link write-up — share the URL and it lands directly on the case study.

Featured case study
FPT Canada · 2024–Now

🤖 MADA · Multi-Agent Data Analytics

6-agent LangGraph pipeline (Orchestrator, Ingest Auditor, Anomaly Detector, Evidence Retriever, Evaluator) auditing $1.5B in financial transactions. Days → minutes time-to-insight, with citations, retry loops, and a self-learning rule memory.

LangGraph · RAG · MCP · AWS · Snowflake
Read the case study →
FPT Canada · 2022–Now

📊 200B+ Event ETL Platform

Bronze→Silver→Gold medallion on AWS + Snowflake + Spark, 99.97% enrichment quality, query latency 30s → 8s, used by 7+ DS / Analytics / Finance teams.

AWS · Snowflake · Spark · Delta Lake
Write-up coming soon
FPT Canada · 2023

💰 Revenue-Leakage Auto-Reporting

Lambda + Snowflake + DynamoDB pipeline with SES routing of reports to owners. Reporting moved from end-of-month to next-day, and detection rose from 30% to 99.99%. The work led to a promotion to Data Integrity Lead.

Lambda · Snowflake · SES · Tableau
Write-up coming soon
GOTyme Bank · 2019–2021

🔐 Cert Pinning · 1M+ Users

PKI + mobile SDK + zero-downtime rotation pipeline protecting 1M+ mobile-banking users against MitM. Standardized integration patterns across 5 dev teams.

PKI · Mobile SDK · AWS EMR · Kinesis
Write-up coming soon
NFQ Asia · 2017–2019

🔍 SEO Keyword NLP at 30TB

TF-IDF + lemmatization + POS tagging pipeline; serverless API serving 20–50K req/day; latency 60s → 3s; resolved a $10K/mo Datadog cost issue.

NLP · AWS Lambda · Datadog · Serverless
Write-up coming soon
vinhnx.ca · 2026

🧪 This Site

Serverless résumé + portfolio: CloudFront + S3 + Lambda + DynamoDB, Terraform-managed. Includes a RAG chat widget, admin analytics, and a posts CMS.

Terraform · Lambda · RAG · CloudFront
Write-up coming soon

Posts

Long-form notes on what I'm building, learning, and shipping. Filter by category below.


Architecture & Journey

Career Timeline

From training engineers in Vietnam & the Philippines to leading AI data engineering in Canada.

timeline
    title 17+ Years in Engineering Leadership
    2009 : University Lecturer (DNTU / NLU)
    2014 : FPT Software - Training Manager / Solution Architect (PH and VN)
    2016 : NashTech Global - Principal Engineer / Architect
    2017 : NFQ Asia - Technical Architect (NLP at 30TB)
    2019 : Tyme / GOTyme Bank - Architect Community Lead (1M+ users)
    2021 : FPT Canada - Data Engineering Lead (200B+ events, LangGraph agents)
      

Skills Mind-Map

Six pillars across AI, data, cloud, and security.

mindmap
  root((Vinh X. Nguyen))
    AI Agentic LLM
      LangGraph / LangChain
      MCP
      RAG and Vector DBs
      Anomaly Detection
    Data Modeling
      OLTP / OLAP 200B+
      Star / SCD
      Data Vault 2.0
      Medallion
    Data Engineering
      Spark / Databricks
      Kafka / Kinesis
      Snowflake / Delta Lake
      Lambda / SageMaker
    Cloud
      AWS Full-Stack
      Snowflake
      Databricks
      Terraform / CFN
    Analytics
      Tableau / Power BI
      Datadog / Grafana
      NLP TF-IDF
    Security
      PKI / Cert Pinning
      Cryptography
      WAF / IAM
      Event-driven
      

Signature Architecture #1 — LangGraph Multi-Agent Anomaly Detection

FPT Canada — LLM agents auditing $1.5B in financial transactions.

flowchart LR
    Tx[(Transactions DW)] --> Orchestrator{LangGraph Orchestrator}
    Orchestrator --> Rules[Rules Agent]
    Orchestrator --> Stats[Statistical Agent]
    Orchestrator --> LLM[LLM Reasoning Agent]
    LLM --> RAG[(Policy RAG / Vector DB)]
    LLM --> MCP[MCP Tools to DBs]
    Rules --> Reviewer[Reviewer Agent]
    Stats --> Reviewer
    LLM --> Reviewer
    Reviewer --> Human[Human-in-the-Loop]
    Reviewer --> Cases[(Case Store)]
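
The fan-out and review topology in the diagram can be illustrated with a minimal plain-Python sketch. It does not use LangGraph, omits the LLM, RAG, and MCP branches, and is not the production code. The names `rules_agent`, `statistical_agent`, `reviewer`, and `orchestrate`, the amount limit, the z-score threshold, and the routing rule (agreement between agents goes to the case store, a single flag goes to a human) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Finding:
    agent: str   # which agent raised the flag
    tx_id: str   # transaction identifier
    reason: str  # human-readable evidence


def rules_agent(txs, limit=10_000.0):
    """Flag transactions above a fixed, hypothetical amount limit."""
    return [
        Finding("rules", t["id"], f"amount {t['amount']} exceeds {limit}")
        for t in txs
        if t["amount"] > limit
    ]


def statistical_agent(txs, z_threshold=2.0):
    """Flag transactions whose amount is a z-score outlier in the batch."""
    amounts = [t["amount"] for t in txs]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []
    return [
        Finding("stats", t["id"], f"z-score {(t['amount'] - mu) / sigma:.2f}")
        for t in txs
        if abs(t["amount"] - mu) / sigma > z_threshold
    ]


def reviewer(findings, min_agents=2):
    """Assumed routing: agreement -> case store, single flag -> human review."""
    agents_by_tx = {}
    for f in findings:
        agents_by_tx.setdefault(f.tx_id, set()).add(f.agent)
    cases = sorted(tx for tx, a in agents_by_tx.items() if len(a) >= min_agents)
    human = sorted(tx for tx, a in agents_by_tx.items() if len(a) < min_agents)
    return {"cases": cases, "human_review": human}


def orchestrate(txs):
    """Fan out to each agent, then merge the findings in the reviewer."""
    findings = rules_agent(txs) + statistical_agent(txs)
    return reviewer(findings)
```

Keeping each agent a pure function over the same batch makes the branches independent, which is the property that lets an orchestrator run them in parallel and retry one branch without re-running the others.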
      

Signature Architecture #2 — 200B+ Event ETL Platform

FPT Canada — 99.97% enrichment quality across AWS, Snowflake, Databricks.

flowchart LR
    Sources[(Sources)] --> Stream[Kinesis / Kafka]
    Stream --> Bronze[S3 Bronze]
    Bronze --> Spark[Spark on EMR / Databricks]
    Spark --> Silver[S3 Silver / Delta]
    Silver --> Gold[S3 Gold / Delta]
    Gold --> SF[(Snowflake DW)]
    Gold --> Glue[Glue Catalog]
    SF --> BI[Tableau / Power BI / QuickSight]
    Spark --> QA[Quality and Reconciliation]
    QA --> Alerts[CloudWatch / Datadog]
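
The Bronze, Silver, and Gold stages plus the reconciliation check can be sketched in a few lines of plain Python. This is a toy, in-memory illustration under stated assumptions, not the Spark or Snowflake implementation: the field names `event_id`, `channel`, and `amount`, and the functions `to_silver`, `to_gold`, and `reconcile`, are hypothetical.

```python
from collections import defaultdict


def to_silver(bronze):
    """Drop malformed rows and de-duplicate on event_id (first row wins)."""
    seen, silver = set(), []
    for row in bronze:
        if row.get("event_id") is None or row.get("channel") is None:
            continue  # missing key fields
        if not isinstance(row.get("amount"), (int, float)):
            continue  # non-numeric amount
        if row["event_id"] in seen:
            continue  # duplicate delivery
        seen.add(row["event_id"])
        silver.append(row)
    return silver


def to_gold(silver):
    """Aggregate cleaned events into per-channel totals."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["channel"]] += row["amount"]
    return dict(totals)


def reconcile(silver, gold):
    """Return the share of the Silver amount that is accounted for in Gold."""
    expected = sum(row["amount"] for row in silver)
    actual = sum(gold.values())
    return 1.0 if expected == 0 else actual / expected
```

A ratio below an agreed threshold would be the point at which the quality step raises an alert, so the check is cheap to run after every load.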
      

Signature Architecture #3 — Certificate Pinning for 1M+ Users

GOTyme Digital Bank — mitigating MitM at mobile-banking scale.

flowchart LR
    CA[Internal CA / PKI] --> Cert[Server Cert + Backup Pin]
    Cert --> SDK[Mobile SDK]
    SDK --> App[Banking App 1M+ users]
    App --> API[(Banking APIs)]
    Rotation[Rotation Pipeline] --> Cert
    App --> Telem[Telemetry / Crash Analytics]
    Telem --> Flags[Feature Flags / Staged Rollout]
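
The role of the backup pin during rotation can be shown with a small standard-library sketch. It is an illustration only, not the SDK's code: it hashes the whole DER-encoded certificate for simplicity, whereas many deployments pin the public key (SPKI) hash instead. The names `pin_for`, `is_trusted`, and the placeholder certificate bytes are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Iterable


def pin_for(der_cert: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of a DER-encoded certificate."""
    return base64.b64encode(hashlib.sha256(der_cert).digest()).decode("ascii")


def is_trusted(der_cert: bytes, pins: Iterable[str]) -> bool:
    """Accept the certificate only if its digest matches one of the shipped pins."""
    presented = pin_for(der_cert)
    return any(hmac.compare_digest(presented, pin) for pin in pins)


# Placeholder bytes standing in for real DER certificates (assumption).
CURRENT_CERT = b"current-server-cert"
BACKUP_CERT = b"backup-server-cert"

# The app ships with both pins, so the server can switch to the backup
# certificate without forcing an app update first.
PINS = frozenset({pin_for(CURRENT_CERT), pin_for(BACKUP_CERT)})
```

Because both pins are in the app before rotation happens, swapping the server certificate to the backup does not lock out installed clients, while a certificate issued by any other party still fails the check.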
      

Signature Architecture #4 — PoS Serverless API Integration

NFQ Asia — in-store PoS hardware integrated with auto-scaling AWS backend.

flowchart LR
    PoS[PoS Devices] --> APIGW[API Gateway]
    APIGW --> Lambda[AWS Lambda Edge API]
    Lambda --> SQS[SQS Queue]
    SQS --> Worker[Lambda Worker]
    Worker --> DDB[(DynamoDB)]
    Worker --> EB[EventBridge]
    EB --> DWH[(Warehouse / Analytics)]
    Lambda --> CW[CloudWatch / Alarms]
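
The edge-API, queue, and worker path in the diagram can be sketched with in-memory stand-ins. This is a hedged illustration, not the deployed handlers: a `deque` stands in for SQS, a `dict` for DynamoDB, and the payload fields `sale_id` and `amount` and the function names `edge_handler` and `worker` are assumptions. The worker is written to be idempotent because SQS standard queues deliver messages at least once.

```python
import json
from collections import deque


def edge_handler(event, queue):
    """Validate a PoS request and enqueue it; respond before any processing."""
    try:
        body = json.loads(event["body"])
        sale_id, amount = body["sale_id"], float(body["amount"])
    except (KeyError, ValueError, TypeError):
        return {"statusCode": 400, "body": "invalid payload"}
    queue.append(json.dumps({"sale_id": sale_id, "amount": amount}))
    return {"statusCode": 202, "body": "queued"}


def worker(queue, table):
    """Drain the queue and write each sale once, skipping duplicate deliveries."""
    written = 0
    while queue:
        msg = json.loads(queue.popleft())
        if msg["sale_id"] in table:
            continue  # already stored; a repeated delivery is a no-op
        table[msg["sale_id"]] = msg
        written += 1
    return written
```

Returning 202 from the edge handler decouples device latency from backend load: the queue absorbs bursts from stores while the worker scales independently.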
      

Education & Certifications

Master's in Computer Science
Université Pierre et Marie Curie (Paris 6) — France
2011–2013
AWS Certified Data Analytics – Specialty
AWS — Canada
2023
Statistics for Data Analysis
McMaster University — Ontario, Canada
2025
BSc in Computer Science
Nong Lam University — Vietnam
2025
Advanced Training for Banking Architect Leads
AWS Training Center — Vietnam
2020