version: 2 models: - name: fct_account_revenue description: "Daily ARR rollup by account" config: materialized: incremental unique_key: account_date_id on_schema_change: append_new_columns contract: {enforced: true} columns: - name: account_id data_type: string tests: [unique, not_null] - name: arr_usd data_type: numeric(12,2) tests: - not_null - dbt_utils.accepted_range: min_value: 0
Modeling the
messy middle
of the data stack.
Analytics engineer. Six years across fintech ($1T+ AUM platform) and B2B SaaS. I build the dbt models, Looker semantic layers, and GTM data infrastructure that teams actually trust — and the next analyst can read.
-- account health tier, refreshed hourly WITH active_accounts AS ( SELECT account_id, arr_usd, days_idle, open_opps FROM {{ ref('int_accounts__activity') }} WHERE is_active ) SELECT account_id, arr_usd, CASE WHEN days_idle > 60 THEN 'at_risk' WHEN days_idle BETWEEN 30 AND 60 THEN 'attention' ELSE 'healthy' END AS health_tier FROM active_accounts
| account_id | arr_usd | health_tier |
|---|---|---|
| acc_018b | $148,000 | healthy |
| acc_022c | $96,500 | attention |
| acc_041a | $212,400 | at_risk |
| acc_055d | $74,200 | healthy |
I build the layer where business questions meet warehouse reality.
I'm Xander. I model data — staging, intermediate, marts — and the seam where stakeholders meet the warehouse. I worked at Addepar for six years (a $1T+ AUM wealth platform), growing from operations analyst into the team that owned LookML, Snowflake, and the daily delivery of trusted data for the firm's product, R&D, and ops orgs.
Then a stint at Dorsia leading their analytics platform re-architecture from GCP to AWS — new dbt project, new semantic layer, 90% faster analytics SLA. Currently at Pave, building GTM data infrastructure for the compensation management platform that powers comp planning across 10,000+ companies.
The work I keep coming back to: turning Salesforce, product event, and warehouse data into models that don't break, that pass their tests, and that the next analyst can read at 2 AM with a dashboard on fire.
The artifacts of the work.
A few of the things I write, deploy, and reach for on a normal day. Not screenshots — actual specimens. The kind of files that live in /models, /views, and the dbt docs site.
view: account_health { sql_table_name: marts.fct_account_health ;; dimension: account_id { primary_key: yes sql: ${TABLE}.account_id ;; } dimension: health_tier { type: string sql: ${TABLE}.health_tier ;; } measure: total_arr { type: sum sql: ${TABLE}.arr_usd ;; value_format_name: usd_0 } measure: at_risk_arr { type: sum filters: [health_tier: "at_risk"] sql: ${TABLE}.arr_usd ;; } }
metrics: - name: stalled_opp_count label: "Stalled Opportunities" description: "Opps idle 30d+" type: simple type_params: measure: opportunity_count filter: | {{ Dimension( 'opportunity__last_activity_days' ) }} > 30 AND {{ Dimension( 'opportunity__stage' ) }} NOT IN ( 'closed_won', 'closed_lost' )
2018-01-01 forward. Refreshes hourly via incremental run keyed on account_date_id.| name | type | tests |
|---|---|---|
| account_date_id | string | unique, not_null |
| account_id | string | not_null, → accounts |
| date_day | date | not_null |
| arr_usd | numeric | not_null, ≥ 0 |
| _loaded_at | timestamp | not_null |
| industry | total_arr | at_risk |
|---|---|---|
| SaaS | $4.2M | 12 |
| Fintech | $2.8M | 7 |
| Healthcare | $1.4M | 4 |
SELECT industry, COUNT(DISTINCT account_id) AS account_count, SUM(arr_usd) AS total_arr, SUM(CASE WHEN health_tier = 'at_risk' THEN arr_usd ELSE 0 END) AS at_risk_arr FROM {{ ref('fct_account_health') }} WHERE industry = {{ industry_filter }} AND arr_usd >= {{ min_arr }} GROUP BY 1 ORDER BY total_arr DESC
| industry | accounts | total_arr | at_risk_arr |
|---|---|---|---|
| SaaS | 24 | $4,212,000 | $580,400 |
| Fintech | 17 | $2,840,500 | $310,200 |
| Healthcare | 11 | $1,440,000 | $180,000 |
| Retail | 9 | $890,300 | $96,500 |
The pipeline I reach for.
From source extracts to a metric on someone's dashboard, this is the chain I work in most days. Tools listed are ones I've shipped work with — not collected.
Applied ML
AI-augmented analytics
GTM & CRM
Building in public.
Three projects shipping over the next several weeks. Each is a public dbt repo, a write-up, and an artifact you can actually poke at — designed to demonstrate analytics engineering at the messy middle: schema design, dbt model architecture, applied ML for business outcomes, and LLM-augmented analytics workflows.
GTM Semantic Layer
A complete dbt project modeling a synthetic CRM dataset — accounts, opportunities, contacts, activities — into a sales-funnel mart with account-health scoring. Staging → intermediate → marts architecture, 60+ models, custom generic tests for referential integrity and business rules, dbt docs deployed to GitHub Pages. The semantic layer renders out to Looker explores.
Marketing Attribution & LTV
A predictive-modeling layer on top of public e-commerce data (Brazilian Olist). dbt customer-journey models feed a Python BG/NBD + Gamma-Gamma LTV model and a multi-touch attribution analysis. End-to-end: warehouse modeling, feature engineering, lifetimes / scikit-learn modeling, surfaced through a small interactive app.
LLM-Augmented Analytics Workflow
An AI tool that knows your semantic layer. RAG over warehouse metadata, dbt exposures, and column-level lineage to answer natural-language questions with semantic-layer-grounded SQL — and flag when the question can't be answered cleanly by the existing model. The version of "talk to your data" that doesn't hallucinate metrics.
SELECT d.industry, SUM(r.arr_usd) AS q3_arr FROM {{ ref('fct_account_revenue') }} r JOIN {{ ref('dim_accounts') }} d USING (account_id) WHERE r.fiscal_quarter = '2026Q3' GROUP BY 1 ORDER BY 2 DESC LIMIT 10
Open to analytics engineering work.
Hiring AE — especially GTM, marketing, or product analytics — or want to argue about dbt patterns and semantic-layer design? The door is open. Quickest reply via email; the others all forward to my phone.