intro about field notes stack projects contact
Open to AE roles · Salt Lake City / Remote

Modeling the
messy middle
of the data stack.

Analytics engineer. Six years across fintech ($1T+ AUM platform) and B2B SaaS. I build the dbt models, Looker semantic layers, and GTM data infrastructure that teams actually trust — and the next analyst can read.

~/gtm/models/marts/account_health.sql
14 tests · 4ms
-- account health tier, refreshed hourly
WITH active_accounts AS (
  SELECT
    account_id,
    arr_usd,
    days_idle,
    open_opps
  FROM {{ ref('int_accounts__activity') }}
  WHERE is_active
)
SELECT
  account_id,
  arr_usd,
  CASE
    WHEN days_idle > 60 THEN 'at_risk'
    WHEN days_idle BETWEEN 30 AND 60 THEN 'attention'
    ELSE 'healthy'
  END AS health_tier
FROM active_accounts
→ output · 4 rows0.04s
account_idarr_usdhealth_tier
acc_018b$148,000healthy
acc_022c$96,500attention
acc_041a$212,400at_risk
acc_055d$74,200healthy
accounts
account_idstringPK · not null
namestringnot null
arr_usdnumeric≥ 0
created_attimestamp

activities
activity_idstringPK
account_idstringFK → accounts
kindenumcall,email,meeting
occurred_attimestampindexed

opportunities
opp_idstringPK
account_idstringFK → accounts
stageenum7 values
amount_usdnumeric≥ 0
salesforce.accounts salesforce.activities salesforce.opportunities SOURCES stg_accounts stg_activities stg_opportunities STAGING int_accounts__activity int_pipeline_metrics INTERMEDIATE account_health MARTS
unique_account_health_account_id · account_id
0.02s
not_null_account_health_account_id · account_id
0.01s
accepted_values_health_tier · healthy, attention, at_risk
0.02s
relationships_account_id → accounts.account_id
0.03s
positive_arr_usd · arr_usd ≥ 0
0.01s
freshness_within_2h · refreshed 47m ago
0.01s
[01]about

I build the layer where business questions meet warehouse reality.

I'm Xander. I model data — staging, intermediate, marts — and the seam where stakeholders meet the warehouse. I worked at Addepar for six years (a $1T+ AUM wealth platform), growing from operations analyst into the team that owned LookML, Snowflake, and the daily delivery of trusted data for the firm's product, R&D, and ops orgs.

Then a stint at Dorsia leading their analytics platform re-architecture from GCP to AWS — new dbt project, new semantic layer, 90% faster analytics SLA. Currently at Pave, building GTM data infrastructure for the compensation management platform that powers comp planning across 10,000+ companies.

The work I keep coming back to: turning Salesforce, product event, and warehouse data into models that don't break, that pass their tests, and that the next analyst can read at 2 AM with a dashboard on fire.

Analytics engineering isn't writing more SQL. It's building the layer where one good model retires fifty bad reports.
career_metricslive
0+ yrs
In analytics roles
2 promotions at Addepar
$1T+
AUM data delivered
daily reconciliation
0+
Accounts at Pave
$1.3M+ ARR owned
0%
SLA reduction at Dorsia
via self-serve marts
tenure / role progression2018 → 2026
'18'20'22'24'26 Ops Analyst · Addepar Senior Data Analyst · Addepar Dorsia Pave →
[02]field notes

The artifacts of the work.

A few of the things I write, deploy, and reach for on a normal day. Not screenshots — actual specimens. The kind of files that live in /models, /views, and the dbt docs site.

models/marts/_marts.yml dbt contract
version: 2
models:
  - name: fct_account_revenue
    description: "Daily ARR rollup by account"
    config:
      materialized: incremental
      unique_key: account_date_id
      on_schema_change: append_new_columns
    contract: {enforced: true}
    columns:
      - name: account_id
        data_type: string
        tests: [unique, not_null]
      - name: arr_usd
        data_type: numeric(12,2)
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
views/account_health.view.lkml LookML
view: account_health {
  sql_table_name: marts.fct_account_health ;;

  dimension: account_id {
    primary_key: yes
    sql: ${TABLE}.account_id ;;
  }

  dimension: health_tier {
    type: string
    sql: ${TABLE}.health_tier ;;
  }

  measure: total_arr {
    type: sum
    sql: ${TABLE}.arr_usd ;;
    value_format_name: usd_0
  }

  measure: at_risk_arr {
    type: sum
    filters: [health_tier: "at_risk"]
    sql: ${TABLE}.arr_usd ;;
  }
}
models/semantic/gtm_metrics.yml semantic layer
metrics:
  - name: stalled_opp_count
    label: "Stalled Opportunities"
    description: "Opps idle 30d+"
    type: simple
    type_params:
      measure: opportunity_count
    filter: |
      {{ Dimension(
        'opportunity__last_activity_days'
      ) }} > 30
      AND {{ Dimension(
        'opportunity__stage'
      ) }} NOT IN (
        'closed_won',
        'closed_lost'
      )
_freshness · live monitor observability
sourcelast loadedstatus
salesforce.accounts12m agofresh
salesforce.opportunities8m agofresh
salesforce.activities14m agofresh
stripe.invoices47m agowarn
gong.calls2h 12mwarn
hubspot.contacts4h 03mstale
dbt run · last 24h performance
stg_accounts 0.3s stg_opportunities 0.4s stg_activities 0.5s int_accounts__activity 2.1s int_pipeline_metrics 1.4s fct_account_revenue 3.8s fct_account_health 0.9s fct_pipeline_velocity 1.2s 0s1s 2s3s4s fct_account_revenue ↑ 18% slower vs 7d avg
exposures · downstream governance
fct_account_ revenue Sales Pipeline Looker dashboard GTM Slack Bot internal · py Salesforce Sync reverse ETL 3 14 tests passing 3 exposures ∼47m freshness
marts/fct_account_revenue · dbt docs model documentation
fct_account_revenue
analytics.marts.fct_account_revenue
incremental 14 tests · passing
description
Daily account revenue rollup. Backfills from 2018-01-01 forward. Refreshes hourly via incremental run keyed on account_date_id.
materialization
materializedincremental
unique_keyaccount_date_id
schemaanalytics_marts
on_schema_changeappend_new
columns · 5
nametypetests
account_date_idstringunique, not_null
account_idstringnot_null, → accounts
date_daydatenot_null
arr_usdnumericnot_null, ≥ 0
_loaded_attimestampnot_null
lineage
3 upstream fct_account_revenue 2 downstream
Looker · Explore: Account Health BI workspace
VIEWS
account_health
dimensions
· industry
· health_tier
· account_id
· created_at
measures
Σ total_arr
# at_risk_count
# account_count
selected fields · 4
industry dimension
health_tier dimension
Σtotal_arr sum · usd_0
#at_risk_count count
filters
health_tier is at_risk
result · 3 rows
industrytotal_arrat_risk
SaaS$4.2M12
Fintech$2.8M7
Healthcare$1.4M4
column-level lineage · fct_pipeline_velocity advanced lineage
SOURCE STAGING MART salesforce.opportunity amount decimal stage string stage_entered_at timestamp close_date date salesforce.account industry string employee_count integer stg_opportunities amount_usd numeric stage_clean enum stage_entered_at timestamp close_date date stg_accounts industry string size_bucket enum fct_pipeline_velocity amount_usd numeric stage enum days_in_stage derived stage_velocity derived close_date date industry_segment derived + DATEDIFF + velocity calc + segment join 1:1 mapping derived / transformed staging → mart
PR #142 · feat: add stage_velocity derived field code review
#142 feat: add stage_velocity derived field
merged · 2h ago
2 files changed +28 lines −4 lines · 14 checks passed
models/marts/fct_pipeline_velocity.sql +14 −2
1212WITH ranked_opps AS (
1313 SELECT
1414 opp_id,
1515 account_id,
1616 stage,
17 stage_entered_at,
17 stage_entered_at,
18 DATEDIFF('day', stage_entered_at,
19 CURRENT_DATE) AS days_in_stage,
20 amount_usd / NULLIF(days_in_stage, 0)
21 AS stage_velocity,
1822 amount_usd
1923 FROM {{ ref('stg_opportunities') }}
models/marts/_marts.yml +14 −2
2828 columns:
2929 - name: stage
30 - name: days_in_stage
31 tests:
32 - not_null
33 - dbt_utils.accepted_range:
34 min_value: 0
35 - name: stage_velocity
36 tests: [not_null]
Hex · GTM Health Monitor.hex notebook · live app
CELLS · 5
1 in industry_filter
2 sql query_health
3 tbl df_health
4 chart arr_by_industry
5 md narrative
all cells synced · 0.04s
[1] INPUT industry_filter ·
Industry SaaS Min ARR $50k
[2] SQL df_health → snowflake.analytics ran 1.2s
SELECT
  industry,
  COUNT(DISTINCT account_id) AS account_count,
  SUM(arr_usd) AS total_arr,
  SUM(CASE WHEN health_tier = 'at_risk'
       THEN arr_usd ELSE 0 END) AS at_risk_arr
FROM {{ ref('fct_account_health') }}
WHERE industry = {{ industry_filter }}
  AND arr_usd >= {{ min_arr }}
GROUP BY 1
ORDER BY total_arr DESC
[3] TABLE df_health 4 rows · 1 col selected
industryaccountstotal_arrat_risk_arr
SaaS24$4,212,000$580,400
Fintech17$2,840,500$310,200
Healthcare11$1,440,000$180,000
Retail9$890,300$96,500
[4] CHART arr_by_industry · bar ·
SaaS Fintech Healthcare Retail $4.2M $2.84M $1.44M $890k $0 $1M $2M $3M $4M total_arr at_risk
[03]stack

The pipeline I reach for.

From source extracts to a metric on someone's dashboard, this is the chain I work in most days. Tools listed are ones I've shipped work with — not collected.

→ 01
Ingest
FivetranREST APIsSFTPWebhooksPython
→ 02
Warehouse
SnowflakeBigQueryRedshift
→ 03
Model
dbtSQLDimensional modelingSCD Type 2dbt tests
→ 04
Serve
Looker / LookMLLightdashHexMode

Applied ML

Python·scikit-learn·lifetimes·LTV·lead scoring

AI-augmented analytics

OpenAI·Anthropic·RAG·pgvector·LLM-SQL

GTM & CRM

Salesforce·Gong·Outreach·Hightouch
[04]selected projects

Building in public.

Three projects shipping over the next several weeks. Each is a public dbt repo, a write-up, and an artifact you can actually poke at — designed to demonstrate analytics engineering at the messy middle: schema design, dbt model architecture, applied ML for business outcomes, and LLM-augmented analytics workflows.

project_01

GTM Semantic Layer

In progress

A complete dbt project modeling a synthetic CRM dataset — accounts, opportunities, contacts, activities — into a sales-funnel mart with account-health scoring. Staging → intermediate → marts architecture, 60+ models, custom generic tests for referential integrity and business rules, dbt docs deployed to GitHub Pages. The semantic layer renders out to Looker explores.

dbtBigQueryLookerdimensional modelingGitHub Actions
dbt lineage60 models
SRC STG INT MART
fct_pipeline_velocity12 cols
opp_idstringPK
account_idstringFK
stageenum
amount_usdnumeric
days_in_stageint
stage_velocitynumeric
is_stalledboolderived
close_datedate
created_attimestamp
updated_attimestamp
dbt_valid_fromtimestampSCD2
dbt_valid_totimestampSCD2
dbt test results87 pass · 0 fail
unique_opp_id · fct_pipeline_velocity
0.04s
not_null_account_id · all marts
0.02s
relationships_opp__account → dim_accounts
0.05s
accepted_values_stage · 7 enum values
0.02s
no_orphans__activities · custom generic
0.06s
freshness__salesforce · < 2h
0.01s
project_02

Marketing Attribution & LTV

Planning

A predictive-modeling layer on top of public e-commerce data (Brazilian Olist). dbt customer-journey models feed a Python BG/NBD + Gamma-Gamma LTV model and a multi-touch attribution analysis. End-to-end: warehouse modeling, feature engineering, lifetimes / scikit-learn modeling, surfaced through a small interactive app.

dbtPythonscikit-learnlifetimesattribution
predicted LTVBG/NBD
$0 $40 $80 $120 M1 M3 M6 M9 M12 observed predicted
cohort retention12 mo
Jan Feb Mar Apr May Jun Jul M0M1 M2M3 M4M5 M6 retention → low → high
attribution mixmulti-touch
Paid Search Direct Organic Email Referral Social $48k $39k $36k $24k $18k $12k first mid last
project_03

LLM-Augmented Analytics Workflow

Planning

An AI tool that knows your semantic layer. RAG over warehouse metadata, dbt exposures, and column-level lineage to answer natural-language questions with semantic-layer-grounded SQL — and flag when the question can't be answered cleanly by the existing model. The version of "talk to your data" that doesn't hallucinate metrics.

PythonOpenAIpgvectorRAGdbt metadata
conversationRAG
Q3 ARR for top-10 accounts by industry?
Pulling marts.fct_account_revenue joined to dim_accounts. Grouped by industry, ranked by Q3 ARR, top 10. ↳ exposure: ae_account_revenue · 14 tests passing
Only ones with a stalled opp.
Filtering opportunities.last_activity_at < now() − 30d and stage NOT IN (closed_won, closed_lost). ↳ semantic: stalled_opp_count (dbt metric)
retrieval · top-kcos sim
marts.fct_account_revenue mart0.94
marts.dim_accounts dim0.91
metrics.stalled_opp_count metric0.88
marts.fct_opportunities mart0.84
exposures.gtm_dashboard exposure0.79
int_accounts__activity int0.71
generated SQLcompiled
grounded from
marts.fct_account_revenue · dim_accounts
generated query
SELECT
  d.industry,
  SUM(r.arr_usd) AS q3_arr
FROM {{ ref('fct_account_revenue') }} r
JOIN {{ ref('dim_accounts') }} d
  USING (account_id)
WHERE r.fiscal_quarter = '2026Q3'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
[05]contact

Open to analytics engineering work.

Hiring AE — especially GTM, marketing, or product analytics — or want to argue about dbt patterns and semantic-layer design? The door is open. Quickest reply via email; the others all forward to my phone.