Open to AE roles · Salt Lake City / Remote

Modeling the
messy middle
of the data stack.

Analytics engineer. Six years across fintech ($1T+ AUM platform) and B2B SaaS. I build the dbt models, Looker semantic layers, and GTM data infrastructure that teams actually trust — and the next analyst can read.

See what I'm building → Download résumé

~/gtm/models/marts/account_health.sql

14 tests · 4ms

-- account health tier, refreshed hourly
WITH active_accounts AS (
  SELECT
    account_id,
    arr_usd,
    days_idle,
    open_opps
  FROM {{ ref('int_accounts__activity') }}
  WHERE is_active
)
SELECT
  account_id,
  arr_usd,
  CASE
    WHEN days_idle > 60 THEN 'at_risk'
    WHEN days_idle BETWEEN 30 AND 60 THEN 'attention'
    ELSE 'healthy'
  END AS health_tier
FROM active_accounts

→ output · 4 rows0.04s

account_id	arr_usd	health_tier
acc_018b	$148,000	healthy
acc_022c	$96,500	attention
acc_041a	$212,400	at_risk
acc_055d	$74,200	healthy

accounts

account_idstringPK · not null

namestringnot null

arr_usdnumeric≥ 0

created_attimestamp

activities

activity_idstringPK

account_idstringFK → accounts

kindenumcall,email,meeting

occurred_attimestampindexed

opportunities

opp_idstringPK

account_idstringFK → accounts

stageenum7 values

amount_usdnumeric≥ 0

unique_account_health_account_id · account_id

0.02s

not_null_account_health_account_id · account_id

0.01s

accepted_values_health_tier · healthy, attention, at_risk

0.02s

relationships_account_id → accounts.account_id

0.03s

positive_arr_usd · arr_usd ≥ 0

0.01s

freshness_within_2h · refreshed 47m ago

0.01s

last compile · 3.2s refreshed 47s ago

[01]about

I build the layer where business questions meet warehouse reality.

I'm Xander. I model data — staging, intermediate, marts — and the seam where stakeholders meet the warehouse. I worked at Addepar for six years (a $1T+ AUM wealth platform), growing from operations analyst into the team that owned LookML, Snowflake, and the daily delivery of trusted data for the firm's product, R&D, and ops orgs.

Then a stint at Dorsia leading their analytics platform re-architecture from GCP to AWS — new dbt project, new semantic layer, 90% faster analytics SLA. Currently at Pave, building GTM data infrastructure for the compensation management platform that powers comp planning across 10,000+ companies.

The work I keep coming back to: turning Salesforce, product event, and warehouse data into models that don't break, that pass their tests, and that the next analyst can read at 2 AM with a dashboard on fire.

Analytics engineering isn't writing more SQL. It's building the layer where one good model retires fifty bad reports.

career_metricslive

0+ yrs

In analytics roles

2 promotions at Addepar

$1T+

AUM data delivered

daily reconciliation

Accounts at Pave

$1.3M+ ARR owned

SLA reduction at Dorsia

via self-serve marts

tenure / role progression2018 → 2026

[02]field notes

The artifacts of the work.

A few of the things I write, deploy, and reach for on a normal day. Not screenshots — actual specimens. The kind of files that live in /models, /views, and the dbt docs site.

models/marts/_marts.yml dbt contract

version: 2
models:
  - name: fct_account_revenue
    description: "Daily ARR rollup by account"
    config:
      materialized: incremental
      unique_key: account_date_id
      on_schema_change: append_new_columns
    contract: {enforced: true}
    columns:
      - name: account_id
        data_type: string
        tests: [unique, not_null]
      - name: arr_usd
        data_type: numeric(12,2)
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0

views/account_health.view.lkml LookML

view: account_health {
  sql_table_name: marts.fct_account_health ;;

  dimension: account_id {
    primary_key: yes
    sql: ${TABLE}.account_id ;;
  }

  dimension: health_tier {
    type: string
    sql: ${TABLE}.health_tier ;;
  }

  measure: total_arr {
    type: sum
    sql: ${TABLE}.arr_usd ;;
    value_format_name: usd_0
  }

  measure: at_risk_arr {
    type: sum
    filters: [health_tier: "at_risk"]
    sql: ${TABLE}.arr_usd ;;
  }
}

models/semantic/gtm_metrics.yml semantic layer

metrics:
  - name: stalled_opp_count
    label: "Stalled Opportunities"
    description: "Opps idle 30d+"
    type: simple
    type_params:
      measure: opportunity_count
    filter: |
      {{ Dimension(
        'opportunity__last_activity_days'
      ) }} > 30
      AND {{ Dimension(
        'opportunity__stage'
      ) }} NOT IN (
        'closed_won',
        'closed_lost'
      )

_freshness · live monitor observability

sourcelast loadedstatus

salesforce.accounts12m agofresh

salesforce.opportunities8m agofresh

salesforce.activities14m agofresh

stripe.invoices47m agowarn

gong.calls2h 12mwarn

hubspot.contacts4h 03mstale

dbt run · last 24h performance

exposures · downstream governance

marts/fct_account_revenue · dbt docs model documentation

fct_account_revenue

analytics.marts.fct_account_revenue

incremental 14 tests · passing

description

Daily account revenue rollup. Backfills from 2018-01-01 forward. Refreshes hourly via incremental run keyed on account_date_id.

materialization

materializedincremental

unique_keyaccount_date_id

schemaanalytics_marts

on_schema_changeappend_new

columns · 5

name	type	tests
account_date_id	string	unique, not_null
account_id	string	not_null, → accounts
date_day	date	not_null
arr_usd	numeric	not_null, ≥ 0
_loaded_at	timestamp	not_null

lineage

3 upstream → fct_account_revenue → 2 downstream

Looker · Explore: Account Health BI workspace

VIEWS

▾account_health

▾dimensions

· industry

· health_tier

· account_id

· created_at

▾measures

Σ total_arr

# at_risk_count

# account_count

selected fields · 4

▦industry dimension

▦health_tier dimension

Σtotal_arr sum · usd_0

#at_risk_count count

filters

health_tier is at_risk

result · 3 rows

industry	total_arr	at_risk
SaaS	$4.2M	12
Fintech	$2.8M	7
Healthcare	$1.4M	4

column-level lineage · fct_pipeline_velocity advanced lineage

PR #142 · feat: add stage_velocity derived field code review

#142 feat: add stage_velocity derived field

merged · 2h ago

2 files changed +28 lines −4 lines · 14 checks passed

models/marts/fct_pipeline_velocity.sql +14 −2

1212WITH ranked_opps AS (

1313 SELECT

1414 opp_id,

1515 account_id,

1616 stage,

17 stage_entered_at,

18 DATEDIFF('day', stage_entered_at,

19 CURRENT_DATE) AS days_in_stage,

20 amount_usd / NULLIF(days_in_stage, 0)

21 AS stage_velocity,

1822 amount_usd

1923 FROM {{ ref('stg_opportunities') }}

models/marts/_marts.yml +14 −2

2828 columns:

2929 - name: stage

30 - name: days_in_stage

31 tests:

32 - not_null

33 - dbt_utils.accepted_range:

34 min_value: 0

35 - name: stage_velocity

36 tests: [not_null]

Hex · GTM Health Monitor.hex notebook · live app

CELLS · 5

1 in industry_filter

2 sql query_health

3 tbl df_health

4 chart arr_by_industry

5 md narrative

all cells synced · 0.04s

[1] INPUT industry_filter ·

Industry SaaS ▾ Min ARR $50k ▾

[2] SQL df_health → snowflake.analytics ran 1.2s

SELECT
  industry,
  COUNT(DISTINCT account_id) AS account_count,
  SUM(arr_usd) AS total_arr,
  SUM(CASE WHEN health_tier = 'at_risk'
       THEN arr_usd ELSE 0 END) AS at_risk_arr
FROM {{ ref('fct_account_health') }}
WHERE industry = {{ industry_filter }}
  AND arr_usd >= {{ min_arr }}
GROUP BY 1
ORDER BY total_arr DESC

[3] TABLE df_health 4 rows · 1 col selected

industry	accounts	total_arr	at_risk_arr
SaaS	24	$4,212,000	$580,400
Fintech	17	$2,840,500	$310,200
Healthcare	11	$1,440,000	$180,000
Retail	9	$890,300	$96,500

[4] CHART arr_by_industry · bar ·

[03]stack

The pipeline I reach for.

From source extracts to a metric on someone's dashboard, this is the chain I work in most days. Tools listed are ones I've shipped work with — not collected.

→ 01

Ingest

FivetranREST APIsSFTPWebhooksPython

→ 02

Warehouse

SnowflakeBigQueryRedshift

→ 03

Model

dbtSQLDimensional modelingSCD Type 2dbt tests

→ 04

Serve

Looker / LookMLLightdashHexMode

Applied ML

Python·scikit-learn·lifetimes·LTV·lead scoring

AI-augmented analytics

OpenAI·Anthropic·RAG·pgvector·LLM-SQL

GTM & CRM

Salesforce·Gong·Outreach·Hightouch

[04]selected projects

Building in public.

Three projects shipping over the next several weeks. Each is a public dbt repo, a write-up, and an artifact you can actually poke at — designed to demonstrate analytics engineering at the messy middle: schema design, dbt model architecture, applied ML for business outcomes, and LLM-augmented analytics workflows.

project_01

GTM Semantic Layer

In progress

A complete dbt project modeling a synthetic CRM dataset — accounts, opportunities, contacts, activities — into a sales-funnel mart with account-health scoring. Staging → intermediate → marts architecture, 60+ models, custom generic tests for referential integrity and business rules, dbt docs deployed to GitHub Pages. The semantic layer renders out to Looker explores.

dbtBigQueryLookerdimensional modelingGitHub Actions

github →case study →dbt docs →

dbt lineage60 models

fct_pipeline_velocity12 cols

opp_idstringPK

account_idstringFK

stageenum

amount_usdnumeric

days_in_stageint

stage_velocitynumeric

is_stalledboolderived

close_datedate

created_attimestamp

updated_attimestamp

dbt_valid_fromtimestampSCD2

dbt_valid_totimestampSCD2

dbt test results87 pass · 0 fail

unique_opp_id · fct_pipeline_velocity

0.04s

not_null_account_id · all marts

0.02s

relationships_opp__account → dim_accounts

0.05s

accepted_values_stage · 7 enum values

0.02s

no_orphans__activities · custom generic

0.06s

freshness__salesforce · < 2h

0.01s

project_02

Marketing Attribution & LTV

Planning

A predictive-modeling layer on top of public e-commerce data (Brazilian Olist). dbt customer-journey models feed a Python BG/NBD + Gamma-Gamma LTV model and a multi-touch attribution analysis. End-to-end: warehouse modeling, feature engineering, lifetimes / scikit-learn modeling, surfaced through a small interactive app.

dbtPythonscikit-learnlifetimesattribution

github →case study →

predicted LTVBG/NBD

cohort retention12 mo

attribution mixmulti-touch

project_03

LLM-Augmented Analytics Workflow

Planning

An AI tool that knows your semantic layer. RAG over warehouse metadata, dbt exposures, and column-level lineage to answer natural-language questions with semantic-layer-grounded SQL — and flag when the question can't be answered cleanly by the existing model. The version of "talk to your data" that doesn't hallucinate metrics.

PythonOpenAIpgvectorRAGdbt metadata

github →case study →

conversationRAG

Q3 ARR for top-10 accounts by industry?

Pulling marts.fct_account_revenue joined to dim_accounts. Grouped by industry, ranked by Q3 ARR, top 10. ↳ exposure: ae_account_revenue · 14 tests passing

Only ones with a stalled opp.

Filtering opportunities.last_activity_at < now() − 30d and stage NOT IN (closed_won, closed_lost). ↳ semantic: stalled_opp_count (dbt metric)

retrieval · top-kcos sim

marts.fct_account_revenue mart0.94

marts.dim_accounts dim0.91

metrics.stalled_opp_count metric0.88

marts.fct_opportunities mart0.84

exposures.gtm_dashboard exposure0.79

int_accounts__activity int0.71

generated SQLcompiled

grounded from

marts.fct_account_revenue · dim_accounts

generated query

SELECT
  d.industry,
  SUM(r.arr_usd) AS q3_arr
FROM {{ ref('fct_account_revenue') }} r
JOIN {{ ref('dim_accounts') }} d
  USING (account_id)
WHERE r.fiscal_quarter = '2026Q3'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

[05]contact

Open to analytics engineering work.

Hiring AE — especially GTM, marketing, or product analytics — or want to argue about dbt patterns and semantic-layer design? The door is open. Quickest reply via email; the others all forward to my phone.

email
xanderwbennett@gmail.com → linkedin
/in/xander-bennett → github
@xander-bennett →

Modeling themessy middleof the data stack.

I build the layer where business questions meet warehouse reality.

The artifacts of the work.

The pipeline I reach for.

Applied ML

AI-augmented analytics

GTM & CRM

Building in public.

GTM Semantic Layer

Marketing Attribution & LTV

LLM-Augmented Analytics Workflow

Open to analytics engineering work.

Modeling the
messy middle
of the data stack.