Case Study

Last-Click Attribution Analysis

How I exposed a $412K misattribution error in default reporting — and built a multi-touch model to show what the data actually says about paid social spend.

SQL & MySQL
Recursive CTEs
Multi-Touch Attribution
Data Storytelling
Client Type
Growth-Stage DTC Brand
Main Goal
Expose systemic flaw in default attribution reporting.
Output
Multi-touch attribution model & executive misattribution report.
The Problem
Default last-click reporting credited $0 to every paid social channel.
Tech Stack
MySQL, Recursive CTEs, Chart.js, Linear & First-Click Attribution Modelling.
Business ROI
Identified $286K in paid social revenue invisible to the CMO, and $412K over-credited to closing channels.

The Workflow

Before
  • Last-click model reported TikTok, Facebook & YouTube drove $0 in Q2 revenue.
  • Budget decisions were made on platform-level reports, not verified backend journey data.
  • Raw path data had mixed separators (>, -> and |), making analysis impossible without normalization.
  • No visibility into which channels were opening vs. closing the funnel.
After
  • Linear attribution model correctly credited $286K to paid social channels across 5,000 journeys.
  • SQL pipeline with nested string normalization resolved the mixed-separator data quality issue.
  • Three attribution models (last-click, first-click, linear) run in parallel for a complete picture.
  • Clear channel roles identified: TikTok opens, Direct closes — both are necessary, both deserve budget.

The Business Problem: Attribution Crisis Inside the Dashboard

A growth-stage DTC brand was running active paid social campaigns across TikTok, Facebook, and YouTube. Their Q2 performance report — pulled from their default analytics setup — showed all three channels delivering $0 in attributed revenue.

The obvious next move: cut the budget. Why fund channels that aren’t producing?

The problem is that the report wasn’t lying about the data — it was lying about what the data meant. Last-click attribution, the default model in most analytics platforms, assigns 100% of conversion credit to the final touchpoint before a purchase.

If a customer clicked a TikTok ad, then a retargeting ad, then typed the URL directly — Direct Traffic gets the full credit. TikTok gets nothing.

In a world where customers touch 3–4 channels before converting, this isn’t a minor rounding error. It’s a misreading of the structure of how revenue is actually generated.

5,000
Customer journey logs analysed
Q2 Dataset
621
Completed conversions
12.4% conversion rate
$713K
Total verified Q2 revenue
Backend confirmed
$0
Paid social credit under last-click
The lie

My Approach: Engineering the Attribution Workflow

To correctly credit revenue across the full customer journey, I bypassed platform-level reports entirely. I built a custom SQL attribution engine directly within the data warehouse, creating a single source of truth that ad platform reporting incentives couldn’t manipulate.

Here is the step-by-step build process:

Phase 0 · Creating Tables

Before any analysis could begin, I needed a structured schema to hold the raw CSV data. This ensures strict data types and prevents formatting errors during the transformation process.

raw_messy_csv
View SQL Query: Create Raw Journeys Table
CREATE TABLE q2_customer_journeys (
    customer_id VARCHAR(50) NOT NULL,
    path VARCHAR(255) NOT NULL,
    revenue NUMERIC(10, 2) DEFAULT 0.00,
    purchase_status VARCHAR(50),
    PRIMARY KEY (customer_id, path)
);
Architecture Note: Why SQL over Python?

While this logic can easily be written in Pandas—or handled downstream using Power Query in Excel and Power BI—building the attribution model purely in SQL via Recursive CTEs pushes the heavy computation upstream to the data warehouse (e.g., Snowflake, BigQuery). This creates a scalable, automated pipeline that feeds clean, pre-calculated multi-touch data directly into your BI tools without bottlenecking their performance.

Phase 1 · Data Cleaning

Building the Foundation: Normalizing Messy Journey Data

Before any attribution model could run, the raw CSV required normalization. The dataset used three different path separators across records — a common real-world data quality issue that would silently corrupt touchpoint counts if not resolved first.

1. Path Separator Normalization

Problem: The path column mixed ” > “, ” -> ” and ” | ” as separators across rows, treating the same channel combinations as different paths.

View SQL Query:
CREATE OR REPLACE VIEW completed_journeys AS
SELECT
    customer_id,
    -- Normalizes all separator variations (' | ', '|', ' -> ', '->') into a clean ' > '
    REPLACE(REPLACE(REPLACE(REPLACE(path, ' | ', ' > '), '|', ' > '), ' -> ', ' > '), '->', ' > ') AS path_clean,
    revenue,
    purchase_status
FROM q2_customer_journeys
WHERE purchase_status = 'Completed'
  AND revenue > 0;
Normalizing Path Separators

Key Insight

The raw data had three conflicting separator formats — mixing >, -> and | in the same column. Without this normalization step, touchpoints would have been miscounted and every model downstream would have produced incorrect results. Real-world data is always messier than the brief suggests.

2. Unpacking Paths via Recursive CTE

Most CRMs log touchpoints as a single, messy text string. To lay the foundation for the multi-touch attribution model, I use a recursive query to unpack these strings into individual, trackable rows systematically.

View SQL Query:
CREATE OR REPLACE VIEW touchpoint_credits AS
WITH RECURSIVE path_splitter AS (
    SELECT
        customer_id, revenue, CAST(path_clean AS CHAR(1000)) AS path_clean,
        CAST(TRIM(SUBSTRING_INDEX(path_clean, ' > ', 1)) AS CHAR(255)) AS touchpoint,
        CAST(IF(LOCATE(' > ', path_clean) > 0, TRIM(SUBSTRING(path_clean, LOCATE(' > ', path_clean) + 3)), NULL) AS CHAR(1000)) AS remainder, 
        1 AS touch_position,
        CAST((LENGTH(path_clean) - LENGTH(REPLACE(path_clean, ' > ', ''))) / 3 + 1 AS SIGNED) AS total_touches
    FROM completed_journeys
    UNION ALL
    SELECT
        customer_id, revenue, path_clean,
        CAST(TRIM(SUBSTRING_INDEX(remainder, ' > ', 1)) AS CHAR(255)) AS touchpoint,
        CAST(IF(LOCATE(' > ', remainder) > 0, TRIM(SUBSTRING(remainder, LOCATE(' > ', remainder) + 3)), NULL) AS CHAR(1000)) AS remainder,
        touch_position + 1, total_touches
    FROM path_splitter
    WHERE remainder IS NOT NULL
),
cleaned AS (
    SELECT
        customer_id, revenue, total_touches, touch_position,
        CASE
            WHEN touchpoint IN ('Direct', 'Direct_Traffic') THEN 'Direct'
            WHEN touchpoint = 'Google_Brand_Search' THEN 'Brand Search'
            ELSE REPLACE(touchpoint, '_', ' ')
        END AS channel,
        CASE WHEN touch_position = 1 THEN 1 ELSE 0 END AS is_first,
        CASE WHEN touch_position = total_touches THEN 1 ELSE 0 END AS is_last,
        revenue / total_touches AS linear_credit
    FROM path_splitter
    WHERE touchpoint IS NOT NULL AND CHAR_LENGTH(TRIM(touchpoint)) > 0
)
SELECT * FROM cleaned;
Explode Paths via Recursive CTE
Phase 2 · Path Length Analysis

Proving That Multi-Touch Journeys Drive Conversions

Before building attribution models, I needed to establish a behavioral baseline: do customers who convert actually touch multiple channels, or is the single-touch assumption defensible?

3. Converter vs. Bouncer Path Length Comparison

Question: Do converting customers actually interact with more channels than those who bounce?

View SQL Query:
SELECT
    purchase_status,
    ROUND(AVG(
        LENGTH(REPLACE(REPLACE(path, '|', '>'), '->', '>')) 
        - LENGTH(REPLACE(REPLACE(REPLACE(path, '|', '>'), '->', '>'), '>', '')) 
        + 1
    ), 2) AS avg_path_length
FROM q2_customer_journeys
GROUP BY purchase_status;
Avg Path Length (Converters vs Bouncers)
Fig. 1 — Path length: converters vs bouncers
Key Insight

Customers who converted averaged 3.37 touchpoints before purchasing. Those who bounced averaged 2.49. The data directly disproves the single-touch assumption that last-click attribution is built on. Longer journeys correlate with higher purchase intent — not channel confusion.

Phase 3 · Attribution Modelling

Running Three Attribution Models Against the Same Revenue

With the data cleaned and the multi-touch behavior confirmed, I applied three attribution models to the same 621 converting journeys. Each model answers the same question differently.

Last Click
  • 100% credit to the final touchpoint
  • Default in Google Analytics & most CRMs
  • Structurally rewards closing channels only
First Click
  • 100% credit to the initial touchpoint
  • Exposes top-of-funnel (TOFU) discovery engines
  • Structurally rewards awareness & acquisition
Linear (Multi-Touch)
  • Revenue split equally across every touchpoint
  • Conservative, defensible multi-touch baseline
  • Credits the full path, not just the last step

4. Building the Last-Click Model

Question: What does revenue attribution look like when only the final touchpoint is credited?

View SQL Query:
SELECT
    channel,
    ROUND(SUM(revenue), 0) AS last_click_revenue,
    COUNT(DISTINCT customer_id) AS conversions
FROM touchpoint_credits
WHERE is_last = 1
GROUP BY channel
ORDER BY last_click_revenue DESC;
Last-Click Model Output
Key Insight

Under last-click attribution, Direct Traffic and Brand Search received 100% of Q2 revenue credit — a combined $713,554. Every paid social channel returned $0. This is the exact report a CMO would receive from a default GA4 setup.

5. Building the First-Click Model

Next, I flipped the attribution logic entirely. To find out where these converting journeys actually began, I built a First-Click model to see which channels were driving initial discovery.

View SQL Query:
SELECT
    channel,
    ROUND(SUM(revenue), 0) AS first_click_revenue,
    COUNT(DISTINCT customer_id) AS conversions
FROM touchpoint_credits
WHERE is_first = 1
GROUP BY channel
ORDER BY first_click_revenue DESC;
First-Click Model Output
Key Insight

By shifting 100% of the credit to the first touchpoint, the true origin of the pipeline is exposed. TikTok, Facebook, and YouTube initiated over 80% ($574,357) of all Q2 converted revenue. This proves Paid Social is doing exactly what it was designed to do: feed the top of the funnel (TOFU). The default last-click model creates a dangerous blind spot by completely ignoring these critical discovery engines.

6. Building the Linear (Multi-Touch) Model

Question: What does revenue attribution look like when every channel in a converting path receives equal credit?

View SQL Query:
SELECT
    channel,
    ROUND(SUM(linear_credit), 0) AS linear_revenue
FROM touchpoint_credits
GROUP BY channel
ORDER BY linear_revenue DESC;
Linear (Multi-Touch) Model
Fig. 2 — Revenue per channel: last click vs linear vs first click
Key Insight

When credit is distributed across the full journey, TikTok, Facebook, and YouTube collectively account for $285,995 — representing 40.1% of total Q2 revenue. Under standard last-click reporting, that number was exactly $0. Meanwhile, Direct Traffic drops from $491,638 down to a still-substantial $206,614, properly reflecting its genuine role as a final closing channel rather than a top-of-funnel discovery engine.

Phase 4 · Misattribution Analysis

Quantifying the Budget Decision Error

7. Channel-Level Misattribution Delta

Question: For each channel, what is the gap between what last-click reported and what multi-touch modeling shows?

This is the core business finding. By combining the last-click and linear outputs, I isolated the exact financial gap and identified which channels were structurally undervalued by default reporting.

View SQL Query:
WITH last_click AS (
    SELECT channel, ROUND(SUM(revenue), 0) AS lc_revenue
    FROM touchpoint_credits WHERE is_last = 1 GROUP BY channel
),
linear_model AS (
    SELECT channel, ROUND(SUM(linear_credit), 0) AS ln_revenue
    FROM touchpoint_credits GROUP BY channel
),
all_channels AS (
    SELECT channel FROM last_click
    UNION
    SELECT channel FROM linear_model
)
SELECT
    a.channel, COALESCE(lc.lc_revenue, 0) AS last_click_revenue,
    COALESCE(ln.ln_revenue, 0) AS linear_revenue,
    COALESCE(ln.ln_revenue, 0) - COALESCE(lc.lc_revenue, 0) AS delta,
    CASE
        WHEN COALESCE(ln.ln_revenue, 0) - COALESCE(lc.lc_revenue, 0) > 0  THEN 'Undervalued'
        WHEN COALESCE(ln.ln_revenue, 0) - COALESCE(lc.lc_revenue, 0) < 0  THEN 'Over-credited'
        ELSE 'Neutral'
    END AS attribution_status
FROM all_channels a
LEFT JOIN last_click lc ON a.channel = lc.channel
LEFT JOIN linear_model ln ON a.channel = ln.channel
ORDER BY delta DESC;
Channel Last Click Linear Delta Status
TikTok Ad$0$99,997+$99,997Undervalued
Facebook Ad$0$93,164+$93,164Undervalued
YouTube Ad$0$92,834+$92,834Undervalued
Email Promo$0$46,243+$46,243Undervalued
Retargeting$0$40,859+$40,859Undervalued
Blog Post$0$38,464+$38,464Undervalued
Direct$491,638$206,614−$285,024Over-credited
Brand Search$221,916$95,378−$126,538Over-credited
Key Insight

Six channels received $0 credit from last-click attribution despite participating in 37.6% of Q2 converting journeys. Direct and Brand Search are genuine closing channels — but they are receiving credit for work done by paid social upstream. The misattribution delta is not noise; it is structurally baked into how the default model works.

8. Top Converting Path Sequences

Question: What are the actual multi-step paths that generate the most revenue — and do they confirm the channel role hypothesis?

Analyzing the most frequent multi-step sequences proved that paid social channels were initiating the journeys, but direct traffic was getting all the credit for closing them.

View SQL Query:
SELECT
    REPLACE(REPLACE(REPLACE(path_clean, 'Direct_Traffic', 'Direct'), 'Google_Brand_Search', 
    'Brand Search'), '_', ' ') AS customer_path,
    COUNT(*) AS conversions,
    ROUND(SUM(revenue), 0) AS total_revenue
FROM completed_journeys
GROUP BY customer_path
HAVING COUNT(*) >= 2
ORDER BY total_revenue DESC
LIMIT 10;

Full Customer Path Conversions Revenue
Direct78$91,921
Brand Search42$47,276
YouTube Ad → Direct19$22,689
Facebook Ad → Direct17$21,358
TikTok Ad → Direct16$16,839
TikTok Ad → Facebook Ad → Direct7$8,510
YouTube Ad → Brand Search7$8,125
YouTube Ad → Facebook Ad → Direct7$7,740
TikTok Ad → Brand Search6$7,598
Facebook Ad → Retargeting → Direct6$6,330
Key Insight

Every single one of the top 10 converting paths ends in Direct or Brand Search — exactly what the last-click model reports. But 8 of those same 10 paths were initiated by a paid social channel. Last-click reporting sees only the final step of a multi-step relay. Cutting TikTok based on last-click data doesn’t eliminate a non-performing channel—it removes the runner who hands off the baton to your closer.

Strategic Takeaway

Last-click attribution is not wrong about which channel closes the sale. It is wrong about which channels make the sale possible. Paid social (TikTok, Facebook, YouTube) initiated the majority of high-value converting journeys in Q2. Direct and Brand Search closed them. Both roles are essential — but the default model only funds the closer. If paid social budget is cut based on this report, the pipeline feeding Direct and Brand Search shrinks, and revenue falls 30–60 days later with no obvious cause.

Business Impact

By replacing platform-level reports with a robust SQL-based attribution layer built directly from journey log data, the business gained the ability to make budget decisions based on verified, model-agnostic evidence.

Attribution Clarity
Revealed $286K in paid social revenue that the default model reported as $0, enabling accurate channel valuation.
Waste Prevention
Prevented the misallocation of paid social budget based on a structurally incorrect attribution model.
Channel Role Map
Established that TikTok opens, Direct closes — giving each channel a defensible budget justification.
Scalable Pipeline
Automated SQL pipeline runs on any new journey export — no manual rebuild required each reporting cycle.

Executive Overview: Q2 Attribution Summary

$0
Paid social under last-click
Action Required
$286K
Paid social under linear model
True Contribution
$412K
Over-credited to Direct & Brand Search
Misattributed
3.37×
Avg touchpoints for converters
vs 2.49× bouncers
Q2 Revenue by path length bucket
View SQL Query: Revenue Grouped by Path Length
SELECT
    CASE
        WHEN total_touches = 1 THEN '1 touch'
        WHEN total_touches = 2 THEN '2 touches'
        WHEN total_touches = 3 THEN '3 touches'
        ELSE '4+ touches'
    END AS path_length_bucket,
    COUNT(DISTINCT customer_id) AS conversions,
    ROUND(SUM(revenue), 0) AS total_revenue,
    ROUND(AVG(revenue), 0) AS avg_order_value
FROM (
    SELECT DISTINCT customer_id, revenue, total_touches
    FROM touchpoint_credits
) deduped
GROUP BY path_length_bucket
ORDER BY MIN(total_touches);

Strategic Recommendations

Immediate Action: Protect the Discovery Engines

Do not reduce paid social budgets based on standard last-click data. TikTok, Facebook, and YouTube are top-of-funnel initiators, not underperformers. Cutting their spend removes the customers who eventually arrive at Direct and Brand Search—the channels that actually close. Because B2B sales cycles are long, these budget cuts will show up as an unexplained revenue decline 30–60 days later, making the attribution error very difficult to trace.

Short-Term (This Quarter): Shift the Reporting Baseline

Switch from last-click to a linear or position-based model in your primary reporting layer. Most standard analytics platforms support this without requiring additional data engineering. Run both models in parallel for 60 days to build historical confidence before presenting the new baseline to leadership.

Medium-Term (Next 6 Months): Transition to Algorithmic Attribution

Commission a Markov chain attribution model. Unlike heuristic, rule-based models (last-click, linear, first-click), Markov chains calculate the actual incremental contribution of each channel by modeling how the overall conversion probability changes when a specific channel is removed from the path. This is the current industry standard for rigorous, defensible attribution.

On Model Limitations

No attribution model is perfect. The goal of this analysis is not to declare a single “true” model—it is to demonstrate that last-click is not a neutral default. It carries a severe bottom-of-funnel bias. Even a conservative linear model produces channel valuations that are dramatically more accurate and less dangerous to growth than assigning 100% of the credit to the channel a customer happened to touch last.

Data Privacy & Governance Disclosure: To strictly adhere to client confidentiality, the dataset used in this case study is a structured Q2 journey log. Specific customer identifiers and financial values have been anonymized. The SQL attribution logic and strategic analysis represent authentic methodologies deployed in production environments.

Need help with attribution, tracking, or reporting work?

If you need support with multi-touch attribution modelling, GA4/GTM setup, SQL-based data analysis, or executive dashboard builds, use the contact page or return to the portfolio to view more project examples.