Case Study

Power Query Ecommerce Cleanup Workflow

This project transformed a highly fragmented, messy dataset from multiple ad platforms into a standardized, fully automated data pipeline for the De-Estybee Ecom Store.

Power BI
Power Query
M Code
Data Automation
Quick Snapshot
Client
De-Estybee Ecom Store
Main Goal
Build a standardized, automated reporting workflow.
Output
Refreshable pipeline & executive reporting tables.
The Problem
Fragmented dataset with overlapping channels, formatting errors, and conflicting date structures.
Tech Stack
Power BI, Power Query, and Custom M Code programming.
The Output
A fully automated, refreshable data pipeline and dynamic executive reporting tables.
Business ROI
Faster reporting, zero duplicate counting, and clear visibility into profitable ad scaling.
Transformation

The Workflow Evolution

Before

  • Traffic sources were split across 12 overlapping, messy variations.
  • Date columns contained a chaotic mix of US, UK/Global, ISO, and text-month formats causing constant errors.
  • Financial columns were unusable due to a mix of numeric values, text strings, and currency symbols.
  • Duplicate transactions created inflated, unreliable revenue totals.

After

  • Power Query automatically consolidated the 12 sources into four clean categories.
  • 100% of date rows dynamically evaluated and converted into a unified standard.
  • Spend and revenue data programmatically stripped of text and rounded for exact financial accuracy.
  • The reporting process became a 1-click refreshable pipeline instead of a manual rebuild.

The Challenge

De-Estybee Ecom Store handed over a highly fragmented and messy dataset sourced from multiple advertising platforms.

The objective wasn’t just to clean the data for a one-off report, but to build a standardized, automated workflow.

The stakeholders had 5 strict requirements:

  1. Standardized Channels: Consolidate 12 overlapping traffic sources into four clean categories (Facebook, Google, Instagram, TikTok).
  2. Clean Numbers: Strip out all text, symbols ($, USD), and formatting errors from financial columns to allow for accurate aggregation.
  3. Real Dates: Convert conflicting global date formats into a single, workable standard for month-over-month grouping.
  4. No Duplicates: Ensure absolute accuracy by removing duplicate transactions via unique TXN IDs.
  5. Calculated ROI: Engineer a custom metric to track Return on Ad Spend (ROAS) using the formula (Sales_Revenue - Ad_Spend) / Ad_Spend.

How I Cleaned and Automated a Messy Ecommerce Reporting Workflow

To ensure the client never had to clean this data manually again, I bypassed Excel and built a fully automated, refreshable data pipeline using Power Query.

Here is how I tackled the core issues:

1. Taming the Date Collision

The source data featured a chaotic mix of US (01/15/2023), UK/Global (17-01-2023), standard ISO (2023-01-16), and text-month (Jan 19 2023) date formats living in the exact same column.

messy date tab

Standard conversion tools instantly returned errors.

To fix this, I wrote custom M code using the Date.From function to dynamically evaluate and parse each row, successfully converting 100% of the rows into a clean, unified Date format.

cleaned date column

2. Channel Consolidation (1-Click Grouping)

Before grouping, I trimmed all invisible spaces and converted the text to Proper Case to prevent matching errors.

messy marketing channel tab

A quick data profile revealed 12 messy variations of marketing channels.

enabling column profile for marketing channel

Instead of manually mapping each variation, I built a highly efficient conditional logic column.

For example, IF the channel started with “F”, it was routed to “Facebook.” But for specific outliers like AdWords, I used an EXACT match logic.

conditional formating for marketing channel column

This instantly consolidated the messy data into the four requested categories.

cleaned marketing channel tab

3. Financial Data Extraction & Formatting

The “Ad Spend” and “Sales Revenue” columns were unusable due to a mix of numeric values, text strings, and currency symbols.

messy ad spend column

I resolved this data-type mismatch by deploying a simple but highly effective Text.Select logic.

This script programmatically isolated only the numeric characters.

fixed ad spend and sales revenue column

To guarantee financial reporting accuracy, I implemented a strict rounding protocol, formatting all spend and revenue data to exactly two decimal places.

4. Deduplication

To ensure we weren’t double-counting revenue, I targeted the TXN ID column, removing all duplicates so that each row represented a single, unique transaction.

analyzing the column distribution
fixing duplicate values

5. Over-Delivering on ROI Logic

The stakeholder requested a standard ROAS calculation, but I took it a step further to make the dashboard actionable.

I created a custom conditional column that calculated ROI and included specific error handling to replace all null values with 0.

adding calculated roas column

If Sales Revenue was 0 but Ad Spend was active, the system automatically returned -100%.

power bi dashboard

When visualized in Power BI, this serves as an immediate red flag for stakeholders, highlighting underperforming campaigns that need to be paused immediately.

(Note: This logic bypassed Search campaigns, where zero spend with positive revenue is an expected organic outcome).

The Results & Data Insights

With the data successfully cleaned and modeled, I generated the final executive reporting tables and dashboard.

Dashboard

dashboard

Monthly Spend vs. Revenue

MonthAd Spend ($)Sales Revenue ($)
January 202349,018.75109,626.53
February 202355,606.06124,934.05
March 202345,675.34102,699.73
April 202369,242.54148,039.97
May 202366,987.69139,657.07
June 202370,272.12170,214.50
July 202354,980.25124,906.28
August 202365,427.79131,487.22
September 202351,958.64120,612.68
October 202369,948.46156,803.45
November 202353,718.00121,543.10
December 202363,194.52134,540.91
January 20245,773.3412,928.59

Performance Summary by Channel

Facebook emerged as the top-performing channel by sheer volume, while TikTok and Facebook tied for the highest average ROI per transaction.

ChannelTotal Spend ($)Total Revenue ($)Average ROI
Facebook207,058.10474,403.72109.73%
TikTok152,745.90358,605.15106.91%
Instagram158,987.99344,867.49105.78%
Google203,011.51420,117.7298.10%

Key Business Takeaways

āœ… Top Performing Campaign: The Search_Competitor campaign on TikTok drove an impressive aggregate ROI of 166.46% (generating $62,659.49 in revenue from just $23,515.79 in spend).

āœ… Peak Revenue: June 2023 marked the highest monthly revenue at $170,214.50, coinciding perfectly with the highest monthly ROI (142.22%).

āœ… Spend Efficiency & Stability: Throughout 2023, the monthly ROI consistently held above 100%. More importantly, when the client scaled their budget in June and October, ROI remained highly profitable, proving that the business can confidently scale ad spend without hitting immediate diminishing returns.

What a Business Owner Gains

A business owner implementing this pipeline gains:

  • Less time wasted on manual spreadsheet cleanup.
  • More reliable reporting, with totals free of double-counted rows.
  • Clearer visibility into channel-level performance and ROAS.
  • A repeatable system for future monthly reporting.
  • A stronger, error-free base for dashboard and budget analysis.

Note on Data Privacy: To protect proprietary business intelligence, all transaction IDs and specific date stamps in this case study have been anonymized or generated as synthetic data. The underlying mathematical logic and Power Query cleaning workflows remain identical to the original production environment.

Next Step

Need help with tracking, reporting, or dashboard work?

If you need support with GA4/GTM setup, Power Query cleanup, Excel dashboards, or SQL-based analysis, use the contact page or return to the portfolio to view more project examples.