Beyond the Medallion: Cost-Saving Alternatives for Microsoft Fabric Data Estates

The Medallion Architecture (Bronze → Silver → Gold) has become the de facto standard for building scalable data estates—especially in Microsoft Fabric. It’s elegant, modular, easy to explain to business users, and aligns well with modern ELT workflows.

The Medallion Architecture remains one of the most effective and scalable patterns for modern data engineering because it introduces structured refinement, clarity, and governance into a data estate. By organising data into Bronze, Silver, and Gold layers, it provides a clean separation of concerns: raw ingestion is preserved for auditability, cleaned and conformed data is standardised for consistency, and curated business-ready data is optimised for analytics. This layered approach reduces complexity, improves data quality, and makes pipelines easier to maintain and troubleshoot. It also supports incremental processing, promotes reuse of transformation logic, and enables teams to onboard new data sources without disrupting downstream consumers. For growing organisations, the Medallion Architecture offers a well-governed, scalable foundation that aligns with both modern ELT practices and enterprise data management principles.

But as many companies have discovered, a full 3-layer medallion setup can come with unexpected operational costs:

  • Too many transformation layers
  • Heavy Delta Lake I/O
  • High daily compute usage
  • BI refreshes duplicating transformations
  • Redundant data copies
  • Long nightly pipeline runtimes

The result?
Projects start simple, but the estate grows heavy, slow, and expensive.

The good news: the Medallion Architecture is not the only option. There are several real-world alternatives (and hybrids) that can reduce hosting costs by 40–80% and dramatically cut daily processing times.

This blog explores those alternatives—with in-depth explanations and examples drawn from real implementations.


Why Medallion Architectures Become Expensive

The medallion pattern emerged from Databricks. But in Fabric, some teams adopt it uncritically—even when the source data doesn’t need three layers.

Consider a common case:

A retail company stores 15 ERP tables. Every night they copy all 15 tables into Bronze, clean them into Silver, and join them into 25 Gold tables.

Even though only 3 tables change daily, the pipelines for all 15 run every day because “that’s what the architecture says.”

This is where costs balloon:

  • Storage multiplied by 3 layers
  • Pipelines running unnecessarily
  • Long-running joins across multiple layers
  • Business rules repeating in Gold tables

If this sounds familiar… you’re not alone.


1. The “Mini-Medallion”: When 2 Layers Are Enough

Not all data requires Bronze → Silver → Gold.

Sometimes two layers give you 90% of the value at 50% of the cost.

The 2-Layer Variant

  1. Raw (Bronze):
    Store the original data as-is.
  2. Optimised (Silver/Gold combined):
    Clean the data, apply business rules, and structure it for consumption (a minimal sketch follows below).
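
In Fabric, this often collapses into a single notebook step per source: read the raw Delta table, apply the light conforming and business rules in one pass, and write the consumption-ready table. The sketch below shows the idea in PySpark; the table and column names are illustrative assumptions, not from any client implementation, and `spark` is the session a Fabric notebook provides.

```python
from pyspark.sql import functions as F

# Raw layer: the source data as delivered (illustrative table name).
raw = spark.read.table("raw_erp_sales")

# Optimised layer: conform, apply business rules, and shape for consumption in one pass.
optimised = (
    raw
    .withColumnRenamed("CUST_NO", "customer_id")                           # light conforming
    .withColumn("order_date", F.to_date("ORDER_DT", "yyyyMMdd"))           # type conversion
    .withColumn("net_amount", F.col("GROSS_AMT") - F.col("DISCOUNT_AMT"))  # business rule
    .filter(F.col("STATUS") != "CANCELLED")
)

# One curated table replaces separate Silver and Gold copies of the same data.
optimised.write.mode("overwrite").saveAsTable("opt_sales")
```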

Real Example

A financial services client was running:

  • 120 Bronze tables
  • 140 Silver tables
  • 95 Gold tables

Their ERP was clean. The Silver layer added almost no value—just a few renames and type conversions. We replaced Silver and Gold with one Optimised layer.

Impact:

  • Tables reduced from 355 to 220
  • Daily pipeline runtime cut from 9.5 hours to 3.2 hours
  • Fabric compute costs reduced by ~48%

Because modern source systems such as SAP, Dynamics 365, NetSuite, and Salesforce already deliver relatively clean data, a 2-layer structure is often enough.


2. Direct Lake: The Biggest Cost Saver in Fabric

Direct Lake is one of Fabric’s superpowers.

It allows Power BI to read Delta tables directly from the lake, without Import mode and without a Gold star-schema layer.

You bypass:

  • Power BI refresh compute
  • Gold table transformations
  • Storage duplication

Real Example

A manufacturer had 220 Gold tables feeding Power BI dashboards. They migrated 18 of their largest models to Direct Lake.

Results:

  • Removed the entire Gold layer for those models
  • Saved ~70% on compute
  • Dropped Power BI refreshes from 30 minutes to seconds
  • End users saw faster dashboards without imports

If your business intelligence relies heavily on Fabric + Power BI, Direct Lake is one of the biggest levers available.


3. ELT-on-Demand: Only Process What Changed

Most pipelines run on a schedule because that’s what engineers are used to. But a large portion of enterprise data does not need a daily refresh.

Better alternatives:

  • Change Data Feed (CDF)
  • Incremental watermarking
  • Event-driven processing
  • Partition-level processing

Real Example

A logistics company moved from full daily reloads to watermark-based incremental processing.

Before:

  • 85 tables refreshed daily
  • 900GB/day scanned

After:

  • Only 14 tables refreshed
  • 70GB/day scanned
  • Pipelines dropped from 4 hours to 18 minutes
  • Compute cost fell by ~82%

Incremental processing almost always pays for itself in the first week.
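
For teams heading this way, a watermark-based incremental load in a Fabric notebook can be as small as the sketch below. The control table, table names, and column names are assumptions for illustration; the pattern is: read the last watermark, pull only newer rows, merge them, then advance the watermark.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# 1. Read the last high-water mark recorded for this table (assumed control table).
last_wm = (
    spark.read.table("etl_watermarks")
    .filter(F.col("table_name") == "erp_orders")
    .select("watermark_value")
    .first()[0]
)

# 2. Pull only the rows that changed since the last run.
changes = spark.read.table("raw_erp_orders").filter(F.col("modified_at") > F.lit(last_wm))

# 3. Merge the changes into the optimised table instead of reloading everything.
(
    DeltaTable.forName(spark, "opt_orders").alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# 4. Advance the watermark so the next run starts where this one stopped.
new_wm = changes.agg(F.max("modified_at")).first()[0]
if new_wm is not None:
    DeltaTable.forName(spark, "etl_watermarks").update(
        condition=F.col("table_name") == "erp_orders",
        set={"watermark_value": F.lit(new_wm)},
    )
```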


4. OneBigTable: When a Wide Serving Table Is Cheaper

Sometimes the business only needs one big denormalised table for reporting. Instead of multiple Gold dimension and fact tables, you build a single optimised serving table.

This can feel “anti-architecture,” but it works.
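
As a sketch, building one might look like the PySpark below: join the fact to only the dimensions the reports actually use and persist the result as a single wide serving table. The table and column names are illustrative assumptions.

```python
fact = spark.read.table("fact_sales")
dim_product = spark.read.table("dim_product").select("product_key", "product_name", "category")
dim_store = spark.read.table("dim_store").select("store_key", "store_name", "region")
dim_date = spark.read.table("dim_date").select("date_key", "calendar_date", "fiscal_month")

# Denormalise once, at load time, instead of joining at every refresh and every query.
one_big_table = (
    fact
    .join(dim_product, "product_key", "left")
    .join(dim_store, "store_key", "left")
    .join(dim_date, "date_key", "left")
    .select(
        "calendar_date", "fiscal_month", "region", "store_name",
        "category", "product_name", "quantity", "net_amount",
    )
)

# Power BI reads this one wide table; no joins are needed at query time.
one_big_table.write.mode("overwrite").saveAsTable("serve_sales_obt")
```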

Real Example

A telco was loading:

  • 12 fact tables
  • 27 dimensions
  • Dozens of joins running nightly

Reporting only used a handful of those dimensions.

We built a single OneBigTable designed for Power BI.

Outcome:

  • Gold tables reduced by 80%
  • Daily compute reduced by 60%
  • Power BI performance improved due to fewer joins
  • Pipeline failures dropped significantly

Sometimes simple is cheaper and faster.


5. Domain-Based Lakehouses (Micro-Lakehouses)

Rather than one giant medallion, split your estate based on business domains:

  • Sales Lakehouse
  • Product Lakehouse
  • HR Lakehouse
  • Logistics Lakehouse

Each domain has:

  • Its own small Bronze/Silver/Gold
  • Pipelines that run only when that domain changes (see the sketch after this list)
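
A lightweight way to implement the “run only when the domain changes” rule is to compare each source table’s latest Delta commit timestamp with the domain’s last successful run. The sketch below assumes a hypothetical control table (etl_domain_runs) and a placeholder run_domain_pipeline() helper; both are illustrations, not Fabric features.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Which source tables feed each domain (illustrative mapping).
domain_sources = {
    "sales": ["raw_orders", "raw_order_lines"],
    "hr": ["raw_employees"],
}

def latest_commit_ts(table_name):
    # Timestamp of the most recent Delta commit on the table.
    return (
        DeltaTable.forName(spark, table_name)
        .history(1)
        .select("timestamp")
        .first()[0]
    )

def run_domain_pipeline(domain):
    # Placeholder: call your pipeline/orchestration API of choice here.
    print(f"Triggering pipeline for domain: {domain}")

runs = spark.read.table("etl_domain_runs")  # assumed control table: domain_name, last_run_ts

for domain, tables in domain_sources.items():
    last_run = (
        runs.filter(F.col("domain_name") == domain)
        .select("last_run_ts")
        .first()[0]
    )
    if last_run is None or any(latest_commit_ts(t) > last_run for t in tables):
        run_domain_pipeline(domain)
    else:
        print(f"Skipping {domain}: no new source data since the last run")
```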

Real Example

A retail group broke their 400-table estate into 7 domains. The nightly batch that previously ran for 6+ hours is now split into per-domain runs:

  • Sales domain: 45 minutes
  • HR domain: 6 minutes
  • Finance domain: 1 hour
  • Others run only when data changes

Fabric compute dropped by 37% with no loss of functionality.


6. Data Vault 2.0: The Low-Cost Architecture for High-Volume History

If you have:

  • Millions of daily transactions
  • High historisation requirements
  • Many sources merging in a single domain

Data Vault often outperforms Medallion.

Why?

  • Hubs, Links, and Satellites only update what changed (see the sketch after this list)
  • Perfect for incremental loads
  • Excellent auditability
  • Great for multi-source integration
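
As a rough sketch of why this stays cheap, the load below inserts a Hub row only for business keys never seen before, and a Satellite row only when the hash of the descriptive attributes actually changed. Table and column names are illustrative, not the insurer’s model.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Stage the day's claims with a hash key (identity) and a hash diff (change detection).
stage = (
    spark.read.table("stg_claims")
    .withColumn("hub_claim_hk", F.sha2(F.col("claim_number").cast("string"), 256))
    .withColumn(
        "hash_diff",
        F.sha2(F.concat_ws("||", "claim_status", "claim_amount", "provider_id"), 256),
    )
    .withColumn("load_ts", F.current_timestamp())
)

# Hub: insert only business keys we have never seen before.
(
    DeltaTable.forName(spark, "hub_claim").alias("h")
    .merge(
        stage.select("hub_claim_hk", "claim_number", "load_ts").alias("s"),
        "h.hub_claim_hk = s.hub_claim_hk",
    )
    .whenNotMatchedInsertAll()
    .execute()
)

# Satellite: insert a new row only when the attributes (and therefore the hash diff) changed.
(
    DeltaTable.forName(spark, "sat_claim_details").alias("t")
    .merge(
        stage.select(
            "hub_claim_hk", "hash_diff", "claim_status", "claim_amount", "provider_id", "load_ts"
        ).alias("s"),
        "t.hub_claim_hk = s.hub_claim_hk AND t.hash_diff = s.hash_diff",
    )
    .whenNotMatchedInsertAll()
    .execute()
)
```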

Real Example

A health insurance provider stored billions of claims. Their medallion architecture was running 12–16 hours of pipelines daily.

Switching to Data Vault:

  • Stored only changed records
  • Reduced pipeline time to 45 minutes
  • Achieved 90% cost reduction

If you have high-cardinality or fast-growing data, Data Vault is often the better long-term choice.


7. KQL Databases: When Fabric SQL Is Expensive or Overkill

For logs, telemetry, IoT, or operational metrics, Fabric KQL DBs (Kusto) are:

  • Faster
  • Cheaper
  • Purpose-built for time-series
  • Largely worry-free to scale

Real Example

A mining client stored sensor data in Bronze/Silver. Delta Lake struggled with millions of small files from IoT devices.

Switching to KQL:

  • Pipeline cost dropped ~65%
  • Query time dropped from 20 seconds to < 1 second
  • Storage compressed more efficiently

Use the right store for the right job.


Putting It All Together: A Modern, Cost-Optimised Fabric Architecture

Here’s a highly efficient pattern we now recommend to most clients:

The Hybrid Optimised Model

  1. Bronze: Raw Delta, incremental only
  2. Silver: Only where cleaning is required
  3. Gold: Only for true business logic (not everything)
  4. Direct Lake → Power BI (kills most Gold tables)
  5. Domain Lakehouses
  6. KQL for logs
  7. Data Vault for complex historisation

This is a far more pragmatic and cost-sensitive approach that meets the needs of modern analytics teams without following architecture dogma.


Final Thoughts

A Medallion Architecture is a great starting point—but not always the best endpoint.

As data volumes grow and budgets tighten, organisations need architectures that scale economically. The real-world examples above show how companies are modernising their estates with:

  • Fewer layers
  • Incremental processing
  • Domain-based designs
  • Direct Lake adoption
  • The right storage engines for the right data

If you’re building or maintaining a Microsoft Fabric environment, it’s worth stepping back and challenging old assumptions.

Sometimes the best architecture is the one that costs less, runs faster, and can actually be maintained by your team.


The Epiphany Moment of Euphoria in a Data Estate Development Project

In our technology-driven world, engineers pave the path forward, and there are moments of clarity and triumph that stand alongside humanity’s greatest achievements. Learning from these achievements at a young age shapes our way of thinking and can be a source of inspiration that enhances the way we solve problems in our daily lives. For me, one of these profound inspirations stems from an engineering marvel: the Paul Sauer Bridge over the Storms River in Tsitsikamma, South Africa – which I first visited in 1981. This arch bridge, completed in 1956, represents more than just a physical structure. It embodies a visionary approach to problem-solving, where ingenuity, precision, and execution converge seamlessly.

The Paul Sauer Bridge across the Storms River Gorge in South Africa.

The bridge’s construction involved a bold method: engineers built two halves of the arch on opposite sides of the gorge. Each section was erected vertically and then carefully pivoted downward to meet perfectly in the middle, completing the 100m span, 120m above the river. This remarkable feat of engineering required foresight, meticulous planning, and flawless execution – a true epiphany moment of euphoria when the pieces fit perfectly.

Now, imagine applying this same philosophy to building data estate solutions. Like the bridge, these solutions must connect disparate sources, align complex processes, and culminate in a seamless result where data meets business insights.

This blog explores how to achieve this epiphany moment in data projects by drawing inspiration from this engineering triumph.

The Parallel Approach: Top-Down and Bottom-Up

Building a successful data estate solution, I believe, requires a dual approach, much like the simultaneous construction of both sides of the Storms River Bridge:

  1. Top-Down Approach:
    • Start by understanding the end goal: the reports, dashboards, and insights that your organization needs.
    • Focus on business requirements such as wireframe designs, data visualization strategies, and the decisions these insights will drive.
    • Use these goals to inform the types of data needed and the transformations required to derive meaningful insights.
  2. Bottom-Up Approach:
    • Begin at the source: identify and ingest the right raw data from various systems.
    • Ensure data quality through cleaning, validation, and enrichment.
    • Transform raw data into structured and aggregated datasets that are ready to be consumed by reports and dashboards.

These two streams work in parallel. The Top-Down approach ensures clarity of purpose, while the Bottom-Up approach ensures robust engineering. The magic happens when these two streams meet in the middle – where the transformed data aligns perfectly with reporting requirements, delivering actionable insights. This convergence is the epiphany moment of euphoria for every data team, validating the effort invested in discovery, planning, and execution.

When the Epiphany Moment Isn’t Euphoric

While the convergence of Top-Down and Bottom-Up approaches can lead to an epiphany moment of euphoria, there are times when this anticipated triumph falls flat. One of the most common reasons is discovering that the business requirements cannot be met because the source data is insufficient, incomplete, or altogether unavailable. These moments can feel like a jarring reality check, but they also offer valuable lessons for navigating data challenges.

Why This Happens

  1. Incomplete Understanding of Data Requirements:
    • The Top-Down approach may not have fully accounted for the granular details of the data needed to fulfill reporting needs.
    • Assumptions about the availability or structure of the data might not align with reality.
  2. Data Silos and Accessibility Issues:
    • Critical data might reside in silos across different systems, inaccessible due to technical or organizational barriers.
    • Ownership disputes or lack of governance policies can delay access.
  3. Poor Data Quality:
    • Data from source systems may be incomplete, outdated, or inconsistent, requiring significant remediation before use.
    • Legacy systems might not produce data in a usable format.
  4. Shifting Requirements:
    • Business users may change their reporting needs mid-project, rendering the original data pipeline insufficient.

The Emotional and Practical Fallout

Discovering such issues mid-development can be disheartening:

  • Teams may feel a sense of frustration, as their hard work in data ingestion, transformation, and modeling seems wasted.
  • Deadlines may slip, and stakeholders may grow impatient, putting additional pressure on the team.
  • The alignment between business and technical teams might fracture as miscommunications come to light.

Turning Challenges into Opportunities

These moments, though disappointing, are an opportunity to re-evaluate and recalibrate your approach. Here are some strategies to address this scenario:

1. Acknowledge the Problem Early

  • Accept that this is part of the iterative process of data projects.
  • Communicate transparently with stakeholders, explaining the issue and proposing solutions.

2. Conduct a Gap Analysis

  • Assess the specific gaps between reporting requirements and available data.
  • Determine whether the gaps can be addressed through technical means (e.g., additional ETL work) or require changes to reporting expectations.

3. Explore Alternative Data Sources

  • Investigate whether other systems or third-party data sources can supplement the missing data.
  • Consider enriching the dataset with external or public data.

4. Refine the Requirements

  • Work with stakeholders to revisit the original reporting requirements.
  • Adjust expectations to align with available data while still delivering value.

5. Enhance Data Governance

  • Develop clear ownership, governance, and documentation practices for source data.
  • Regularly audit data quality and accessibility to prevent future bottlenecks.

6. Build for Scalability

  • Future-proof your data estate by designing modular pipelines that can easily integrate new sources.
  • Implement dynamic models that can adapt to changing business needs.

7. Learn and Document the Experience

  • Treat this as a learning opportunity. Document what went wrong and how it was resolved.
  • Use these insights to improve future project planning and execution.

The New Epiphany: A Pivot to Success

While these moments may not bring the euphoria of perfect alignment, they represent an alternative kind of epiphany: the realisation that challenges are a natural part of innovation. Overcoming these obstacles often leads to a more robust and adaptable solution, and the lessons learned can significantly enhance your team’s capabilities.

In the end, the goal isn’t perfection – it’s progress. By navigating misalignment and incomplete or unavailable data with resilience and creativity, you’ll lay the groundwork for future successes and, ultimately, more euphoric epiphanies to come.

Steps to Ensure Success in Data Projects

To reach this transformative moment, teams must adopt structured practices and adhere to principles that drive success. Here are the key steps:

1. Define Clear Objectives

  • Identify the core business problems you aim to solve with your data estate.
  • Engage stakeholders to define reporting and dashboard requirements.
  • Develop a roadmap that aligns with organisational goals.

2. Build a Strong Foundation

  • Invest in the right infrastructure for data ingestion, storage, and processing (e.g., cloud platforms, data lakes, or warehouses).
  • Ensure scalability and flexibility to accommodate future data needs.

3. Prioritize Data Governance

  • Implement data policies to maintain security, quality, and compliance.
  • Define roles and responsibilities for data stewardship.
  • Create a single source of truth to avoid duplication and errors.

4. Embrace Parallel Development

  • Top-Down: Start designing wireframes for reports and dashboards while defining the key metrics and KPIs.
  • Bottom-Up: Simultaneously ingest and clean data, applying transformations to prepare it for analysis.
  • Use agile methodologies to iterate and refine both streams in sync.

5. Leverage Automation

  • Automate data pipelines for faster and error-free ingestion and transformation.
  • Use tools like ETL frameworks, metadata management platforms, and workflow orchestrators.

6. Foster Collaboration

  • Establish a culture of collaboration between business users, analysts, and engineers.
  • Encourage open communication to resolve misalignments early in the development cycle.

7. Test Early and Often

  • Validate data accuracy, completeness, and consistency before consumption.
  • Conduct user acceptance testing (UAT) to ensure the final reports meet business expectations.

8. Monitor and Optimize

  • After deployment, monitor the performance of your data estate.
  • Optimize processes for faster querying, better visualization, and improved user experience.

Most importantly, do not forget that the true driving force behind technological progress lies not just in innovation but in the people who bring it to life. Investing in the right individuals and cultivating a strong, capable team is paramount. A team of skilled, passionate, and collaborative professionals forms the backbone of any successful venture, ensuring that ideas are transformed into impactful solutions. By fostering an environment where talent can thrive – through mentorship, continuous learning, and shared vision – organisations empower their teams to tackle complex challenges with confidence and creativity. After all, even the most groundbreaking technologies are only as powerful as the minds and hands that create and refine them.

Conclusion: Turning Vision into Reality

The Storms River Bridge stands as a symbol of human achievement, blending design vision with engineering excellence. It teaches us that innovation requires foresight, collaboration, and meticulous execution. Similarly, building a successful data estate solution is not just about connecting systems or transforming data – it’s about creating a seamless convergence where insights meet business needs. By adopting a Top-Down and Bottom-Up approach, teams can navigate the complexities of data projects and keep technical execution aligned with those needs.

When the two streams meet – when your transformed data delivers perfectly to your reporting requirements – you’ll experience your own epiphany moment of euphoria. It’s a testament to the power of collaboration, innovation, and relentless dedication to excellence.

In both engineering and technology, the most inspiring achievements stem from the ability to transform vision into reality.

The journey isn’t always smooth. Challenges like incomplete data, shifting requirements, or unforeseen obstacles can test our resilience. However, these moments are an opportunity to grow, recalibrate, and innovate further. By adopting structured practices, fostering collaboration, and investing in the right people, organizations can navigate these challenges effectively.

Ultimately, the epiphany moment in data estate development is not just about achieving alignment, it’s about the collective people effort, learning, and perseverance that make it possible. With a clear vision, a strong foundation, and a committed team, you can create solutions that drive success and innovation, ensuring that every challenge becomes a stepping stone toward greater triumphs.

Unveiling the Magic of Data Warehousing: Understanding Dimensions, Facts, Warehouse Schemas and Analytics

Data has emerged as the most valuable asset for businesses. As companies gather vast amounts of data from various sources, the need for efficient storage, organisation, and analysis becomes paramount. This is where data warehouses come into play, acting as the backbone of advanced analytics and reporting. In this blog post, we’ll unravel the mystery behind data warehouses and explore the crucial roles played by dimensions and facts in organising data for insightful analytics and reporting.

Understanding Data Warehousing

At its core, a data warehouse is a specialised database optimised for the analysis and reporting of vast amounts of data. Unlike transactional databases, which are designed for quick data insertion and retrieval, data warehouses are tailored for complex queries and aggregations, making them ideal for business intelligence tasks.

Dimensions and Facts: The Building Blocks of Data Warehousing

To comprehend how data warehouses function, it’s essential to grasp the concepts of dimensions and facts. In the realm of data warehousing, a dimension is a descriptive attribute, often used for slicing and dicing the data. Dimensions are the categorical information that provides context to the data. For instance, in a sales context, dimensions could include products, customers, time, and geographic locations.

On the other hand, a fact is a numeric metric or measure that businesses want to analyse. It represents the data that needs to be aggregated, such as sales revenue, quantity sold, or profit margins. Facts are generally stored as numerical values and are surrounded by dimensions, which give them meaning and relevance.

The Role of Dimensions:

Dimensions act as the entry points to data warehouses, offering various perspectives for analysis. For instance, by analysing sales data, a business can gain insights into which products are popular in specific regions, which customer segments contribute the most revenue, or how sales performance varies over different time periods. Dimensions provide the necessary context to these analyses, making them more meaningful and actionable.

The Significance of Facts:

Facts, on the other hand, serve as the heartbeat of data warehouses. They encapsulate the key performance indicators (KPIs) that businesses track. Whether it’s total sales, customer engagement metrics, or inventory levels, facts provide the quantitative data that powers decision-making processes. By analysing facts over different dimensions, businesses can uncover trends, identify patterns, and make informed decisions to enhance their strategies.

Facts relating to Dimensions:

The relationship between facts and dimensions is often described as a fact table surrounded by one or more dimension tables. The fact table contains the measures or facts of interest, while the dimension tables contain the attributes or dimensions that provide context to the facts.

Ordering Data for Analytics and Reporting

Dimensions and facts work in harmony within data warehouses, allowing businesses to organise and store data in a way that is optimised for analytics and reporting. When data is organised using dimensions and facts, it becomes easier to create complex queries, generate meaningful reports, and derive valuable insights. Analysts can drill down into specific dimensions, compare different facts, and visualise data trends, enabling data-driven decision-making at all levels of the organisation.

Data Warehouse Schemas

Data warehouse schemas are essential blueprints that define how data is organised, stored, and accessed in a data warehouse. Each schema has its unique way of structuring data, catering to specific business requirements. Here, we’ll explore three common types of data warehouse schemas—star schema, snowflake schema, and galaxy schema—along with their uses, advantages, and disadvantages.

1. Star Schema:

Use:

  • Star schema is the simplest and most common type of data warehouse schema.
  • It consists of one or more fact tables referencing any number of dimension tables.
  • Fact tables store the quantitative data (facts), and dimension tables store descriptive data (dimensions).
  • Star schema is ideal for business scenarios where queries mainly focus on aggregations of data, such as summing sales by region or time (see the query sketch after the pros and cons below).

Pros:

  • Simplicity: Star schema is straightforward and easy to understand and implement.
  • Performance: Queries generally perform well because the denormalised structure keeps each dimension a single join away from the fact table.
  • Flexibility: New dimensions can be added without altering existing structures, ensuring flexibility for future expansions.

Cons:

  • Redundancy: Denormalisation can lead to some data redundancy, which might impact storage efficiency.
  • Maintenance: While it’s easy to understand, maintaining data integrity can become challenging, especially if not properly managed.
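
To make the star shape concrete, here is an illustrative PySpark query over an assumed fact_sales table and two dimensions: one join hop per dimension, then an aggregation of the measure. The tables and columns are examples, not a prescribed model.

```python
from pyspark.sql import functions as F

fact_sales = spark.read.table("fact_sales")   # facts: measures plus foreign keys
dim_store = spark.read.table("dim_store")     # dimension: descriptive attributes (region, ...)
dim_date = spark.read.table("dim_date")

# Each dimension is one join away from the fact table; the measure is then aggregated.
sales_by_region = (
    fact_sales
    .join(dim_store, "store_key")
    .join(dim_date, "date_key")
    .groupBy("region", "calendar_month")
    .agg(F.sum("net_amount").alias("total_sales"))
)

sales_by_region.show()
```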

2. Snowflake Schema:

Use:

  • Snowflake schema is an extension of the star schema, where dimension tables are normalised into multiple related tables.
  • This schema is suitable for situations where there is a need to save storage space and reduce data redundancy.
  • Snowflake schema is often chosen when dealing with hierarchical data or when integrating with existing normalised databases.

Pros:

  • Normalised Data: Reducing redundancy leads to a more normalised database, saving storage space.
  • Easier Maintenance: Updates and modifications in normalised tables are easier to manage without risking data anomalies.

Cons:

  • Complexity: Snowflake schema can be more complex to understand and design due to the increased number of related tables.
  • Performance: Query performance can be impacted due to the need for joining more tables compared to the star schema.

3. Galaxy Schema (Fact Constellation):

Use:

  • Galaxy schema, also known as fact constellation, involves multiple fact tables that share dimension tables.
  • This schema is suitable for complex business scenarios where different business processes have their own fact tables but share common dimensions.
  • Galaxy schema accommodates businesses with diverse operations and analytics needs.

Pros:

  • Flexibility: Allows for a high degree of flexibility in modelling complex business processes.
  • Comprehensive Analysis: Enables comprehensive analysis across various business processes without redundancy in dimension tables.

Cons:

  • Complex Queries: Writing complex queries involving multiple fact tables can be challenging and might affect performance.
  • Maintenance: Requires careful maintenance and data integrity checks, especially with shared dimensions.

Conclusion

Data warehousing, with its dimensions and facts, revolutionises the way businesses harness the power of data. By structuring and organising data in a meaningful manner, businesses can unlock the true potential of their information, paving the way for smarter strategies, improved operations, and enhanced customer experiences. As we move further into the era of data-driven decision-making, understanding the nuances of data warehousing and its components will undoubtedly remain a key differentiator for successful businesses in the digital age.

The choice of a data warehouse schema depends on the specific requirements of the business. The star schema offers simplicity and excellent query performance but may have some redundancy. The snowflake schema reduces redundancy and saves storage space but can be more complex to manage. The galaxy schema provides flexibility for businesses with diverse needs but requires careful maintenance. Understanding the use cases, advantages, and disadvantages of each schema is crucial for data architects and analysts to make informed decisions when designing a data warehouse tailored to the unique demands of their organisation.