Unlocking the Power of Data: Transforming Business with the Common Data Model

Common Data Model (CDM) at the heart of the Data Lakehouse

Imagine you’re at the helm of a global enterprise, juggling multiple accounting systems, CRMs, and financial consolidation tools like Onestream. The data is flowing in from all directions, but it’s chaotic and inconsistent. Enter the Common Data Model (CDM), a game-changer that brings order to this chaos.

CDM Definition

A Common Data Model (CDM) is like the blueprint for your data architecture. It’s a standardised, modular, and extensible data schema designed to make data interoperability a breeze across different applications and business processes. Think of it as the universal language for your data, defining how data should be structured and understood, making it easier to integrate, share, and analyse.

Key Features of a CDM:
  • Standardisation: Ensures consistent data representation across various systems.
  • Modularity: Allows organisations to use only the relevant parts of the model.
  • Extensibility: Can be tailored to specific business needs or industry requirements.
  • Interoperability: Facilitates data exchange and understanding between different applications and services.
  • Data Integration: Helps merge data from multiple sources for comprehensive analysis.
  • Simplified Analytics: Streamlines data analysis and reporting, generating valuable insights.
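
To make the "universal language" idea above concrete, here is a minimal sketch of how a CDM-style entity might be declared and validated in Python. The entity name, attributes, and validation rule are illustrative assumptions for this post, not part of any published CDM standard.

```python
from dataclasses import dataclass, field

# A minimal, illustrative CDM-style entity definition. Entity and attribute
# names are assumptions for this sketch, not taken from a published CDM standard.

@dataclass
class AttributeDef:
    name: str          # canonical attribute name
    data_type: type    # expected Python type
    required: bool = True

@dataclass
class EntityDef:
    name: str
    attributes: list[AttributeDef] = field(default_factory=list)

    def validate(self, record: dict) -> list[str]:
        """Return a list of problems; an empty list means the record conforms."""
        issues = []
        for attr in self.attributes:
            if attr.name not in record:
                if attr.required:
                    issues.append(f"missing required attribute: {attr.name}")
            elif not isinstance(record[attr.name], attr.data_type):
                issues.append(f"{attr.name} should be of type {attr.data_type.__name__}")
        return issues

# One standardised "Customer" entity shared by every system that produces customer data.
CUSTOMER = EntityDef("Customer", [
    AttributeDef("customer_id", str),
    AttributeDef("legal_name", str),
    AttributeDef("country_code", str),
    AttributeDef("credit_limit", float, required=False),
])

print(CUSTOMER.validate({"customer_id": "C-001", "legal_name": "Acme Ltd"}))
# -> ['missing required attribute: country_code']
```

Because every system maps to the same definition, the one schema can drive validation, integration, and documentation alike.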

The CDM in practice

Let’s delve into how a CDM can revolutionise your business’s data reporting in a global enterprise environment.

Standardised Data Definitions
  • Consistency: A CDM provides a standardised schema for financial data, ensuring uniform definitions and formats across all systems.
  • Uniform Reporting: Standardisation allows for the creation of uniform reports, making data comparison and analysis across different sources straightforward.
Unified Data Architecture
  • Seamless Data Flow: Imagine data flowing effortlessly from your data lake to your data warehouse. A CDM supports this smooth transition, eliminating bottlenecks.
  • Simplified Data Management: Managing data assets becomes simpler across the entire data estate, thanks to the unified framework provided by a CDM.
Data Integration
  • Centralised Data Repository: By mapping data from various systems like Maconomy (accounting), Dynamics (CRM), and Onestream (financial consolidation) into a unified CDM, you establish a centralised data repository (see the mapping sketch below).
  • Seamless Data Flow: This integration minimises manual data reconciliation efforts, ensuring smooth data transitions between systems.
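
As an illustration of that mapping step, the sketch below renames source-specific fields into shared CDM attribute names. The source field names and the to_cdm_customer helper are hypothetical; in practice this mapping would normally live in an ETL or integration tool rather than hand-written Python.

```python
# Hypothetical field mappings from three source systems into one CDM "Customer" shape.
# All source field names below are assumptions for illustration only.
FIELD_MAPS = {
    "maconomy":  {"CustNo": "customer_id", "Name1": "legal_name", "Country": "country_code"},
    "dynamics":  {"accountid": "customer_id", "name": "legal_name", "address1_country": "country_code"},
    "onestream": {"EntityCode": "customer_id", "EntityName": "legal_name", "Region": "country_code"},
}

def to_cdm_customer(record: dict, source: str) -> dict:
    """Translate one source record into the standardised CDM Customer shape."""
    mapping = FIELD_MAPS[source]
    cdm_record = {cdm_name: record.get(src_name) for src_name, cdm_name in mapping.items()}
    cdm_record["source_system"] = source   # keep lineage for reconciliation
    return cdm_record

crm_row = {"accountid": "C-001", "name": "Acme Ltd", "address1_country": "GB"}
print(to_cdm_customer(crm_row, "dynamics"))
# -> {'customer_id': 'C-001', 'legal_name': 'Acme Ltd', 'country_code': 'GB', 'source_system': 'dynamics'}
```
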
Improved Data Quality
  • Data Validation: Enforce data validation rules to reduce errors and inconsistencies.
  • Enhanced Accuracy: Higher data quality leads to more precise financial reports and informed decision-making.
  • Consistency: Standardised data structures maintain consistency across datasets stored in the data lake.
Interoperability
  • Cross-Platform Compatibility: Ensure that data from different systems can be easily combined and used together.
  • Streamlined Processes: Interoperability streamlines processes such as financial consolidation, budgeting, and forecasting.
Extensibility
  • Customisable Models: Extend the CDM to meet specific financial reporting requirements, allowing the finance department to tailor the model to their needs.
  • Scalability: As your enterprise grows, the CDM can scale to include new data sources and systems without significant rework.
Reduced Redundancy
  • A CDM, combined with Master Data Management (MDM), eliminates data redundancies, reducing the risk of errors and inconsistencies in financial reporting.
Complements the Enterprise Data Estate
  • A CDM complements a data estate that includes a data lake and a data warehouse, providing a standardised framework for organising and managing data.
Enhanced Analytics
  • Advanced Reporting: Standardised and integrated data allows advanced analytics tools to generate insightful financial reports and dashboards.
  • Predictive Insights: Data analytics can identify trends and provide predictive insights, aiding in strategic financial planning.
Data Cataloguing and Discovery
  • Enhanced Cataloguing: CDM makes it easier to catalogue data within the lake, simplifying data discovery and understanding.
  • Self-Service Access: With a well-defined data model, business users can access and utilise data with minimal technical support.
Enhanced Interoperability
  • CDM facilitates interoperability by providing a common data schema, enabling seamless data exchange and integration across different systems and applications.
Reduced Redundancy and Costs
  • Elimination of Duplicate Efforts: Minimise redundant data processing efforts.
  • Cost Savings: Improved efficiency and data accuracy lead to cost savings in financial reporting and analysis.
Regulatory Compliance
  • Consistency in Reporting: CDM helps maintain consistency in financial reporting, crucial for regulatory compliance.
  • Audit Readiness: Standardised and accurate data simplifies audit preparation and compliance with financial regulations.
Scalability and Flexibility
  • Adaptable Framework: CDM’s extensibility allows it to adapt to new data sources and evolving business requirements without disrupting existing systems.
  • Scalable Solutions: Both the data lake and data warehouse can scale independently while adhering to the CDM, ensuring consistent growth.
Improved Data Utilisation
  • Enhanced Analytics: Apply advanced analytics and machine learning models more effectively with standardised and integrated data.
  • Business Agility: A well-defined CDM enables quick adaptation to changing business needs and faster implementation of new data-driven initiatives.
Improved Decision-Making
  • High-quality, consistent master data enables finance teams to make more informed and accurate decisions.

CDM and the Modern Medallion Architecture Data Lakehouse

In a lakehouse architecture, data is organised into layers named after medals (bronze, silver, and gold) to enhance data management, processing, and analytics.

  • Bronze Layer (Raw Data): Raw, unprocessed data ingested from various sources.
  • Silver Layer (Cleaned and Refined Data): Data that has been cleaned, transformed, and enriched, suitable for analysis and reporting.
  • Gold Layer (Aggregated and Business-Level Data): Highly refined and aggregated data, designed for specific business use cases and advanced analytics.
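
To make the layers tangible, here is a rough sketch of how a single invoice might look at each stage; the field names, values, and aggregation are illustrative assumptions.

```python
# One illustrative invoice as it might appear in each medallion layer.
# Field names and values are assumptions for this sketch.

bronze = {  # raw, exactly as ingested from the source system
    "InvNo": "INV-0042", "Amt": "1,250.00", "Curr": "gbp", "Date": "03/07/2024",
}

silver = {  # cleaned, typed, and conformed to the standard (CDM-aligned) schema
    "invoice_id": "INV-0042", "amount": 1250.00, "currency": "GBP", "invoice_date": "2024-07-03",
}

gold = {  # aggregated to a business-level measure for reporting
    "month": "2024-07", "currency": "GBP", "total_invoiced": 1250.00, "invoice_count": 1,
}
```
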
CDM in Relation to the Data Lakehouse Silver Layer

A CDM can be likened to the silver layer in a Medallion Architecture. Here’s how they compare:

Purpose and Function
  • Silver Layer: Transforms, cleans, and enriches data to ensure quality and consistency, preparing it for further analysis and reporting. Removes redundancies and errors found in raw data.
  • CDM: Provides standardised schemas, structures, and semantics for data. Ensures data from different sources is represented uniformly for integration and quality.
Data Standardisation
  • Silver Layer: Implements transformations and cleaning processes to standardise data formats and values, making data consistent and reliable.
  • CDM: Defines standardised data schemas to ensure uniform data structure across the organisation, simplifying data integration and analysis.
Data Quality and Consistency
  • Silver Layer: Focuses on improving data quality by eliminating errors, duplicates, and inconsistencies through transformation and enrichment processes.
  • CDM: Ensures data quality and consistency by enforcing standardised data definitions and validation rules.
Interoperability
  • Silver Layer: Enhances data interoperability by transforming data into a common format easily consumed by various analytics and reporting tools.
  • CDM: Facilitates interoperability with a common data schema for seamless data exchange and integration across different systems and applications.
Role in Data Processing
  • Silver Layer: Acts as an intermediate layer where raw data is processed and refined before moving to the gold layer for final consumption.
  • CDM: Serves as a guide during data processing stages to ensure data adheres to predefined standards and structures.

How CDM Complements the Silver Layer

  • Guiding Data Transformation: CDM serves as a blueprint for transformations in the silver layer, ensuring data is cleaned and structured according to standardised schemas (a minimal sketch follows this list).
  • Ensuring Consistency Across Layers: By applying CDM principles, the silver layer maintains consistency in data definitions and formats, making it easier to integrate and utilise data in the gold layer.
  • Facilitating Data Governance: Implementing a CDM alongside the silver layer enhances data governance with clear definitions and standards for data entities, attributes, and relationships.
  • Supporting Interoperability and Integration: With a CDM, the silver layer can integrate data from various sources more effectively, ensuring transformed data is ready for advanced analytics and reporting in the gold layer.
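
As a minimal sketch of this blueprint role, and assuming the illustrative Customer attributes used earlier, the snippet below promotes only CDM-conformant rows from bronze to silver; the required-attribute set and the quarantine behaviour are assumptions.

```python
# Sketch: promote mapped bronze rows to the silver layer only when they satisfy
# the CDM Customer definition. REQUIRED and the clean-up rules are assumptions.
REQUIRED = {"customer_id", "legal_name", "country_code"}

def conform(row):
    """Clean a mapped row and return it if it satisfies the CDM, otherwise None."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
    present = {k for k, v in cleaned.items() if v not in (None, "")}
    return cleaned if REQUIRED <= present else None   # None -> quarantine, do not promote

bronze_rows = [
    {"customer_id": "C-001", "legal_name": " Acme Ltd ", "country_code": "GB"},
    {"customer_id": "C-002", "legal_name": "", "country_code": "DE"},   # fails validation
]
silver_rows = [row for row in map(conform, bronze_rows) if row is not None]
print(silver_rows)   # only the conforming record is promoted to the silver layer
```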

CDM Practical Implementation Steps

By implementing a CDM, a global enterprise can transform its finance department’s data reporting, leading to more efficient operations, better decision-making, and enhanced financial performance. The steps below outline a practical path to get there.

  1. Data Governance: Establish data governance policies to maintain data quality and integrity. Define roles and responsibilities for managing the CDM and MDM. Implement data stewardship processes to monitor and improve data quality continuously.
  2. Master Data Management (MDM): Implement MDM to maintain a single, consistent, and accurate view of key financial data entities (e.g. customers, products, accounts). Ensure that master data is synchronised across all systems to avoid discrepancies. (Learn more on Master Data Management).
  3. Define the CDM: Develop a comprehensive CDM that includes definitions for all relevant data entities and attributes used across the data estate.
  4. Data Mapping: Map data from various accounting systems, CRMs, and Onestream to the CDM schema. Ensure all relevant financial data points are included and standardised (a mapping and de-duplication sketch follows this list).
  5. Integration with Data Lake Platform & Automated Data Pipelines (Lakehouse): Implement processes to ingest data into the data lake using the CDM, ensuring data is stored in a standardised format. Use an integration platform to automate ETL processes into the CDM, supporting real-time data updates and synchronisation.
  6. Data Consolidation (Data Warehouse): Use ETL processes to transform data from the data lake and consolidate it according to the CDM. Ensure the data consolidation process includes data cleansing and deduplication steps. CDM helps maintain data lineage by clearly defining data transformations and movements from the source to the data warehouse.
  7. Analytics and Reporting Tools: Implement analytics and reporting tools that leverage the standardised data in the CDM. Train finance teams to use these tools effectively to generate insights and reports. Develop dashboards and visualisations to provide real-time financial insights.
  8. Extensibility and Scalability: Extend the CDM to accommodate specific financial reporting requirements and future growth. Ensure that the CDM and MDM frameworks are scalable to integrate new data sources and systems as the enterprise evolves.
  9. Data Security and Compliance: Implement robust data security measures to protect sensitive financial data. Ensure compliance with regulatory requirements by maintaining consistent and accurate financial records.
  10. Continuous Improvement: Regularly review and update the CDM and MDM frameworks to adapt to changing business needs. Solicit feedback from finance teams to identify areas for improvement and implement necessary changes.
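
As a rough sketch of steps 4 and 6 above, the snippet below maps a source extract onto CDM column names and then deduplicates on the master-data key before consolidation. The column names, keys, and the use of pandas are assumptions for illustration; a production pipeline would normally run inside the chosen ETL or lakehouse platform.

```python
import pandas as pd

# Hypothetical CRM extract; the source column names are assumptions.
crm_extract = pd.DataFrame([
    {"accountid": "C-001", "name": "Acme Ltd",  "address1_country": "GB"},
    {"accountid": "C-001", "name": "Acme Ltd",  "address1_country": "GB"},   # duplicate row
    {"accountid": "C-002", "name": "Beta GmbH", "address1_country": "DE"},
])

# Step 4: map source columns onto the CDM attribute names.
CDM_COLUMNS = {"accountid": "customer_id", "name": "legal_name", "address1_country": "country_code"}
customers = crm_extract.rename(columns=CDM_COLUMNS)

# Step 6: cleanse and deduplicate on the master-data key before consolidation.
customers["legal_name"] = customers["legal_name"].str.strip()
customers = customers.drop_duplicates(subset=["customer_id"])

print(customers)   # CDM-conformant, deduplicated rows ready for the warehouse
```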

By integrating a Common Data Model within the data estate, organisations can achieve a more coherent, efficient, and scalable data architecture, enhancing their ability to derive value from their data assets.

Conclusion

In global enterprise operations, the ability to manage, integrate, and analyse vast amounts of data efficiently is paramount. The Common Data Model (CDM) emerges as a vital tool in achieving this goal, offering a standardised, modular, and extensible framework that enhances data interoperability across various systems and platforms.

By implementing a CDM, organisations can transform their finance departments, ensuring consistent data definitions, seamless data flow, and improved data quality. This transformation leads to more accurate financial reporting, streamlined processes, and better decision-making capabilities. Furthermore, the CDM supports regulatory compliance, reduces redundancy, and fosters advanced analytics, making it an indispensable component of modern data management strategies.

Integrating a CDM within the Medallion Architecture of a data lakehouse further enhances its utility, guiding data transformations, ensuring consistency across layers, and facilitating robust data governance. As organisations continue to grow and adapt to new challenges, the scalability and flexibility of a CDM will allow them to integrate new data sources and systems seamlessly, maintaining a cohesive and efficient data architecture.

Ultimately, the Common Data Model empowers organisations to harness the full potential of their data assets, driving business agility, enhancing operational efficiency, and fostering innovation. By embracing CDM, enterprises can unlock valuable insights, make informed decisions, and stay ahead in an increasingly data-driven world.

Cloud Provider Showdown: Unravelling Data, Analytics and Reporting Services for Medallion Architecture Lakehouse

Cloud Wars: A Deep Dive into Data, Analytics and Reporting Services for Medallion Architecture Lakehouse in AWS, Azure, and GCP

Introduction

Crafting a medallion architecture lakehouse demands precision and foresight. Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) emerge as juggernauts, each offering a rich tapestry of data and reporting services. This blog post delves into the intricacies of these offerings, unravelling the nuances that can influence your decision-making process for constructing a medallion architecture lakehouse that stands the test of time.

1. Understanding Medallion Architecture: Where Lakes and Warehouses Converge

Medallion architecture represents the pinnacle of data integration, harmonising the flexibility of data lakes with the analytical prowess of data warehouses, which together form a lakehouse. By fusing these components seamlessly, organisations can facilitate efficient storage, processing, and analysis of vast and varied datasets, setting the stage for data-driven decision-making.

The medallion architecture is a data design pattern used to logically organise data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through each layer of the architecture. The architecture describes a series of data layers that denote the quality of data stored in the lakehouse: the terms bronze (raw), silver (validated), and gold (enriched) describe the quality of the data in each layer.

Both Microsoft and Databricks recommend this multi-layered approach to building a single source of truth (golden source) for enterprise data products. The architecture guarantees atomicity, consistency, isolation, and durability as data passes through multiple layers of validation and transformation before being stored in a layout optimised for efficient analytics. Importantly, the medallion architecture does not replace other dimensional modelling techniques: schemas and tables within each layer can take on a variety of forms and degrees of normalisation, depending on the frequency and nature of data updates and the downstream use cases for the data.
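
To ground that description, here is a hedged PySpark-style sketch of the three layers. The storage paths, field names, and the use of Delta Lake are assumptions for illustration; each cloud provider covered below offers its own equivalents for every step.

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative medallion pipeline. Paths, schema, and the choice of Delta Lake
# are assumptions for this sketch, not a prescription.
spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land the raw source files as-is (schema-on-read).
bronze = spark.read.json("s3://lakehouse/bronze/invoices/")

# Silver: validate, type, and conform to the standardised (CDM-aligned) schema.
silver = (
    bronze
    .withColumn("amount", F.col("Amt").cast("double"))
    .withColumn("invoice_date", F.to_date("Date", "dd/MM/yyyy"))
    .filter(F.col("amount").isNotNull())
    .select(F.col("InvNo").alias("invoice_id"), "amount", "invoice_date")
)
silver.write.format("delta").mode("overwrite").save("s3://lakehouse/silver/invoices/")

# Gold: aggregate to business-level measures for reporting.
gold = (
    silver
    .groupBy(F.date_format("invoice_date", "yyyy-MM").alias("month"))
    .agg(F.sum("amount").alias("total_invoiced"))
)
gold.write.format("delta").mode("overwrite").save("s3://lakehouse/gold/invoice_summary/")
```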

2. Data Services

Amazon Web Services (AWS):

  • Storage:
    • Amazon S3: A scalable object storage service, ideal for storing and retrieving any amount of data.
  • ETL/ELT:
    • AWS Glue: An ETL service that automates the process of discovering, cataloguing, and transforming data.
  • Data Warehousing:
    • Amazon Redshift: A fully managed data warehousing service that makes it simple and cost-effective to analyse all your data using standard SQL and your existing Business Intelligence (BI) tools.

Microsoft Azure:

  • Storage:
    • Azure Blob Storage: A massively scalable object storage for unstructured data.
  • ETL/ELT:
    • Azure Data Factory: A cloud-based data integration service for orchestrating and automating data workflows.
  • Data Warehousing:
    • Azure Synapse Analytics (formerly Azure SQL Data Warehouse): Integrates big data and data warehousing. It allows you to analyse both relational and non-relational data at petabyte-scale.

Google Cloud Platform (GCP):

  • Storage:
    • Google Cloud Storage: A unified object storage service with strong consistency and global scalability.
  • ETL/ELT:
    • Cloud Dataflow: A fully managed service for stream and batch processing.
  • Data Warehousing:
    • BigQuery: A fully-managed, serverless, and highly scalable data warehouse that enables super-fast SQL queries using the processing power of Google’s infrastructure.

3. Analytics

Google Cloud Platform (GCP):

  • Dataproc: A fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters.
  • Dataflow: A fully managed service for stream and batch processing.
  • Bigtable: A NoSQL database service for large analytical and operational workloads.
  • Pub/Sub: A messaging service for event-driven systems and real-time analytics.

Microsoft Azure:

  • Azure Data Lake Analytics: Allows you to run big data analytics and provides integration with Azure Data Lake Storage.
  • Azure HDInsight: A cloud-based service that makes it easy to process big data using popular frameworks like Hadoop, Spark, Hive, and more.
  • Azure Databricks: An Apache Spark-based analytics platform that provides a collaborative environment and tools for data scientists, engineers, and analysts.
  • Azure Stream Analytics: Helps in processing and analysing real-time streaming data.
  • Azure Synapse Analytics: An analytics service that brings together big data and data warehousing.

Amazon Web Services (AWS):

  • Amazon EMR (Elastic MapReduce): A cloud-native big data platform, allowing processing of vast amounts of data quickly and cost-effectively across resizable clusters of Amazon EC2 instances.
  • Amazon Kinesis: Helps in real-time processing of streaming data at scale.
  • Amazon Athena: A serverless, interactive analytics service that provides a simplified and flexible way to analyse petabytes of data where it lives in Amazon S3 using standard SQL expressions. 
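
As a small illustration of that SQL-on-S3 pattern, the sketch below submits an Athena query from Python with boto3. The database, table, bucket, and region names are placeholders; an equivalent gold-layer query could just as easily be run from BigQuery or Synapse on the other platforms.

```python
import time
import boto3

# Illustrative Athena query against a gold-layer table held in S3.
# Database, table, bucket, and region names are placeholders.
athena = boto3.client("athena", region_name="eu-west-1")

submitted = athena.start_query_execution(
    QueryString="SELECT month, total_invoiced FROM invoice_summary ORDER BY month",
    QueryExecutionContext={"Database": "lakehouse_gold"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
query_id = submitted["QueryExecutionId"]

# Poll until the query finishes, then fetch the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows) - 1} result rows")   # the first row holds the column headers
```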

4. Report Writing Services: Transforming Data into Insights

  • Amazon QuickSight: A business intelligence service for creating interactive dashboards and reports.
  • Microsoft Power BI: A suite of business analytics tools for analysing data and sharing insights.
  • Google Data Studio: A free and collaborative tool for creating interactive reports and dashboards.

5. Comparison Summary:

  • Storage: All three providers offer reliable and scalable storage solutions. Amazon S3, Azure Blob Storage, and Google Cloud Storage provide similar functionalities for storing structured and unstructured data.
  • ETL/ELT: AWS Glue, Azure Data Factory, and Cloud Dataflow offer ETL/ELT capabilities, allowing you to transform and prepare data for analysis.
  • Data Warehousing: Amazon Redshift, Azure Synapse Analytics, and BigQuery are powerful data warehousing solutions that can handle large-scale analytics workloads.
  • Analytics: Azure, AWS, and GCP are leading cloud service providers, each offering a comprehensive suite of analytics services tailored to diverse data processing needs. The choice between them depends on specific project needs, existing infrastructure, and the level of expertise within the development team.
  • Report Writing: QuickSight, Power BI, and Data Studio offer intuitive interfaces for creating interactive reports and dashboards.
  • Integration: AWS, Azure, and GCP services can be integrated within their respective ecosystems, providing seamless connectivity and data flow between different components of the lakehouse architecture. Azure integrates well with other Microsoft services. AWS has a vast ecosystem and supports a wide variety of third-party integrations. GCP is known for its seamless integration with other Google services and tools.
  • Cost: Pricing models vary across providers and services. It’s essential to compare the costs based on your specific usage patterns and requirements. Each provider offers calculators to estimate costs.
  • Ease of Use: All three platforms offer user-friendly interfaces and APIs. The choice often depends on the specific needs of the project and the familiarity of the development team.
  • Scalability: All three platforms provide scalability options, allowing you to scale your resources up or down based on demand.
  • Performance: Performance can vary based on the specific service and configuration. It’s recommended to run benchmarks or tests based on your use case to determine the best-performing platform for your needs.

6. Decision-Making Factors: Integration, Cost, and Expertise

  • Integration: Evaluate how well the services integrate within their respective ecosystems. Seamless integration ensures efficient data flow and interoperability.
  • Cost Analysis: Conduct a detailed analysis of pricing structures based on storage, processing, and data transfer requirements. Consider potential scalability and growth factors in your evaluation.
  • Team Expertise: Assess your team’s proficiency with specific tools. Adequate training resources and community support are crucial for leveraging the full potential of chosen services.

Conclusion: Navigating the Cloud Maze for Medallion Architecture Excellence

Selecting the right combination of data and reporting services for your medallion architecture lakehouse is not a decision to be taken lightly. AWS, Azure, and GCP offer powerful solutions, each tailored to different organisational needs. By comprehensively evaluating your unique requirements against the strengths of these platforms, you can embark on your data management journey with confidence. Stay vigilant, adapt to innovations, and let your data flourish in the cloud – ushering in a new era of data-driven excellence.