BLOG 03 | SUMMARY OF WEEK 2: DATA WAREHOUSE DESIGN CYCLE, DIMENSIONAL MODELING AND BALANCED SCORE CARDS

 Lecture Summary 

Lecture 3: Data warehouse design cycle
This lecture focused mainly on types of organizational data, Data warehouses, Data marts and Data Warehouse Design Process.

Information Assets

Organizational Information can be divided into two major categories:
Operational Data sources – these help an organization with OLTP (Online Transaction Processing)
Data warehouses and Data marts – these help an organization with OLAP (Online Analytical Processing)

OLTP vs OLAP
We looked at the differences between OLTP and OLAP. In a nutshell, OLTP concerns itself more with day-to-day business operational activities like adding new customers, products, or inventory control. While OLAP is more for analytical purposes like orders placed over x number of years, employee performance over time, or other kinds of reports.
Comparing them side by side for better understanding, I found one online that is a little more detailed as compared to our lecture slide.

Definitions
Data Warehouse
A data warehouse is an integrated, enterprise-wide collection of high-quality information from different databases used to support management decision making and performance management.
Data Marts
A data warehouse that addresses a specific functional area such as marketing or finance or human resources or maybe a particular application area, is often called a data mart.

Examples of how various industries use data warehousing
Retail Industry: To understand how fast products are moving, which products are performing well, customer segmentation and targeting
Telecommunications: To understand how to offer various deals to various kinds of customers
Banking and Financial: To understand the performance of a bank

Data Warehouse Design Process
Components Involved:
Data Sources: These are the various data sources that have potentially useful data that we want to use. These can be numerous OLTP data bases, flat files, or any other external data sources.
Extract Transform and Load (ETL) Processes: Before the data from the data source is passed to the data warehouse, it is extracted, cleaned, data is formatted, and linked together. All this happens under the ETL processes.

(Source: week 2 Readings -> Data warehousing chapter 11 (Slide 4))

 

Data Staging Area: After the data is processed and before it is fed to the data warehouse, it is either moved to a temporary storage location called data staging area or its loaded into an (ODS) operational data store from where is goes through the ETL processes again before its sent to the data warehouse.

Data Warehousing Repositories: These refer to data marts that are essentially subsets of the data warehouse used for particular departments or purposes.

BI Tools: These are the tools that analyze and serve the business units with important information that will help them making better business decisions. These include reports, dashboards, OLAP analysis, and also data mining.

 (Source: lecture 3 slide 11)


Lecture 4: Introduction to Balanced Scorecard
This lecture was primarily focused on balanced score cards and their uses. We also looked at how Southwest airlines used the Balanced scorecard to its advantage.

Business Intelligence and Performance Management
Performance measurement and management plays a very important role in Business Intelligence.
To make sound business decisions we need to know what we are our key performance indicators (KPIs) and how should be ne quantifying them. Once we have an idea of what to measure and what to quantify, we can implement a data warehouse that captures and stores the all the important data from our data sources.

The Balanced Scorecard
The Balanced Scorecard is a performance management framework.
It is a comprehensive framework that helps linking organizational goals to the overall strategy by mapping goals to a series of actions needed to achieve them.  It then helps to derive KPIs and metrics that can measure how well the goals are being achieved.

Why Balanced Scorecard
The Balanced Scorecard is a holistic approach to performance management and is preferred to the traditional methods of just using financial methods to measure the performance of an organization. This is because there are numerous non-financial measures that also play an important role in determining an organizations performance.
Furthermore, the Balanced Scorecard, helps in identifying the appropriate KPI metrics from multiple perspectives to gain a better understanding of goals and how well are they being achieved.
A Balanced Score card has the organizations vision and strategy at its center, surrounded by four perspectives. The goals and KPI metrics of each perspective should support the strategy at its center.

The four perspectives of a Balanced Scorecard:
Financial perspective
To succeed financially, how should we appear to our shareholders?
Customer Perspective
To achieve vision, how should we appear to our customers?
Internal Business Process Perspective
To satisfy our shareholders and customers, what business processes must we excel at?
Learning and Growth Perspective
To achieve our vision, how will we sustain our ability to change and improve?

How Southwest Airlines used the Balanced Scorecard
(Summary from both lecture notes and reading material: How Southwest Airlines Developed its Balanced Scorecard Analysis)
Background
We looked at the airline industry, its market analysis (2013) and the issues they face.
We also looked at the facts regrading 10 US based airlines, their profits, and their profit margins.
Then we learned a few airline industry metrics: (RASM) Revenue per Available Seat Mile, (CASM) Cost per Available Seat Mile and understood that for an airline to make profit the RASM -CASM (their difference) should be a positive number.
We also noted that for a long time, the difference between RASM and CASM was not positive and during such scenarios, Southwest airlines were able to make a profit almost every year.

Southwest Airline

Vision: Southwest has always had the philosophy of being a low cost, no frills, airline
Strategy: It uses a point-to-point strategy instead of a hub-and-spoke strategy
Goal: to be an industry leader in on time arrivals while offering customers the lowest possible prices, maintaining healthy levels of profitability and employee loyalty.

Strategy Map
Strategic theme: operating efficiency
The four perspectives:
Financial
From a financial perspective, what will drive operating efficiency?
Goal: Have fewer planes and more customers, that will drive operating efficiency.
Fewer planes implies that you will bring down your major capital costs. More customers means that you will raise your revenues.
* Then they took a survey to better understand their customers and their needs. Which essentially boiled down to lowest prices and timely arrivals.
Customer
From a customer perspective, what will ensure customer satisfaction and increased customer loyalty?
Goal: the strategy should ensure that flights are always on time and offer the lowest competitive prices
Internal Process
How do you make sure that you get more customers on fewer planes? How do you make sure you give them lowest prices and you also make sure that the flight is on time?
Goal: internal focus on fast ground turnaround time.
This meant that they wanted to speed up the process of cleaning, getting people out, and onboarding to improve their ground turnaround time. This will ensure that flights are always on time which in turn will help them achieve more customers with fewer planes.
Learning and growth
how do we make sure that our employees will help us achieve fast ground turnaround time?
Goal: train, educate and compensate the ground crew to optimize their ground turnaround times.

Once the strategy map is ready, the next step in creating a Balanced Scorecard is to setting objectives, measurement metrics, targets and initiative taken to achieve those objectives.

(Source: Course readings week 2 - How Southwest Airlines Developed its Balanced Scorecard Analysis)

Lecture 5: Dimensional Modeling Star Schema Design
In this lecture we went over the four-step process of Dimension Modeling, Facts and dimensions, Star schema design and Data cubes and associated operations.

Four Steps for dimension modeling
1. Select the business process to model
    Understanding the business processes, users, and their needs
2. Declare the grain of the business process
    Grain refers to the level of detail associated with the facts recorded
3. Choose the dimensions that apply to each fact row
    Choosing the right and meaningful dimensions that help business people understand the results from     the business processes
4. Identify the numeric facts
    These are the data that are in each fact row. They represent what we are measuring.

Facts and Dimensions
A star schema consists of two types of tables, Facts and Dimensions.
Fact Tables
They store the performance measurements of a business.
They’re numeric and additive
They’re continuous valued
They have no zero rows
They have foreign keys for referential integrity
These tables are tall and thin
Dimension Tables
They usually store textual attributes and few records
They usually have descriptive, complete, and discretely valued attributes
They capture hierarchical relationships in a business
They’re denormalized
They’re short and fat
They’re the primary source of query constrains, groupings, and report labels.
Star Schema

(Source: week 2 Readings -> Data warehousing chapter 11)

Brings together facts and dimension tables and consists of one to many relationships between dimension and fact tables. The fact table is at the heart of this schema surrounded by dimensions giving it a star like shape, hence the name.

Types of OLAP
ROLAP: Relational OLAP
Done in data warehouses where the facts and dimensions are stored as relational tables.
MOLAP: Multidimensional OLAP
Done in data warehouses where the Star Schema is stored in the form of a multidimensional cube
HOLAP: Hybrid OLAP
Combination of ROLAP and MOLAP

Data Cubes
Data cubes are the matrix representation of multidimensional data and is also another representation of a star schema.
A hyper cube is when a data cube has more than three dimensions.

Data Cube Operations
Slice: comes up with a 2-D view of part of data based on a restricting dimension
Dice: comes up with a subset of dimensions of our data.
Roll up: comes up with aggregate of dimension values to view a larger volume of data in summary (from day to weeks)
Drill down: comes up with a finer granular analysis of dimensional values (from day to hours)
Pivot: comes up with a different perspective by rotating axis.
I found a great visual for the above operations

(Source : https://www.infodiagram.com/slides/operations-olap-cube)

 

Readings Summary

The Balanced Scorecard-Measures that drive performance

This article talks about the balanced scorecard and its four perspectives in depth. It resonates with the four perspectives we have talked about in our lectures, including the questions to identify the KPI and its metrics, the goals and measures of each perspective with strategy at its center. Giving managers the ability to view complex information at a glance and see how results are achieved.
We looked at a couple of examples, one of which being how a company used the balanced score card to include innovation measures to be able to continuously grow and create value. Another example was of a company that used daily financial report to receive strong feedback and used it to recalibrate their strategy to improve quality.
In summation, this article justified with various examples how the balanced score card can provide companies with a holistic tool for meaningful performance measurement that link with the organization’s strategy. 

Using Balanced Scorecard as a Strategic Management System - Kaplan and Norton


This article talks about the evolution of the of strategic management systems and the evolution of the balanced scorecard.
It started with companies recognizing the importance of performance and strategic measurement over just the financial measures and targets, helping them link long term strategic objectives with short term actions.
The article then talks about the new processes that came up that moved beyond the initial scorecard version
The first new process was translating the vision into definitive set of objectives and measures that would be understood throughout the company.
The second new process was communicating and linking, which let managers communicate their strategy throughout the organization and link them to departments and individual objectives.
Third new process was business planning enabling companies to link their business and financial plans.
The fourth new process was feedback and learning, this let the companies use feedbacks and monitor short term results not only from a financial but three additional perspectives to evaluate strategies in real time and to modify them to reflect real time learning.
We then looked at various use cases of how companies used the balanced scorecard as the center of their managing strategy. These use cases included
A time frame tracked progress of how a company built a strategic management system by implementing the four new processes we discussed.
We looked at an example where we looked at a personal scorecard that helped to communicate corporate and unit objectives to the individuals and teams performing the work and how this proved to be a very helpful tool in communicating and measuring performances of individuals and teams. They incentivized performances by linking rewards to performance measures.
Then we looked at how one company linked measures from the four perspectives helping to form a strategically understand, link and reengineer processes that are most critical to the company’s strategic success. This included continuous reviews, feedback and learning.
In summation, the article states how these new processes form a new strategic management system with the balanced scorecard at its center.

Articles


The Future of Data warehousing 
(https://www.sdggroup.com/en/insights-room/future-data-warehousing)
This article by SDG group identifies three significant modalities that they predict will play an important role in shaping the data analytics industry and data management practices.
Let’s take a quick look at them:
1.      Data Mesh
This is a new way to view modern distributed data architecture. Data mesh introduces a decentralized approach to a data warehousing and management. This approach enables any part of the business to generate its data capabilities aligned to its needs.
This solution eliminates the bottlenecks that come with centralized infrastructures, making data readily available to anyone who needs it. This will empower various teams like sales, marketing, IT to work with higher efficiency
2.      Data Fabric
This is an architecture that enables data access and sharing in a distributed environment be it public, private, multi-cloud systems etc. Data Fabric in a data driven strategy, relies on rich metadata, automation processes and AI technologies to deal with data quality issues and other components.
This is essentially a layer of scalable data and connections processes that circumvents data silos by continuously identifying and linking data from disparate applications, discovering unique business relevant relationships.
3.      Data Vaults 2.0
Originating from the cloud and DataOps, data vaults is a collaborative configuration and management of technologies, processes, and data. Adopting this practice for data warehousing, when we have many data sources with frequent addition and changes that need to be added to the data warehouse

Four major data management trends 
(https://rivery.io/blog/data-analytics-trends-2022/)
I think its only fair to include this article in my blog as there is a significant overlap with the one above.
This article talks about the 4 big trends in data management. Let's take a quick look at them:
1.      Data Mesh and Data Fabric
I already discussed these two above. While they’re not exactly the same, both rely on the idea of decentralizing data access by creating an infrastructure where data is treated as a product, and teams can autonomously manage their data analytics processes
2.      Data Lakehouse
Data warehouse and data lakehouse both are used to store big data but there is a difference. While Data warehouses are a home to processed data, data lake is a large pool of raw data for which no use is determined. Putting the responsibilities of processing at the endpoints
3.      AutoML
Automated machine learning involves automatic processing tasks for data preparation saving time and effort taken for ETL processes.
4.      Augmented Analytics
Augmented Analytics is the use of technologies like machine learning and AI to assist with data preparation, insight generation and insight explanations to BI platforms. AutoML will be an important part of Augmented Analytics


Comments