DATE: Jul 15, 2014
AUTHOR: Pranay Pandey
In my last posts on BI Basics and BI Web Portal Basics we saw how to start your journey. In this post you will learn the basics of dimensional modeling and see more details around the concept of a Data Warehouse.
The Data Warehouse serves as the foundation for improved decision making for business users. To do so, it must have accurate data.
We extract the data from the operational source system; transform it through services, flat files, and relational tables; process it; and load into the data presentation area. The data goes into presentation as dimensions, facts, bridge, and star schema. Finally the data is accessed using data access tools like query tools, report writers, analytic applications, etc.
On the whole, data warehouses come into existence to store data in a usable and presentable format. In operational systems we need to traverse many entities to get a small piece of information, but in the data warehouse model we have all the information in one place. This is fact based reporting which gives the business user the confidence to act, and assists in making better decisions.
The industry has concluded that dimensional modeling is the most viable technique for delivering data to data warehouse user. Dimensional modeling is nothing but a new name for an old technique for making databases simple and understandable.
Lets understand the vocabulary of dimensional modeling.
A fact table is the primary table in the dimensional model. It is where the numerical performance measurements of the business are stored. It stores the dimension keys to get the dimension data for each measurement. It is deep in the number of rows and narrow in terms of columns. We use the term fact to represent a business measure. In a dimensional model, every table that expresses a many-to-many relationship must be a fact table. All other tables are dimensional tables.
Dimension tables are integral companions to the fact table. The dimension tables contain the textual descriptors of the business. In a well-designed dimensional model, dimension tables have many columns but few rows. While loading data into dimension tables we always put the calculated values for example; if we get the status as ‘ORDER_APPROVED’ then the dimension table maintains its description as ‘Approved’ so that while reporting using the query or any tool, the user does not need worry about the data’s presentation. The power of the DW is directly proportional to the quality and depth of the dimensional attributes.
Bringing together all the facts and dimensions. The fact table consisting of numeric measurements is joined to the set of dimension tables filled with descriptive columns or attributes which in turn is maintained in the hierarchy by linking the related dimensions in the fact table. The structure of these joins makes a star among the entities to resolve the many-to-many relationship, this is called the star join schema. On this star schema, the user can drag-drop through the reporting tools to include and exclude the dimension fields to generate the report for his/her requirement. This is the final schema where query can be fired through the query tool.
Refer to the image below for a sample stars schema for retail sales.
These are some of the basics of dimensional modeling which can be used to build your own star schemas to design BI solutions for the real world. To read data from the transactional model and fill the dimensional model, we write ETL(Extract, transform and load) services.
To explore this further you can refer to The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Ralph Kimball, Margy Ross” which has helped us and can help you too in understanding and building solutions for your BI system.
Thanks for taking the time to read this post.