Tuesday, September 1, 2015

Data Warehouse Concepts

Paths can be many but Destination is One. With this I mean to say, Different people have different explanations for a Data warehouse. But all the explanations and definitions aims to single destination i.e. Data storage. If you are thinking of aligning your career in BI/DW (Business Intelligence/Data warehousing), than you definitely need to know two names. I call these two persons as “Gods of BIDW”.
First name is Ralph Kimball and second name is Bill Inmon. You must be wondering why we need to know about these two personalities. This is because they have given some important Data warehouse concepts which need to be followed.
  • Ralph Kimball provided a “Bottom – Up approach”, whereas
  • Bill Inmon provided “Top – Down approach”.
In the world of BIDW, the most popular definition came from Bill Inmon, who said “A Data warehouse is a Subject-oriented, Integrated, Time-variant and Non-volatile collection of Data in support of Management’s decision making process”.
  1. Subject-Oriented : It means a Data warehouse can be used to analyze a specific subject area. For example – In an Organization, Subject area can be HR, Finance, Sales, etc. So, A Data warehouse can be used to analyze particularly HR or Sales.
  2. Integrated : With Integrated we mean data from heterogeneous data sources is integrated in Data warehouse. For example – Source A stores date in different format and source B stores date in other format, but in a Data warehouse, dates will be stored in a single format only.
  3. Time-Variant : Data warehouse only deals with historical data. It acts as a pit where Transactional data is dumped every day. We can retrieve data which is 6 months, 12 months old or even more than this from a data warehouse. For example – You hold a SBI bank account and updated your contact details. So, Transaction system can only provide you the current contact details whereas Data warehouse holds all the contact details associated with your account (Current and Previous both).
  4. Non-volatile : As explained above, Data warehouse is freakily mad about historical data. Once data comes in the data warehouse, it will not alter. Hence, historical data in a data warehouse is Non-volatile.
Another concept came from Ralph Kimball who explained Data warehouse in his own way and said “A data warehouse is a copy of transactional data specifically structured for query and analysis”.
This is a more like a functional view of a Data warehouse. Bill Inmon precisely stated how the Data warehouse is built which was lacking in Ralph Kimball’s definition.
Data warehouse usages –
  • Trend analyzing for an organization.
  • Startegic planning.
  • Product forecasting.
  • Designing business models.
  • Preparing dashboards and reports with reporting tools.
Summary – In this post, we have learnt
  • What do we mean by Data warehouse.
  • Definition provided by Bill Inmon.
  • Definition provided by Ralph Kimball.
  • How Transactional database is different from Data warehouse.
  • Data warehouse usages.

Introduction to Data warehousing

Have you ever thought why GoogleFacebookTwitter and many other websites are position at the top rank? Have you ever thought how Wall mart, Reliance and other shopping marts are making profits? Have you ever analyzed the ways ebay, flipkart, Snapdeal and other online stores follows to achieve higher revenue each year? There is a single answer for all these questions and it is Data warehousing and Business Intelligence. With this you must get to know the power of this word “DATA”. Data warehousing provides the enterprise with Memory and Data Mining provides the enterprise with Intelligence.
“Data warehousing is a concept and it provides an architectural model for the flow of data from Operational data System to Decision support system.”
Data utilization
Earlier, we were dependent only on Transactional Processing System (TPS) also known as Real-time processing. TPS was used for business transactions to collect daily transactional data. All day-to-day transactions were captured and stored into the database. Then, the point of concern for Business people or Decision makers was to have high network bandwidth to send data to distributed platforms on daily basis and Business Intelligence was given low priority. But until what time can the need of Business intelligence and Data warehousing be ignored.
In 1970’s, Business experts sensed the need of Data warehousing and Business Intelligence and that sense has now become the essence of each organization. With the analysis of the data stored at the Data warehouse, Decision makers can view the immediate impact on the bottom line and top lineof the organization. Also, it provides greater insight of customers which helps the organization to retain them and gives better visibility of the organization.

Need for Data warehousing

Today’s mantra of success is “Proactive use of Information than Optimization”. This is a world of competition and organizations start focusing on utilizing data they collects from day-to-day transactions. Organizations realize the Importance of information they hold, to improve decision making and achieve profits. But, Operational systems were never designed to perform these intelligent activities and probably never can be. So, this arouses a need of such a system which is mature enough to support decision makers and analyze business process. As said by Plato (a Greek philosopher) – “Necessity is the mother of Invention”, this need leads to the development of a system known as Decision Support System (DSS) i.e. Data warehouse. Let’s gain more information on these two systems i.e. TPS and DSS.
Transaction-processing systems are built to capture day to day information and update that collectected information quickly. The TPS is constantly changing and is generally online 24 hours a day.
Transaction Processing System Examples – In shopping outlets, Point-of-sales scans your order and generates bill, Railway reservation applications, Order placing at online stores like flipkart, myntra, etc.
Decision-support systems are built to analyze the data stored in data warehouse. It deals with historical data and helps in extracting the information easily and quickly.
Decision Support System Examples – Shopping outlets analyze which product is to be kept with which product so as to gain more sales and generate maximum revenue, Information for marketing, Claiming Insurance, LIC policies.
Decision makers in Data warehousing

Problem with Data warehousing

You must have read a line from a great poet, William Shakespeare – “Water, water, everywhere nor any drop to drink”. It means in sea there is water all around but you cannot drink that saline water. In the same way, Data warehouse contains volume of data but part of it is important to you that you have to look.
  • I can’t find the data I need – Data is scattered over the network.
  • I can’t get the data I need – Need an expert to get the data.
  • I can’t understand the data I found – Available data is poorly documented.
  • I can’t use the data I found – Data need to be transformed from one form to another.
Apart from these issues, Cost is the major drawback because Data warehousing is a very costly process and requires huge amount of data for Analysis and Reporting processes.

Benefits of Data warehousing

Data warehousing is a concept of merging data from heterogeneous data sources into one database. Organizations spend vast resources in terms of human resource as well as financial resource forimplementing Data warehouse. Why the companies invest so much into this? What is the benefit of spending resources on storing data? Let’s see the benefits provided by Data warehousing to the organization.
Benefits and Advantages of Datawarehousing
  • It facilitates Strategic Planning and considered as a backbone of organizations. Business People get to know market’s trend and can make better decisions.
  • It provides an option of Better Data Analysis as Data from varied sources is grouped into Data marts. Companies can analyze data in a better way by querying on multiple dimensions (Multi Dimensional Query).
  • It provides flexibility in Operation Systems as it operates separately from other operation processes, Data retrieval can be done with ease without affecting other operation requirements.
  • It helps in identifying and resolving dirty data, with dirty data I mean to say Inconsistent data. All the dirty/inconsistent data is resolved and a homogenous data is then grouped into Database. As a result, Data Retrieval and Data Analysis process gets simplified.
  • It helps to improve Decision making process as it provides market trends and forecast.
  • It is used to enhance Customer service and provide better enterprise intelligence.
Free e-books and Pdf’s :-
Please Subscribe or Like PhpRinG Tutorials for free e-books and pdf’s.
  • Launching Very Soon free e-books and pdf’s on Data warehousing.
I hope you like our post on Data warehousing Concepts. Please provide feedback in the form of comments below. Stay tune to PhpRinG Tutorials for more tutorials on Data warehousing in future!!!