In part 1 of this article series, we described the general structure of a dimensional model. In this article we describe the basic design principles of dimensional modeling. Dimensional modeling follows the four steps defined below.
A. Selection of the business process (or processes) whose performance is to be monitored. Priority should go to business processes whose performance is considered critical and for which sufficient relevant data exist (e.g. operational data derived from these processes). The selected business process may relate to a single organizational unit or span more than one.
The capture of overlapping information by different departments, which can lead to many versions of the truth, is avoided through the capture of a single data stream for an
No tags for this post.
The design principles of the dimensional model, which is commonly used in data warehousing, are described in this article series. Dimensional models capture business performance measurements, which are used to support decision making.
Dimensional model
Descriptive simplicity and high query performance are the characteristics that have contributed to the increased use of the dimensional model in data warehouse infrastructures. The symmetry and descriptive simplicity can be seen in the conceptual model (see resource link), which relates to retail sales monitoring (data warehousing technology was initially introduced in retailing).
Relational data models are used to implement the above conceptual model (as depicted in the resource link).
This model is easily understood by business analysts, in contrast with other operational systems models (
Business Intelligence
Business Intelligence (BI) has become a very important activity in business, irrespective of the domain, because managers need comprehensive analysis in order to face their challenges.
Data sourcing, data analysis, extracting the correct information for given criteria, assessing the risks and finally supporting the decision-making process are the main components of BI.
From a business perspective, core stakeholders need to be well aware of all the above stages and be crystal clear on expectations. The person who is assigned the role of Business Analyst (BA) for the BI initiative, either from the BI solution providers
Data Warehousing was an innovation from the 1990s that promised to change the data landscape for good. How far have we come? Many vendors have entered the marketplace because it makes sense to bring together data from throughout the organization, and this will continue to make sense in the future.
How large the Data Warehouse market will grow, nobody knows yet. But it is certainly still growing fast, and is currently estimated at 4.5 billion dollars per year (IDC).
1. Why Do Data Warehouse Projects Run Into Scope Creep?
To quote Bill Inmon (guru and author of several great books on Data Warehousing): “Traditional projects start with requirements and end with data. Data Warehousing projects start with data and end with requirements.” As soon as the project gets under way, users will find new applications, and with them will come new requests for data. Interestingly, these projects are often justified by moving Q&R work away from the ‘data people’. What we’ve seen is that as soon as the project delivers, the first thing that happens is that more requests for special queries are submitted to these same ‘data people’. This may appear to undermine the initial business case, but it actually signals the onset of value creation from the DWH project.
2. Star Schema Versus Entity Relation Model?
There has been enormous debate in the community about the merits of different data models. At the risk of oversimplifying: Star models tend to have better performance (processing time) for the end user, and are often perceived as “easier” to understand by end users. Drawbacks are that Star models require more disk space and, because of the intrinsic redundancy in the denormalized data, have consistency problems from a maintenance perspective. Having said this, in practice some combination of the two is often unavoidable, whatever the preference (ER or Star) of the chief architects. Overall, Star models seem to have gained the most ground.
3. The Importance of a Data Warehouse Business Case
Much has been written about the business case for a Data Warehouse. What goes into a good business case? IT savings are ubiquitous in DWH business cases. The important point is not to limit this to ‘pure’ savings, but to connect to primary business processes as much as possible. As an example, faster turnaround cycles for list selections are fine (when quantified in hourly rates), but it is even better if the revenue from the additional customer acquisitions that follow from these selections can be tied in. Not only does a link to revenue growth, rather than savings alone, make for a more balanced business case; more important is the intrinsic business buy-in that results from a direct connection to the company bottom line. These days, changes in legislation (in particular Sarbanes-Oxley) play a major role in justifying business cases. This may be either through a higher company valuation for its transparent information gathering or through fewer sleepless nights for the CEO, which is of course priceless…
4. Why Do Data Warehouse Projects ‘Never’ Go Wrong?
Actually, Data Warehouse projects do sometimes fail. But they fail so rarely that it is actually very hard to believe… especially after having talked to so many disgruntled end users. And there are many ways a Data Warehouse project can go wrong: late delivery, data administration issues, and unavoidable data quality issues in feeding systems. Corporate politics (see Tip 7) are probably the best explanation for this phenomenon of near 100% success rates on DWH projects. In my experience, a failure or ‘semi-failure’ can go unnoticed either because senior management is not aware of it or because it is, let’s say, “unmotivated” to talk about misspending of company funds. As a result, not enough is learned. Maybe we as consultants have a stake in this as well, as it assures the industry plenty of ongoing business…
5. What is Different About Warehousing Web Data?
Kimball & Merz (2000): “Although this clickstream data in many cases is raw and unvarnished, it has the potential of providing unprecedented detail about every gesture made by every human being using the Web medium”. The subatomic nature of clickstream data poses unique challenges. There are fewer built-in feedback mechanisms to ensure data quality than in other data streams. The relation between user mouse clicks and server log records is not as tight as in “traditional” transaction processing, due to technical issues like proxy servers and caching. Because of these differences, IT people need to adapt to the web process flow, rather than having the process adapt to IT needs as is common for most other DWH interfaces.
6. Which Data Should Be Loaded In The Data Warehouse?
The data that enter the DWH ultimately determine its place in the organization. A “let’s load all data, to be safe” attitude is a surefire way to derail your DWH project. Choices as to what should and should not be included need to be made early on, to keep the project manageable. Once the DWH has been delivered, deployed, and profitably exploited, there will always be funding somewhere to include previously ignored interfaces. Given the anticipated lifecycle of the DWH, it makes perfect sense to consciously exclude certain sources. The choice of what data to include needs to be driven by business considerations, in particular by reference to the company bottom line. If it can’t be shown how data will be put to use profitably, they stay out! See also Tip #3.
7. Data Warehousing & Company Politics
Data Warehouses have an impact on the company bottom line. Hence, they are likely candidates for turf battles, and are also at risk of becoming “small change” in budget allocation negotiations. Neither benefits corporate long-term goals. Managing a DWH project is hard enough as it is, and budget issues shouldn’t make it any harder. Because DWH investments are made in the present and revenues lie in the future, it is even more important to secure funding through a sound business case and buy-in from the appropriate (high) management level. See also Tip #3. Access to data means power, and talking about power is one of the greatest management taboos still around. Sensitive as they are, even budgets are more readily discussed…
8. Data Warehouse Project Traps
Some commonly recurring ‘roadblocks’ on the path to timely delivery of a Data Warehouse project:
ETL processes have eaten up so much time (and still need “babysitters”) that little if any time is left to develop the applications needed to exploit the DWH
Some data are needed, but turn out to be unavailable, or not available in a timely fashion
Maintenance required for tuning, indexing, and backup and recovery is severely underestimated
Different ways of calculating the same phenomenon lead to different results, and nobody is able to conclusively explain the difference(s)
The data that are loaded (and recombined) turn out to contain previously unknown inconsistencies in the source systems, the ‘classic’ data quality issues that trip up DWH projects
Metadata are lacking, and developers spend inordinate amounts of time finding out what a field really ‘means’
9. DWH Hardware and Software Go Hand in Hand
In Data Warehousing, it is not about hardware, and not about software: it is about the perfect integration of the two. Those who begin their project from either end will pay dearly for this mistake. Reasons are:
Data warehousing covers the techniques involved in designing, building and maintaining a data warehouse, and in retrieving information from it. A data warehouse is designed and built to support the decision-making process in an organization. Data obtained from the production databases are copied into the data warehouse, so that queries can be answered without interfering with the production systems.
Data warehousing includes a set of important new concepts and tools that have evolved into a technology. This makes it possible to address the problems involved in providing all the key information to the people concerned.
The field has evolved from a number of experiences and technologies accumulated over the last two decades. Data warehousing is a well-organized and resourceful method of managing and reporting data from a variety of non-uniform sources scattered throughout the company.
Data warehouses are vast, because they hold hundreds of gigabytes of transactions. As a result, subsets, known as
Ever since data warehousing began to be used as a facilitator for strategic decision making, the importance of the quality of the underlying data has grown manifold. Data quality issues are much like software quality issues: both can sabotage a project at any stage.
This being my first article ever, it is more thinking aloud than a definitive set of steps. In subsequent articles I will discuss data quality issues in more depth.
1. Data collection process
Many organizations depend on the ETL tools available in the market to make their transactional data ready for OLAP. These tools would be much more effective if the data coming from the systems in day-to-day use had valid contents. Data quality checks should therefore be applied right from the data collection process.
For example, consider feedback collection, where users write ad hoc feedback in response to open-ended questions. To ensure that valid feedback is registered, techniques ranging from parsing the feedback text for keywords to complex text mining algorithms are employed. More efficient data quality checks at this point take the data quality burden off the subsequent stages of the DW project.
In my view, data collection can be looked at from many separate angles. One is the distinction between implicit and explicit data collection. For example, data collected at the server, proxy or client level to track users’ browsing behavior has to be treated differently, when preparing it for mining, from data collected through data entry forms.
However, proactive steps to ensure that valid content gets into the databases are useful in either case. In explicit collection, this could be a string pattern matching task, such as validating the pattern of an email address and refusing to submit the form when it does not match. In implicit collection, we need to distinguish between actual user clicks and a bot or scraping program following links on your web pages automatically.
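The two kinds of check mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the email pattern is deliberately simple (it is not full address validation), and the user-agent markers and the `user_agent` field name are hypothetical choices, not taken from any particular tool.

```python
import re

# Deliberately simple pattern (an assumption): one "@", no whitespace,
# and at least one dot in the domain part. Not full RFC-style validation.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical substrings that often appear in crawler user-agent strings.
BOT_MARKERS = ("bot", "crawler", "spider", "scraper")


def is_valid_email(value: str) -> bool:
    """Explicit collection: accept a form field only if it looks like an email."""
    return EMAIL_PATTERN.fullmatch(value.strip()) is not None


def is_probable_bot(user_agent: str) -> bool:
    """Implicit collection: flag clicks whose user agent looks automated."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def human_clicks(log_records):
    """Keep only click records that do not look like bot traffic.

    Each record is assumed to be a dict with a "user_agent" key.
    """
    return [r for r in log_records if not is_probable_bot(r["user_agent"])]
```

In practice a user-agent check is only a first filter, since scrapers can send any user agent they like; request rates and navigation patterns would probably be needed as well.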
2. Data cleansing process
Data cleansing is a difficult process because of the sheer size of the source data. It is not easy to pick out badly behaved data from a collection of a few terabytes. Many techniques are used here, including fuzzy matching, custom de-duplication algorithms, and script-based custom transforms.
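As a rough illustration of fuzzy matching used for de-duplication, the sketch below compares names with `difflib.SequenceMatcher` from the Python standard library. The normalization step and the 0.85 threshold are assumptions chosen for the example, not recommended values.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(names, threshold=0.85):
    """Keep the first occurrence of each group of near-identical names.

    The threshold is an illustrative assumption and needs tuning per data set.
    """
    kept = []
    for name in names:
        if all(similarity(name, existing) < threshold for existing in kept):
            kept.append(name)
    return kept
```

This pairwise comparison grows quadratically with the number of records, so on terabyte-scale sources it would have to be combined with some form of blocking, for example comparing only records that share a postcode or a name prefix.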
The best approach is to study the source data model and build basic rules for checking data quality. This can also be done iteratively. In many cases clients do not provide data upfront, but only the data model with trial data. The BA and the domain expert can, in mutual consultation, come up with certain rules as to how the actual data should look. These rules may not be very detailed, but that is fine, as this is just a first iteration. As the understanding of the source data model evolves, so can the data quality rules. (This might sound almost heavenly to anyone who has been part of even a single data warehousing project, but it is an approach worth trying.)
Please note that this is different from data profiling tools, which run on source data. Here we are trying to analyze the metadata and the project requirements in order to specify the data quality.
Building these rules generally requires sound knowledge of the industry concerned and a consistent, in-sync data dictionary. The worse part is that once the rules are built, the data modeling team also has to verify the actual data against them manually. This process is cumbersome and error-prone, and might compromise data quality. In the next article we will discuss how this effort can be reduced and possibly automated.
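One way such rules could be made checkable by machine is to write each rule as a named predicate and run every row against the whole list. The sketch below is only an assumption about what first-iteration rules might look like; the column names (`customer_id`, `order_total`, `country`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Tuple[str, Callable[[Row], bool]]

# Hypothetical first-iteration rules agreed between the BA and the domain expert.
RULES: List[Rule] = [
    ("customer_id is present",
     lambda r: r.get("customer_id") not in (None, "")),
    ("order_total is non-negative",
     lambda r: isinstance(r.get("order_total"), (int, float)) and r["order_total"] >= 0),
    ("country is a 2-letter code",
     lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2),
]


def check_rows(rows, rules=RULES):
    """Return a (row_index, rule_name) pair for every rule a row violates."""
    violations = []
    for index, row in enumerate(rows):
        for name, predicate in rules:
            if not predicate(row):
                violations.append((index, name))
    return violations
```

Because the rules are plain data, they can be extended in later iterations without touching the checking loop, which matches the iterative approach described above.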
Article Source: http://EzineArticles.com/?expert=Swanand_Deodhar
SAP BW is an end-to-end data warehousing solution that uses existing SAP technologies. It is built on the Basis 3-tier architecture and coded in the ABAP (Advanced Business Application Programming) language. It uses ALE (Application Link Enabling) and BAPI (Business Application Programming Interface) to link BW with SAP systems and non-SAP systems.
BW Architecture
BW has three layers. The top layer is the reporting layer, which may be the BW Business Explorer (BEx) or a third-party reporting tool. BEx consists of two components: BEx Analyzer and BEx Browser.
The middle layer is the BW Server, which carries out three tasks: it administers the BW system, stores data, and retrieves data. The bottom layer consists of the source systems, which may be R/3 systems, BW systems, flat files, and other systems. An SAP component called Plug-In, which contains extractors, must be installed in the SAP source systems. An extractor is a set of ABAP programs, database tables, and other objects that BW uses to extract data from the SAP systems. The BW Server contains the Administrator Workbench, Metadata Repository and Metadata Manager, Staging Engine, PSA, ODS and User Roles.
The Administrator Workbench maintains metadata and all BW objects. It has two components, BW Scheduler and BW Monitor, which are used to load data and to monitor the loads.
The Metadata Repository contains information about the data warehouse. This metadata is of two types: business-related and technical. The Metadata Manager is used to maintain the Metadata Repository.
The PSA (Persistent Staging Area) is also part of the BW Server. It stores data in their original format as they are imported from the source system. This allows the quality of the data to be checked before they are loaded into their destinations, such as ODS Objects or InfoCubes.
ODS (Operational Data Store) Objects help to build a multilayer structure for operational data reporting. They are used for detailed reporting.
InfoCubes are the fact tables and their associated dimension tables in a star schema.
The OLAP Processor is the analytical processing engine. It analyzes and retrieves data according to users’ requests.
Documents are stored in BDS (Business Document Services). The documents can be in different formats, such as Microsoft Word, Excel, PowerPoint, PDF, and HTML.
BW Business Content
BW’s most powerful selling point is Business Content. It contains standard reports and other associated objects. For standard reports, BW uses a function called Generic Data Extraction, which is used to extract R/3 data.
BW is evolving rapidly, and knowing where it is heading helps in planning BW projects and their scope.
The SAP e-business platform consists of three components: mySAP Technology, mySAP Services and mySAP Hosted Solutions.
mySAP Technology provides an infrastructure for the Web Application Server and for process-centric collaboration. This infrastructure contains a component called mySAP Business Intelligence.
mySAP Services are the services and support that SAP offers to its customers. They range from business analysis and technology implementation to training and system support.
mySAP Hosted Solutions are the outsourcing services of SAP. With these solutions, customers do not need to maintain physical machines and networks.
Ron Victor is an SEO copywriter for SAP Company. He has written many articles on various topics in SAP Articles and SAP Training. For more information about SAP, visit SAP Forums. Contact him at ron.seocopywriter@gmail.com
Article Source: http://EzineArticles.com/?expert=Ron_Victor
The objective of data warehousing is to analyze data from diverse sources in order to support decision making.
Poor Performance
Generally, a data warehouse stores a very large volume of data, and retrieving data from it for analysis is not a quick process. To support analysis, data warehouse design uses a special technique known as the star schema. Extracting, transferring, transforming and loading (ETTL) data from diverse sources into a data warehouse is difficult, and the data must be analyzed properly before use. ETTL has frequently been cited as a cause of failure in data warehousing projects. You will run into the same problems if you start analyzing the data without SAP BW.
Today, large companies use SAP R/3, an enterprise resource planning system, to run their business. Before the introduction of SAP BW, SAP R/3 itself was mainly used for data warehousing. SAP BW is a business information warehouse that addresses the analytical needs of a developing business. To succeed against market competition, you have to keep up with the latest developments in your business environment. Meeting this challenge means taking appropriate decisions based on the available data, and good decisions are what make a business successful.
Basic Concept of Data Warehousing
A data warehouse is a system that contains its own database. It collects data from various sources and is designed to support analysis. For the purpose of analytical processing, a special database technique called the star schema is used.
Star Schema
The star schema is a technique that business users have adopted over the last few years. It involves several concepts, and a database is organized according to it as follows.
A star schema is usually shown as a diagram, and the name derives from the fact that the diagram looks like a star: a central fact table surrounded by several dimension tables. The fact table is very large, measured in gigabytes, and holds an enormous amount of useful data. A dimension table typically amounts to 1 to 5 percent of the size of the fact table. The dimension tables do not require any normalization.
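To make the shape concrete, here is a minimal star schema built with Python's standard `sqlite3` module: one central fact table (the large table described above) that references two small dimension tables. All table names, column names and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive, not normalized further.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one row per sale, holding keys to the dimensions plus measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Espresso', 'Coffee'), (2, 'Green tea', 'Tea');
INSERT INTO dim_store VALUES (1, 'Athens'), (2, 'Berlin');
INSERT INTO fact_sales VALUES (1, 1, 2, 5.0), (1, 2, 1, 2.5), (2, 1, 3, 6.0);
""")


def sales_by_category(connection):
    """Total sales amount per product category: one join from fact to dimension."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

A typical analytical query touches the fact table once and joins directly to whichever dimensions it needs, which is a large part of why the layout is considered easy to query.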
ETTL
In business, there is an age-old saying that accomplishing any goal requires selecting the proper tools. This holds true not just in business but in other facets of life as well. Sticking with the corporate scenario, though, how can you make sure that you have the right mix of tools when it comes to data warehousing? Harvesting, gathering, and collating data is inevitable in any organization, for how can a business succeed without collecting and interpreting relevant data? The data collected and analyzed are, after all, used to arrive at the right business decisions. Hence the need to incorporate business intelligence into data warehousing.
The typical data warehousing system is inevitably complicated, because there are cross-departmental connections to consider. Any business comprises several departments and units, all interacting with each other to achieve common corporate goals and objectives. With multiple departments involved, an efficient data warehousing system needs to be implemented.
To do this, there are several considerations to keep in mind. One of the important questions to ask yourself is whether you should BUILD your own business intelligence tool or BUY one from the market. Each option has its own pros and cons, which you will have to weigh one by one in order to come to an informed decision.
When it comes to cost, building your own tool is generally more advantageous than purchasing one. Building can cut costs significantly, which is something you can never be sure of achieving if you buy, unless you really know your way around the market, and that requires connections with a lot of the “right” people, so to speak.
When it comes to implementation time, the purchased business intelligence tool is more advantageous. The tool is already on the market, so it has already been developed and its glitches have been attended to. You might still discover some glitches and bugs along the way, but implementation will not take long because the tool is already available. All it needs is for its settings to be configured, and you should have it up and running in no time.
Documentation and functionality are also advantages on the purchasing side. Both are readily available because, as mentioned above, the tool is just waiting to be configured.
One huge advantage that building has over purchasing is that when you build your own business intelligence tool for data warehousing, you can tailor it to the needs of your business. Because of this, your reliance on third-party vendors is also significantly reduced, or even eliminated. Keep all of these factors in mind when deciding how to get your tool up and running.
Article Source: http://EzineArticles.com/?expert=Sam_Miller