Large amounts of data processed in enterprises should be seen as a source of knowledge needed to make sound business decisions. Progress in information technology has enabled the automation of business processes, which streamlines operations, reduces costs and thus helps a company stay competitive in the market. However, this comes with the need to process ever-growing amounts of data. Is collecting and processing this data merely a necessary cost of improving effectiveness? It should not be perceived that way: the data should primarily be seen as a source of knowledge that can be used to make business decisions.
But can transactional systems (OLTP, Online Transaction Processing) such as ERP, CRM, or others of this class be used to generate reports analyzing the data they process? They can, but such a solution is neither effective nor flexible, and it does not cover a sufficiently broad spectrum of data. OLTP systems are not optimized for data analysis, which causes several problems. The first is performance. For reports operating on large data sets (a frequent case, particularly in tactical and strategic reporting), the execution time of individual queries and of report generation can be unacceptably long, up to several hours. Reading and analyzing the data has to compete with the entry, modification and deletion of data, which are the core processes of a transactional system. Reporting itself also burdens the system, especially when reports are generated repeatedly and for many users. In short, transactional system architecture is not optimized for analytical queries.
Another problem is the heterogeneity of data sourced from different systems. A company often runs a number of different IT systems, implemented at different times and storing data in different formats. Data can also come from external files or from the Internet. Comparing such data in an analysis without prior preparation may not be possible.
A further issue is that OLTP systems often cannot fully and dynamically customize reports, i.e. compile the data in different configurations depending on the analyst's current needs and increase or reduce the level of detail of the analysis.
The answer to these problems is OLAP (Online Analytical Processing) applications, which are part of Business Intelligence and complement OLTP applications. They are designed specifically for efficient reporting. Databases configured for applications of this kind use a multi-dimensional data model. This model allows efficient simultaneous analysis of data along several dimensions, such as time, region, product group, and many others, and makes it possible to hierarchically increase or reduce the level of detail of reported data (drill-down / roll-up), e.g. day, week, month, quarter; city, region, country; soft drinks, beverages, food products; etc.
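As a rough sketch of such a dimension hierarchy, the following Python snippet rolls the city level of a region dimension up to the country level; the fact rows and the city-to-country mapping are invented for illustration, not taken from any real OLAP system:

```python
from collections import defaultdict

# Hypothetical fact rows: (day, city, product group, sales amount).
facts = [
    ("2024-03-01", "Warsaw", "soft drinks", 120.0),
    ("2024-03-01", "Krakow", "soft drinks", 80.0),
    ("2024-03-02", "Warsaw", "snacks", 50.0),
    ("2024-03-02", "Berlin", "snacks", 70.0),
]

# One level of an assumed region hierarchy: city -> country.
city_to_country = {"Warsaw": "Poland", "Krakow": "Poland", "Berlin": "Germany"}

def roll_up_by_country(rows):
    """Aggregate the detailed city level up to the country level (a roll-up)."""
    totals = defaultdict(float)
    for _day, city, _product, amount in rows:
        totals[city_to_country[city]] += amount
    return dict(totals)

print(roll_up_by_country(facts))  # {'Poland': 250.0, 'Germany': 70.0}
```

Drilling down is simply the reverse step: going back from the country totals to the detailed city (or day, or product) level of the same facts.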
Data warehouses can be built on databases configured for OLAP. In 1990 Bill Inmon introduced the term Data Warehouse for a subject-oriented, integrated, non-volatile and time-variant collection of data used in support of decision-making.
A data warehouse solves the problem of reporting performance on large data sets very well. It stores a copy of the transactional data, usually in a separate database on a separate server. Analysis using queries that read large amounts of data therefore does not affect the performance of the transactional system, and the queries themselves run significantly faster, in seconds or at most minutes. Data in the warehouse is stored not only in detailed form but also redundantly as aggregates, which further improves the efficiency of reporting.
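The idea of redundant aggregates can be sketched in a few lines of Python; the fact rows and the monthly granularity here are invented for illustration. The aggregate is computed once at load time, so a report query reads a small summary table instead of scanning every detail row:

```python
from collections import defaultdict

# Detailed fact rows copied from the transactional system: (month, product, amount).
facts = [
    ("2024-01", "A", 100), ("2024-01", "B", 40),
    ("2024-02", "A", 120), ("2024-02", "B", 60),
]

# Build the aggregate once, when the data is loaded into the warehouse.
monthly_totals = defaultdict(int)
for month, _product, amount in facts:
    monthly_totals[month] += amount

# A report query now reads the small aggregate, not the detailed facts.
print(monthly_totals["2024-02"])  # 180
```

In a real warehouse the same trade-off applies at scale: extra storage for precomputed aggregates is exchanged for much faster report queries.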
New data is delivered to the warehouse regularly, i.e. copied from the transactional systems, generally after the end of the working day, usually at night, so the transactional systems are not burdened. This update frequency is entirely sufficient for strategic or tactical reporting.
However, when some data in the warehouse needs to be updated more frequently, as may be required for operational reporting, the RDA (Real-Time Data Acquisition) process can be applied. In this process, selected warehouse objects are fed with transactional data at high frequency, e.g. every few minutes, which is close to real time. Another solution is to query warehouse objects known as virtual providers, which refer not to data stored in the warehouse but directly to data in the source system. Performance in this case is lower, however, so the method is used only on smaller data sets and only by a few users at a time.
A data warehouse also solves the problem of heterogeneous data coming from different source systems, databases, files and the Internet. Once copied to the warehouse, the data is available to the entire company in one place and for a long time. Moreover, the warehouse "cleans" and transforms the data so that it can be used uniformly in analysis regardless of its source.
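The "cleaning" and transformation step can be illustrated with a minimal Python sketch; the two source records, their date formats and the country-code mapping are assumptions for illustration, not an actual ETL implementation:

```python
from datetime import datetime

# Records from two hypothetical source systems that store the same fact
# with different date formats and country encodings.
source_a = {"date": "03/15/2024", "country": "DE"}
source_b = {"date": "2024-03-15", "country": "Germany"}

# Assumed mapping that harmonizes country codes to full names.
COUNTRY_NAMES = {"DE": "Germany", "PL": "Poland"}

def transform(record, date_format):
    """Normalize one record to the warehouse's uniform representation."""
    return {
        "date": datetime.strptime(record["date"], date_format).date().isoformat(),
        "country": COUNTRY_NAMES.get(record["country"], record["country"]),
    }

rows = [transform(source_a, "%m/%d/%Y"), transform(source_b, "%Y-%m-%d")]
# After transformation the two records are directly comparable:
print(rows[0] == rows[1])  # True
```

Real ETL tools perform the same kind of per-source normalization, only with many more rules for deduplication, missing values and referential consistency.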
Reporting on the SAP data warehouse makes it possible to display reports in advanced graphical form and to distribute them, or place them on the corporate portal, in many different ways, taking into account the access rights and responsibilities of individual employees. In addition, if reported data falls into predefined critical ("red") ranges, the system can automatically notify the relevant persons by generating and sending alerts.
Data coming from different source systems is standardized and harmonized in the warehouse during the ETL (Extract, Transform, Load) process and thus becomes good material for data mining. Data mining applies a variety of mathematical algorithms to discover interesting patterns and dependencies that are hard to notice in large amounts of data. In this way we may notice that, for instance, one type of defect in a machine often coexists with another type, or that customers who buy product A often also buy product B. With this information we can plan the servicing of our equipment better or present our products in a way that increases sales and customer satisfaction. Data mining methods also make it possible to predict certain future dependencies on the basis of previous "experience" contained in the data: for example, how sales of our products in different customer groups are likely to develop, or how likely customers of a particular group are to pay on time.
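The product-A/product-B observation corresponds to counting how often product pairs occur together in transactions, which is the first step of association-rule mining (as in the Apriori algorithm). A minimal sketch with invented basket data:

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "D"},
]

# Count how often each product pair is bought together (the "support" count).
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts[("A", "B")])  # 3: A and B co-occur in 3 of the 4 baskets
```

A real algorithm would go on to derive rules such as "A implies B" from these counts, keeping only pairs whose support and confidence exceed chosen thresholds.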
Thus, business intelligence based on a data warehouse enables an effective analysis of the current situation in the enterprise and, with the help of data mining, makes it possible to extract knowledge from the stored data. This allows better business decisions and more accurate predictions about the future. SAP is one of the main data-warehouse vendors; its warehouse runs as a module both on SAP and on other systems. The data mining process is already integrated into the SAP data warehouse, so the information obtained in this process can be written back to the warehouse or to the CRM module.