What is a data warehouse?
First, it should be understood that a data warehouse is not a program or a product. A data warehouse is an architecture, an environment. It is a repository that holds aggregated and historical data in an understandable and easily accessible structure, after that data has been received from different operational systems, call centers and similar sources, then cleaned and transformed.
In other words, a data warehouse is a relational database designed for querying and analysis rather than for transaction processing. It usually contains historical information derived from transaction data, and it may also include information from other sources. It separates the analysis workload from the transaction processing workload, which makes it easier to organize information gathered from different sources.
As mentioned above, data goes through a number of operations before it is transferred to the data warehouse. Specifically, it passes through the ETL process before entering the warehouse. In this way, the data is put into the desired format, depending on how it will be used.
So what is ETL?
First, let's look at what ETL stands for:
Extract: the data is received from the source system.
Transform: the data goes through certain transformations to make it suitable for our purposes, that is, it is cleaned and its quality is improved.
Load: the data is loaded into the target system.
In brief, with ETL the data is retrieved from the source system, transformed as needed, and loaded into the data warehouse.
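The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the source rows, the `customers` table and the function names are all made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from an operational source system.
SOURCE_ROWS = [
    {"id": 1, "name": " Alice ", "city": "ankara"},
    {"id": 2, "name": "Bob", "city": "  Izmir"},
]


def extract():
    """Extract: receive the raw rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: trim stray whitespace and standardize capitalization."""
    return [
        (row["id"], row["name"].strip(), row["city"].strip().title())
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the steps in ETL order: the data is cleaned before it is loaded."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT id, name, city FROM customers ORDER BY id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` returns the cleaned rows as they are stored in the target table.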
Another method is ELT (Extract, Load, Transform). Here the data is again taken from the source system, but this time the transformation is performed after the data has been loaded into the target system.
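For comparison, here is a minimal ELT sketch under the same kind of assumptions: the raw rows, the `staging_customers` and `customers` tables and the `run_elt` function are hypothetical, and an in-memory SQLite database plays the role of the target system. The raw data is loaded unchanged first, and the cleaning is then done inside the target system with SQL.

```python
import sqlite3

# Hypothetical raw rows taken from the source system.
RAW_ROWS = [(1, " Alice ", "ankara"), (2, "Bob", "  IZMIR")]


def run_elt(conn):
    # Extract + Load: copy the raw rows, unchanged, into a staging table.
    conn.execute(
        "CREATE TABLE staging_customers (id INTEGER, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO staging_customers VALUES (?, ?, ?)", RAW_ROWS)

    # Transform: clean the data inside the target system using SQL.
    conn.execute(
        "CREATE TABLE customers AS "
        "SELECT id, TRIM(name) AS name, UPPER(TRIM(city)) AS city "
        "FROM staging_customers"
    )
    conn.commit()
    return conn.execute(
        "SELECT id, name, city FROM customers ORDER BY id"
    ).fetchall()
```

The only difference from the ETL order is where the transformation runs: here the staging table still holds the raw values after the cleaned table has been built.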
The operations carried out in these processes are collectively called Data Quality:
- Data Cleaning,
- Data Conforming.
So why is the quality of the data in the data warehouse so important?
If the data in the data warehouse were very irregular and in a state that made it impossible to work with, our queries could return wrong results. For example, suppose we want to select unmarried women over 18, but the values in the gender column have been entered in several different forms, such as "female", "woman" or "lady". In that case, if we write our query using only "lady", we will miss a significant portion of the matching records. As a result, we will get wrong values.
If we consider that companies make investments and set their direction based on these results, the consequences can amount to a crisis.
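The gender example above can be illustrated with a small Python sketch. Everything in it is assumed for the illustration: the sample records, the spellings in `GENDER_MAP` and the function names are hypothetical. It compares a query on the raw values with the same query after the values have been conformed to a single code.

```python
# Hypothetical mapping from raw spellings to one conformed code.
GENDER_MAP = {
    "female": "F", "f": "F", "woman": "F", "lady": "F",
    "male": "M", "m": "M", "man": "M",
}

# Hypothetical customer records with inconsistent gender values.
CUSTOMERS = [
    {"age": 25, "married": False, "gender": "female"},
    {"age": 31, "married": False, "gender": "Lady"},
    {"age": 40, "married": False, "gender": "F"},
    {"age": 22, "married": True, "gender": "female"},
    {"age": 17, "married": False, "gender": "woman"},
    {"age": 29, "married": False, "gender": "male"},
]


def conform_gender(raw):
    """Map a raw gender value to a single code, or UNKNOWN if unrecognized."""
    return GENDER_MAP.get(raw.strip().lower(), "UNKNOWN")


def unmarried_women_over_18_raw(rows, literal="female"):
    """Query the raw data by matching one exact spelling."""
    return [
        r for r in rows
        if r["age"] > 18 and not r["married"] and r["gender"] == literal
    ]


def unmarried_women_over_18_conformed(rows):
    """Run the same query after conforming the gender values."""
    return [
        r for r in rows
        if r["age"] > 18 and not r["married"]
        and conform_gender(r["gender"]) == "F"
    ]
```

On this sample data the raw query matches one record, while the conformed query matches three, so two of the three qualifying customers would be missing from a report built on the uncleaned column.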
Well, why did the data warehouse come about?
The concept of the data warehouse emerged from the need to provide easy access to structured, high quality information that can be used in decision making. Data warehouses are therefore built to give easy access to high quality data for decision making and analysis.
It is generally accepted that, in the competitive environment of the business world, information gives an organization important advantages. Although organizations hold large amounts of data, accessing and using that data unfortunately becomes more difficult as the volume grows.
Data warehouses access data sources in different environments, then clean, filter and store the data in an understandable and easily accessible structure. This data is then used in querying, reporting and data analysis.
What are these querying and reporting techniques?
The most commonly used analysis techniques are query and reporting, multidimensional analysis, and data mining.
What is data mining? What are the benefits?
There is a lot to say about data mining. Some of the main points are:
Data mining uses statistical data analysis to examine data and discover information. Statistical data analysis identifies unusual patterns in the data and applies statistical and mathematical modeling techniques to explain these patterns. These models are then used for prediction and estimation.
Data mining is about discovering the most beneficial way to use the data.
Data mining gives users new insights that cannot be found through query and reporting or multidimensional analysis, by answering questions that had not even come to mind. Questions that were never asked get answered.
Data mining is a newer analysis technique than the others. Because it uses an approach called the discovery technique, it is very different from query and reporting and from multidimensional analysis. Instead of extracting the answer to a particular question, it uses special algorithms that analyze the data and report what they find.
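As a very simplified illustration of "identifying unusual patterns in the data", the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score). This is an assumption made for the example, not a full data mining algorithm: the `find_unusual` function, the threshold of 2.0 and the sample sales figures are all hypothetical.

```python
import statistics


def find_unusual(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical daily sales figures with one unusual day.
DAILY_SALES = [100, 102, 98, 101, 99, 100, 250]
UNUSUAL_DAYS = find_unusual(DAILY_SALES)
```

Note that the analyst does not ask "was the last day unusual?" here. The data is scanned as a whole and the function reports what stands out, which is the discovery style of analysis described above.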