Application of ESB in data warehouse construction

1. Requirements for enterprise data warehouse system construction

At present, most enterprises have built numerous business processing systems and office automation systems to match their business characteristics and office needs, and have accumulated a large amount of business data. These systems have improved work efficiency, reduced repetitive work, and contributed greatly to enterprise development. However, the data they hold keeps expanding as time and business grow, and it is scattered across different system platforms in a variety of storage formats. As market competition intensifies, information plays an ever more important role in an enterprise's survival and development. Managers want to understand business trends by analyzing the large volumes of data inside the organization, yet traditional operational databases retain only current transaction data and do not preserve the large amounts of historical, decision-oriented data that such analysis requires.

Therefore, a business intelligence system based on a data warehouse is built to provide factual support for enterprise development decisions. A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used to support decision-making in business management. A data warehouse system is a computer information processing system that collects, manages, processes, and analyzes internal business and financial data such as sales, inventory, production, and procurement, together with external data such as market conditions and competitors, and then delivers comprehensive analysis results.

2. ETL Technology

A core technology for building a data warehouse is data integration and migration, which is now generally implemented with ETL (extraction, transformation, loading) tools. As the core of data warehouse construction, ETL integrates data according to unified rules and improves its value. It is responsible for moving and transforming data from the data sources into the target data warehouse, and it is an essential step in any data warehouse implementation.

ETL refers to the process of extracting, transforming, and loading data during data migration. Its main purpose is to transform data oriented toward daily business operations into decision-support data stored in the data warehouse, at the lowest possible cost. The traditional approach is to hand-write SQL statements and accompanying programs for extraction and conversion. This demands a high level of technical skill and a thorough understanding of the business; over time, the number of SQL statements grows sharply and the system becomes hard to maintain and reuse. Using a general-purpose, mature ETL tool to consolidate the data of the business systems therefore improves reuse and maintainability, reduces the difficulty of designing extraction and conversion processes, and lets technicians focus on the business rather than on implementation details.

In essence, an ETL tool is a data converter: it provides a way to move data from source systems into a target system. Traditionally this work was done by programmers, who had to write a separate extraction and loading program for each data source, which is extremely inefficient. An ETL tool provides a general solution: it typically generates the data conversion and loading code graphically, producing a purpose-built data converter intuitively and efficiently, and can thereby reduce the work by an estimated 70% to 80%.

The ETL process can be divided into three steps: first, extract the required data from the data sources; then convert it to the data format of the target data store; finally, load the converted data into the data warehouse. To address data quality problems ("dirty data"), a data cleaning step is usually added after data conversion. ETL includes the following three components:

1) Data extraction: the process of extracting data from different networks, operating platforms, databases, data formats, and applications. Extraction can be complete or incremental. Since a large part of the data in a data warehouse reflects historical conditions, the extraction function is not merely a simple database read but also a process of capturing incremental changes.

2) Data conversion: data transformation (consolidation, summarization, filtering, conversion, etc.), data reformatting and calculation, key data reconstruction, data summarization, data positioning, and so on.

3) Data loading: loading the data, across networks and operating system platforms, into the target database according to the table structure defined by the physical data model.
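The three components above can be sketched end to end. The following is a minimal illustrative pipeline, assuming invented table and column names and using in-memory SQLite in place of real source and warehouse databases:

```python
import sqlite3

# Extract: read raw rows from a (mock) source business database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, " east ", 100.0), (2, "WEST", 250.0), (3, None, -5.0)])
rows = src.execute("SELECT id, region, amount FROM sales").fetchall()

# Transform + clean: normalize region names and drop records that fail
# basic quality rules ("dirty data": missing keys, negative amounts).
clean = [(i, r.strip().lower(), a) for (i, r, a) in rows
         if r is not None and a is not None and a >= 0]

# Load: write the conformed rows into the target warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_sales (id INTEGER, region TEXT, amount REAL)")
dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", clean)
dw.commit()

print(dw.execute("SELECT region, amount FROM fact_sales ORDER BY id").fetchall())
```

A real pipeline would read and write over the network and keep the cleaning rules in configurable metadata rather than code, but the extract / transform / load phases remain the same.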

3. Implementing ETL with SynchroESB

3.1 Overview of SynchroESB's ETL functions

SynchroESB is an enterprise service bus (ESB) based on the SOA architecture: a standards-based, message-oriented, highly distributed system integration platform with intelligent routing. Built on the JBI specification, it implements data integration services, including ETL-related components, and provides the functions of an ordinary ETL tool. It is also an open platform: users can write their own components and plug them into the SynchroESB bus to implement specific functions.

SynchroESB's data integration services mainly include historical data migration, data synchronization, data consolidation, and data warehousing:

· Historical data migration: reuse legacy historical data by migrating it to a new target database

· Data synchronization: upload and download data among the nodes of a distributed database to keep the distributed data consistent

· Data consolidation: merge personalized data from the databases of different applications into a database with a unified structure, including cleaning, conversion, and other operations

· Data warehousing: centralize scattered data into a unified data warehouse and establish a unified data model for storage

SynchroESB's ETL processing mainly includes:

· Batch, incremental, and scheduled extraction from data sources; support for mainstream databases, unstructured data, flat files, and other sources; and, after extraction, unification of data from the various sources into a common XML format at the technical level

· Data transmission for distributed deployments, with a reliable transport mechanism and support for compression, encryption, and other processing; data processing, mainly data cleaning and conversion

· Efficient parallel process scheduling and batch data loading, suitable for processing massive data volumes
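The extraction bullet above can be sketched as follows: a minimal illustration of incremental extraction plus unification into XML, assuming an invented orders table, a persisted high-water-mark timestamp, and SQLite standing in for a real source database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Mock source system with a change-tracking timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2011-01-01"), (2, "2011-02-15"), (3, "2011-03-20")])

# Incremental extraction: pull only rows changed since the last run,
# using the high-water mark persisted by the previous run.
watermark = "2011-02-01"
delta = conn.execute(
    "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY id",
    (watermark,)).fetchall()

# Unification: express the extracted rows in a common XML form so that
# downstream services see one format regardless of the source.
root = ET.Element("table", name="orders")
for oid, ts in delta:
    rec = ET.SubElement(root, "record")
    ET.SubElement(rec, "id").text = str(oid)
    ET.SubElement(rec, "updated_at").text = ts

print(ET.tostring(root, encoding="unicode"))
```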
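The transmission bullet mentions compression of the payload before it crosses the network; a minimal sketch using zlib (encryption, which would be layered on similarly, is omitted here):

```python
import zlib

# Repetitive XML payloads compress very well, cutting transmission cost.
payload = ("<record><id>1</id></record>" * 500).encode("utf-8")

compressed = zlib.compress(payload, 9)   # compress before sending
restored = zlib.decompress(compressed)   # decompress on the receiving node

print(len(payload), len(compressed))
```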
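The parallel scheduling and batch loading bullet can be sketched with a thread pool; `load_batch` here is a hypothetical stand-in for a real bulk insert into the warehouse:

```python
from concurrent.futures import ThreadPoolExecutor

# Split a large row set into fixed-size batches for parallel loading.
rows = list(range(10_000))
BATCH = 1_000
batches = [rows[i:i + BATCH] for i in range(0, len(rows), BATCH)]

def load_batch(batch):
    # A real loader would bulk-INSERT the batch here; we just count rows.
    return len(batch)

# Load batches concurrently; pool.map preserves the batch order.
with ThreadPoolExecutor(max_workers=4) as pool:
    loaded = list(pool.map(load_batch, batches))

print(sum(loaded))
```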

These functions are shown in the figure below.

3.2 A data integration framework based on SynchroESB

A data exchange system based on the SynchroESB platform can integrate heterogeneous data and databases from bottom to top across different nodes (county, city, province) and aggregate them into a central data center; it can resolve data inconsistencies and meet synchronization needs among heterogeneous databases within an organization; and it can enable data exchange and sharing between peer systems. Users can use the default adapter components provided by the platform to connect existing application or database systems to the data exchange platform, achieving data exchange and information sharing across the entire information system.

SynchroESB's adaptation interface layer is a set of dynamic plug-ins supporting various data sources. It shields the underlying technical differences among the data sources, provides a unified access interface to them, and enhances the ability of applications and technologies to work together. SynchroESB provides the following adapter components by default; users can of course develop their own adapters as needed. The figure below shows a practical application of the adaptation interface layer.

Database adapter (RDBMS: Oracle, Sybase, DB2, MSSQL)

The database adapters encapsulate the various operations on databases, providing unified access interfaces and a consistent XML representation. This component includes multiple adapters offering different access methods, with functions for reading, writing, querying, and monitoring databases.
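As an illustration of the "unified access interface with consistent XML expression" idea, here is a hypothetical adapter sketch (the class and element names are invented, and SQLite stands in for any RDBMS):

```python
import sqlite3
import xml.etree.ElementTree as ET

class DatabaseAdapter:
    """Reads from any DB-API connection and always returns XML."""

    def __init__(self, conn):
        self.conn = conn

    def read(self, sql, params=()):
        cur = self.conn.execute(sql, params)
        cols = [d[0] for d in cur.description]
        # Consistent XML expression: one <record> per row, one element
        # per column, regardless of the underlying database product.
        root = ET.Element("resultset")
        for row in cur.fetchall():
            rec = ET.SubElement(root, "record")
            for col, val in zip(cols, row):
                ET.SubElement(rec, col).text = str(val)
        return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")
result = DatabaseAdapter(conn).read("SELECT id, name FROM t")
print(ET.tostring(result, encoding="unicode"))
```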

File adapter

The file adapter provides a general file access method. By specifying the relevant parameters, it can flexibly read and write all kinds of files, and it can access both network and local file systems.


HTTP adapter

The HTTP adapter passes messages to a servlet running at a specified URL, or receives HTTP requests from clients over the HTTP protocol.


SOAP adapter

The SOAP adapter encapsulates access to Web Services and is used to access the web service at a specified URL.


JMS adapter

The JMS adapter encapsulates the various JMS implementations and, through a unified interface, accesses other JMS-based message middleware.

If service

The If service filters data according to specified XPath selection criteria. It receives XML data at its input event port, evaluates the XPath selections, and writes the data to the matching output event port. The service also performs data cleaning, discarding records that do not conform to the specifications or that contain empty values.
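The filtering behavior described above can be sketched as follows; this is a hypothetical illustration using Python's xml.etree rather than the actual SynchroESB service, with invented element names:

```python
import xml.etree.ElementTree as ET

xml = """<batch>
  <record><id>1</id><status>ok</status></record>
  <record><id>2</id><status></status></record>
  <record><id>3</id><status>ok</status></record>
</batch>"""

root = ET.fromstring(xml)

# Keep records whose status matches the selection criterion; records
# with empty or non-conforming values are cleaned out (not forwarded).
accepted = [r for r in root.findall("record")
            if (r.findtext("status") or "").strip() == "ok"]
rejected = [r for r in root.findall("record") if r not in accepted]

print([r.findtext("id") for r in accepted])   # records routed onward
```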

Join service

The Join service consolidates data during data integration. It describes a synchronization node that waits for input arriving together on multiple input channels. Once data has been received on all of the input channels, the service generates a new XML document containing the multiple input XML documents, and the output event port sends it to the next service in the workflow.
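The merge behavior can be sketched like this; the channel names and element names are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Documents that arrived on the synchronization node's input channels.
inputs = {
    "channelA": ET.fromstring("<orders><order>1</order></orders>"),
    "channelB": ET.fromstring("<payments><payment>9</payment></payments>"),
}

# Wrap all inputs into one combined document for the next service.
merged = ET.Element("joined")
for channel, doc in sorted(inputs.items()):
    part = ET.SubElement(merged, "input", channel=channel)
    part.append(doc)

print(ET.tostring(merged, encoding="unicode"))
```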

CBR service

The content-based routing (CBR) service distributes data among multiple dynamic ports according to a series of configured XPath selection criteria. Specifically, the CBR service receives XML data at its input event port, evaluates the XPath criteria against the received data, and sends it from the corresponding output port to other services.
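The routing behavior can be sketched as follows; the rules and port names are invented, and simple predicates stand in for real XPath evaluation:

```python
import xml.etree.ElementTree as ET

# Ordered routing rules: the first rule that matches decides the port.
routes = {
    "high_value": lambda m: float(m.findtext("amount", "0")) >= 1000,
    "default":    lambda m: True,   # fallback port, checked last
}

def route(message):
    for port, rule in routes.items():
        if rule(message):
            return port

msgs = [ET.fromstring(f"<msg><amount>{a}</amount></msg>") for a in (50, 2500)]
print([route(m) for m in msgs])
```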

4. Advantages of data integration based on the SynchroESB platform

The data integration approach based on the SynchroESB middleware platform combines data integration with middleware, chiefly because this combination offers the following advantages:

1. Scalability: a data integration system must be built on flexibility and scalability, so that the enterprise's expanding data applications can take root in an environment that grows easily and data generated in different periods can be integrated into an organic whole. Middleware provides exactly such a foundation.

2. Interoperability: middleware separates the application from the underlying environment through simple APIs or general interfaces, enabling interoperability across heterogeneous hardware and operating system platforms. This also solves the problem of system heterogeneity in data integration; it can be said that the interoperability of middleware provides an economical and effective means of enterprise data integration.

3. Adaptability: middleware allows a heterogeneous data integration system to adapt to changing business needs, and minimizes the impact on the whole system when clients, applications, or server nodes are added, removed, or otherwise changed.

4. Ease of development: the unified interfaces and default components provided by the middleware shield developers from underlying details, simplifying application development.

Copyright © 2011 JIN SHI