ETL is an important process in a full scale BI implementation, although technically its more related to the data warehouse process. It should be considered one of the important tools of the overall BI strategy, especially for organizations with diverse or complex data structures and data integration needs. This article will give a general overview.
The letters stand for Extract, Transform and Load and are the three processes for moving data from one data source or database into another. Sometimes this will be referred to as data integration. Extract means taking data from source systems. Transform means changing the data to a new format. Load refers to the procedure for populating a target database with the transformed information.
Within the business intelligence world, it usually means taking data from separate sources into a data warehouse, where reporting and analysis can be centralized and a single version of the truth created. This is critical as often times data is stored in cryptic or in differing formats, and so standardization and conversion is needed for the information to make sense to business users. This process usually happens overnight to convert daily activities but can run during the day or even on an ad-hoc basis.
Over the years, with the increase in BI and database deployments the use of ETL has become more and more widespread and so this technology niche has grown. The large business intelligence companies have responded in many cases by acquiring or creating in-house tech for the job. In addition, several stand alone companies do nothing but focus on this process.
From the big stack players, most of them have added at least basic extract, transform, and load functions into its software. IBM acquired and integrated DataStage. Oracle has its Warehouse Builder. Microsoft has SSIS (SQL Server Integration Services) built into its SQL Server software. SAP has several, including SAP BO Data Services. Even SAS has its Enterprise Data Integration Server. Smaller vendors, for the most part, preach the use of best of breed tools but Qlikview does offer a form of it built in with its in-memory load technology. Microstrategy currently does not have this included, nor does Tableau.
Open source is also an option to consider, with Talend being very popular and it has been embraced by large consulting firms and implementers such as Accenture. Pentaho and Jaspersoft also have ETL functionality built into their suites.
Some of the bigger independent vendors include Ab Initio and Informatica, with its Powercenter package. Other smaller competitors include Aptivia.