Data mining is a fast-growing segment of the business intelligence world. As big data grows across every industry, harvesting the trends, patterns, and insights buried in it is becoming big business, and a whole category of technology has grown up around this work. You may also hear it called data analytics or business analytics. Analytics is certainly part of the process, but statistics, data warehousing, and modeling are a large part of the definition as well.
With advances in processors, memory, and data warehouse technologies and techniques over the last few years, the computing power needed for this kind of advanced analysis is now available to ordinary businesses. Previously, companies needed mainframes and large server hardware to mine big databases.
The big BI vendors include data mining products in their software suites to varying degrees, and it's very common for these to be "tacked on" to existing business intelligence tools. We think these products have been somewhat overlooked, on both the technology and the marketing side. They can feel a little geeky to people without deep experience in the field. Done right, though, data mining can be a good addition to an organization's set of analytics.
SAS is probably the best-known analytics company. It has focused heavily on mining and analytics since the 1970s. Its Enterprise Miner is regarded as a gold standard in the industry and is really the company's bread and butter.
SPSS is another big software package that is popular in the commercial space. IBM recently acquired the company, so it will be interesting to see how SPSS gets rolled into IBM's Cognos software.
MicroStrategy also has a data mining offering included with its Desktop Designer package.
R is probably the most widely used open source package for this kind of analysis. It's very robust but lacks the polished interface of SAS or SPSS. Still, it's very popular with researchers and with smaller implementations that don't have the budget for a big commercial installation. Pentaho also has an open source solution, although it's a little less refined than the commercial offerings.
Alongside this type of technology is a need for operational intelligence on unstructured data, which commonly means server logs, web activity, and similar machine-generated information. The biggest player in this space is Splunk.