Big Data is one of the bigger trends pushing through BI and the wider IT world. You’ll hear the term tossed around a lot, especially in BI and data warehouse circles. My interpretation is that it refers to scenarios and projects that generate large volumes of relatively unstructured, freeform data. With the internet, social networks, and personal devices, the amount of data that companies and organizations capture is staggering, and it keeps growing year over year.
One of the most popular technologies for managing this data in a structured way is Apache Hadoop. It is an open source framework for clustering back-end servers together so that they act like one massive data source. The advantages are scalability, reliability (if one unit goes down, its processing is pushed over to other units), and cost. The software is open source, and it’s designed to work on cheap “whitebox” servers, so hardware costs are reduced. That hasn’t stopped Oracle from offering a big data appliance, based on Sun servers and Hadoop, to push itself as the big stack vendor for this. While Hadoop is a leader in the market, Greenplum and various NoSQL databases are gaining some market share. Microsoft is also readying technology to compete. Some commercial versions of Hadoop are also being released that offer more support to often overwhelmed back-end IT administrators. The growing volume of data and the need to administer these systems are definitely straining staffs.
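To make the reliability point a bit more concrete, here is a toy Python sketch of the idea that when one unit goes down, its work is pushed over to the others. It is not Hadoop's actual scheduler or API; the node names, task names, and both functions are made up for illustration, and the "least loaded node" rule is just an assumption to keep the example short.

```python
def assign_tasks(tasks, nodes):
    """Spread tasks across nodes round-robin; returns {node: [tasks]}."""
    if not nodes:
        raise ValueError("no nodes available")
    assignment = {node: [] for node in nodes}
    for index, task in enumerate(tasks):
        assignment[nodes[index % len(nodes)]].append(task)
    return assignment


def handle_failure(assignment, failed_node):
    """Move the failed node's tasks onto the surviving nodes."""
    orphaned = assignment.pop(failed_node)
    survivors = sorted(assignment)
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the work")
    for task in orphaned:
        # Assumed policy: hand each orphaned task to the least loaded survivor.
        target = min(survivors, key=lambda node: len(assignment[node]))
        assignment[target].append(task)
    return assignment


# Hypothetical cluster of three cheap servers sharing six units of work.
tasks = [f"block-{i}" for i in range(6)]
cluster = assign_tasks(tasks, ["node-a", "node-b", "node-c"])
cluster = handle_failure(cluster, "node-b")
print(cluster)
```

After the simulated failure, every task still has an owner and the two remaining nodes carry three tasks each. The real framework has a lot more going on, but the basic bargain is the same: lose a box, keep the job.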
Of course, companies want to query and report on this information. One issue is that the data is stored unstructured, so an additional layer needs to be implemented to allow for more traditional SQL queries. Hive is another open source tool, used to let these SQL-style queries run against the data held in Hadoop. Companies like Cloudera are offering connectors to Hive so that it can be seen as a standard data source by BI vendors like MicroStrategy. Another innovative player in this space is 1010Data, which uses a type of in-memory technology to scale, store, and deliver information to data consumers.
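The "additional layer" idea is easier to see with a small example. The Python sketch below is not Hive and does not talk to Hadoop; it uses the standard library's `sqlite3` module as a stand-in so it can run anywhere. The log format, table name, and column names are all invented for illustration. The point is only the pattern: raw freeform lines get a structure applied when they are read, and from then on ordinary SQL works.

```python
import re
import sqlite3

# Hypothetical raw, freeform log lines; the format is invented for this example.
RAW_LINES = [
    "2012-01-05 alice viewed /products/42",
    "2012-01-05 bob purchased /products/42",
    "garbage line that does not match",
    "2012-01-06 alice purchased /products/7",
]

# Assumed layout: date, user name, action, path, separated by single spaces.
LINE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\w+) (\w+) (\S+)$")


def parse_lines(lines):
    """Apply a structure at read time: keep matching lines, skip the rest."""
    rows = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            rows.append(match.groups())
    return rows


def load_events(rows):
    """Load parsed rows into an in-memory SQL table (stand-in for the query layer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (day TEXT, username TEXT, action TEXT, path TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


def purchases_per_user(conn):
    """A traditional SQL report over data that started out unstructured."""
    cursor = conn.execute(
        "SELECT username, COUNT(*) FROM events "
        "WHERE action = 'purchased' GROUP BY username ORDER BY username"
    )
    return cursor.fetchall()


conn = load_events(parse_lines(RAW_LINES))
print(purchases_per_user(conn))
```

Once the data looks like a table, a BI tool only needs a connector to treat it like any other SQL source, which is the gap the Hive connectors are meant to fill.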