Big Data is "not a replacement for existing analytical systems" such as cubes and data warehouses. Big Data processes both remake and complement existing analytic workflows by simplifying the production of structured information from emerging "ambient" data sources.
When you have non-traditional data sources such as social media, IoT devices, and automated robotics, Big Data lets you make sense of this unstructured or semi-structured data and turn it into sensible analytical data. Here are the key points:
As you process more and more data and still want interactive response times, you typically need more expensive hardware to support the infrastructure. Disk and network failures become quite problematic, and maintaining ACID (atomicity, consistency, isolation, durability) guarantees can be a challenge.
You can work around this problem with more expensive hardware and systems, such as purchasing database appliances from Oracle or Microsoft (Essbase, PDW), but adoption stays small due to the high costs.
In the case of Big Data and Hadoop, we use commodity hardware without the need for specialized, expensive network and disk infrastructure. We give up strict ACID guarantees, but we get BASE (basically available, soft state, eventually consistent).
Broadly put, NoSQL is analogous to OLTP if you imagine Hadoop as a BI system. The Hadoop ecosystem comprises many components:
Implementations of Google's Bigtable: a distributed storage system for managing structured data at very large scale.
A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
A lot of this work has its origins at Facebook. Cassandra was originally created at Facebook; however, for Facebook messaging, Facebook decided to use HBase instead.
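To make the row key and column family:qualifier model concrete, here is a minimal sketch that exposes an existing HBase table to Hive (covered in the next section) through the HBase storage handler; the table and column names here are hypothetical:
-- ':key' maps the HBase row key; 'stats:views' is columnfamily:qualifier
CREATE EXTERNAL TABLE hbase_pageviews (rowkey STRING, views BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,stats:views")
TBLPROPERTIES ("hbase.table.name" = "pageviews");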
This is the favourite method for most SQL professionals because it uses SQL-like queries to query data from Hadoop. It is a "data warehouse" system for Hadoop.
With Hive, you can define tables over data in HDFS, load data into them, and query them with SQL-like statements (HiveQL). You can connect from Power Query, Power BI, or Power Pivot for Excel using the Hive ODBC driver or native connectors. For example, you can create a managed table and an external table as follows:
-- Managed table: dropping it deletes the data
CREATE TABLE indro_managed (bar INT);
LOAD DATA INPATH '/user/larar/data.txt'
INTO TABLE indro_managed;
-- External table: dropping it leaves the underlying files in place
CREATE EXTERNAL TABLE indro_external (bar INT);
LOAD DATA INPATH '/user/larar/data.txt'
INTO TABLE indro_external;
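As a minimal sketch of the SQL-like querying Hive provides (assuming data.txt holds one integer per line), you could then run a familiar aggregate query:
-- Count how many rows carry each value of bar
SELECT bar, COUNT(*) AS cnt
FROM indro_managed
GROUP BY bar;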
It is a scalable machine learning library that leverages the Hadoop infrastructure. Key use cases include recommendation mining (collaborative filtering), clustering, and classification.
It is a data connector system for bulk transfer between Hadoop and relational databases (RDBMS).
It is a data-flow platform to transform and analyze HDFS data. Its key benefits include ease of programming, built-in optimization opportunities, and extensibility. Read more about Pig here: http://pig.apache.org/philosophy.html
It is a workflow processing system. Users define a series of jobs, written in multiple languages, and link them to one another; for example, a particular query is initiated only after the previous jobs it relies on for data have completed.
It is a management system for Hadoop. With Ambari you can provision, manage, and monitor a Hadoop cluster.
It is a centralized service for maintaining configuration information and naming.
It provides centralized metadata management: shared schemas, data types, and table storage across Hadoop tools.
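Because HCatalog is built on the Hive metastore, the tables defined earlier carry the same shared schema when accessed from other tools such as Pig or MapReduce. As a small sketch, you can inspect that shared metadata from Hive itself:
-- List tables registered in the shared metastore and show one table's schema
SHOW TABLES;
DESCRIBE FORMATTED indro_managed;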
I hope this introductory post gave you a good understanding of what Big Data is all about. If this was helpful, do not forget to give your feedback in the comments section. Cheers!