What is Big Data

April 9, 2016, by IG

Note: If you buy something from our links, we might earn a commission. See our disclosure statement.

Big Data is not a replacement for existing analytical systems such as cubes and data warehouses. Big Data processes both remake and complement existing analytic workflows by simplifying the production of structured information from emerging "ambient" data sources.

When you have non-traditional data sources such as social media, IoT devices, or automated robotics, Big Data lets you turn this unstructured or semi-structured data into usable analytical data. Here are the key points:

- Enabling rapid sense-making over un-enriched and un-modeled data
- Enabling analytics at scale over ambient data
- Enabling creation of ambient-data-driven models
- Existing systems enable sense-making over modeled data
- There is tremendous potential value in making sense of ambient data

Comparison Chart between an RDBMS System and Big Data based MapReduce

As you process more and more data and want interactive response times, you typically need more expensive hardware to support the infrastructure. Failures at the disk and network level can be quite problematic, and maintaining ACID (atomicity, consistency, isolation, durability) guarantees can be a challenge.

You can work around this problem with more expensive hardware and systems, such as database appliances from Oracle or Microsoft (Essbase, PDW), but adoption would be small due to the high costs.

In the case of Big Data and Hadoop, we use commodity hardware without the need for specialized and expensive network and disk. We do not get ACID so much as BASE (basically available, soft state, eventually consistent).

Map Reduce (Split, Shuffle)

The Hadoop Ecosystem

What is NoSQL?

Broadly put, NoSQL is analogous to OLTP if you imagine Hadoop as a BI system.
NoSQL comprises many components:

- HBase
- Cassandra
- MongoDB
- Couchbase
- Memcached
- and more

Some are implementations of Google's Bigtable, a distributed storage system for managing structured data at very large sizes. A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

What is HBase

- Efficient at random reads/writes
- Distributed, large-scale data store
- Utilizes Hadoop for persistence
- Both HBase and Hadoop are distributed

Cassandra implementation at Netflix

Source: http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra

Where did Cassandra originate?

Cassandra was originally created at Facebook. For Facebook messaging, however, Facebook decided to use HBase instead.

Source: http://www.slideshare.net/danirayan/streaming-map-reduce

What is Hive and Hive Queries

This is the favourite method for most SQL professionals because it uses SQL-like queries to query data from Hadoop.
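SQL-like queries over Hadoop data are ultimately executed as MapReduce jobs, so it can help to see the split, map, shuffle, and reduce phases mentioned earlier in miniature. The following is only a single-process sketch of a word count, not the Hadoop API; the function names (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, n_splits):
    """Split: divide the input lines into n_splits chunks, one per mapper."""
    return [lines[i::n_splits] for i in range(n_splits)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, n_splits=2):
    """Run the four phases in order on an in-memory list of lines."""
    chunks = split_input(lines, n_splits)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))
```

In a real cluster the chunks would live on different machines and the shuffle would move data across the network; here everything runs in one process purely to show the shape of the computation. Hive generates jobs of this shape from a SQL-like query, so none of this has to be written by hand.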
Hive is a "data warehouse" system for Hadoop. With Hive, you can do the following:

- Analysis of large datasets stored in HDFS
- SQL-like interface
- No Java programming needed
- Ad-hoc queries via HiveQL (translated into MapReduce)

You can connect from Power Query, Power BI, Power Pivot for Excel, and similar tools using the Hive ODBC driver or native connectors.

Example Hive query:

    -- Managed table: Hive owns the data files
    CREATE TABLE indro_managed (bar int);
    LOAD DATA INPATH '/user/larar/data.txt' INTO TABLE indro_managed;

    -- External table: Hive only tracks metadata; the files stay under LOCATION
    CREATE EXTERNAL TABLE indro_external (bar int) LOCATION '/user/larar/indro_external';
    -- LOAD DATA INPATH moves the file, so data.txt must be staged again before this load
    LOAD DATA INPATH '/user/larar/data.txt' INTO TABLE indro_external;

Comparison Table for RDBMS and Hive

What is Mahout?

It is a scalable machine learning library that leverages the Hadoop infrastructure.

Key use cases:

- Recommendation mining: examine user behavior and build a recommendation model
- Clustering: group data into related topics
- Classification: learn from classified documents to assign categories to unlabeled data

What is R Programming?

- Statistical computing and graphing programming language
- RHIPE: R and Hadoop integration
- Open source GNU project

What is Sqoop?

- Data connector system for Hadoop and RDBMS
- Importing RDBMS data to files (delimited or sequence) in HDFS, or tables in Hive
- Importing RDBMS query results to files (delimited or sequence) in HDFS, or tables in Hive
- Exporting files and Hive tables to RDBMS tables
- Executes MapReduce jobs to transfer data in parallel with fault tolerance

What is Pig?

It is a data-flow platform to transform and analyze HDFS data.
Pig has the following benefits:

- Scripting: no Java programming needed
- Focus on semantics, not on implementation
- Extensible through user-defined functions and methods
- Pig can operate on data whether it has metadata or not
- Pig is not tied to one particular parallel framework
- Pig is designed to be easily controlled and modified by its users
- Pig processes data quickly

Read more about Pig here: http://pig.apache.org/philosophy.html

Workflow, Management & Monitoring with Oozie, Ambari, & ZooKeeper

What is Oozie?

It is a workflow processing system. Users define a series of jobs written in multiple languages and link them to one another. For example, a particular query is initiated only after the previous jobs it relies on for data have completed.

What is Ambari?

It is a management system for monitoring a Hadoop system. With Ambari you can:

- Install: wizard for installing Hadoop services across any number of nodes
- Manage: central management for starting, stopping, and reconfiguring Hadoop services across the entire cluster
- Monitor: dashboard for monitoring the health and status of the Hadoop cluster. It sends email alerts when your attention is needed (e.g., a node goes down, remaining disk space is low, etc.)

What is ZooKeeper?

It is a centralized service for:

- Maintaining configuration information and naming
- Providing distributed synchronization
- Providing group services

It offers high-throughput, low-latency, highly available, strictly ordered access.

What is HCatalog?

- Centralized metadata management for shared schema, data types, and table storage
- Notifications via Java Message Service (JMS)
- Works across Pig, MapReduce, and Hive

I hope this introductory post gave you a good understanding of what Big Data is all about. If this was helpful, do not forget to give your feedback in the comments section. Cheers!

Affiliate Disclosure: Faceofit.com is a participant in the Amazon Services LLC Associates Program. As an Amazon Associate we earn from qualifying purchases.