You're reading from Hadoop Essentials

Product type: Book
Published in: Apr 2015
Reading Level: Intermediate
Publisher: Packt
ISBN-13: 9781784396688
Edition: 1st Edition
Author (1)
Shiva Achari

Shiva Achari has over 8 years of extensive industry experience and is currently working as a Big Data Architect consultant with companies such as Oracle and Teradata. Over the years, he has architected, designed, and developed multiple innovative and high-performance large-scale solutions, such as distributed systems, data centers, big data management tools, SaaS cloud applications, Internet applications, and Data Analytics solutions. He is also experienced in designing big data and analytics applications, such as ingestion, cleansing, transformation, correlation of different sources, data mining, and user experience in Hadoop, Cassandra, Solr, Storm, R, and Tableau. He specializes in developing solutions for the big data domain and possesses sound hands-on experience on projects migrating to the Hadoop world, new developments, product consulting, and POC. He also has hands-on expertise in technologies such as Hadoop, Yarn, Sqoop, Hive, Pig, Flume, Solr, Lucene, Elasticsearch, Zookeeper, Storm, Redis, Cassandra, HBase, MongoDB, Talend, R, Mahout, Tableau, Java, and J2EE. He has been involved in reviewing Mastering Hadoop, Packt Publishing. Shiva has expertise in requirement analysis, estimations, technology evaluation, and system architecture along with domain experience in telecoms, Internet applications, document management, healthcare, and media. Currently, he is supporting presales activities such as writing technical proposals (RFP), providing technical consultation to customers, and managing deliveries of big data practice groups in Teradata.

Pig


Pig is a Hadoop component that provides the Pig Latin language as an abstraction layer on top of MapReduce. Pig was developed at Yahoo! around 2006 and was later contributed to Apache as an open source project. Pig Latin is a data flow language, which makes it a comfortable fit for developers and users with a procedural programming background. Pig manages data as a flow, which makes it well suited to data flow processes such as ETL (Extract, Transform, Load), ELT (Extract, Load, Transform), and ad hoc data analysis.
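To make the data flow idea concrete, the following is a minimal sketch in plain Python, not Pig Latin and not anything Pig itself runs. It mimics how a data flow script names each intermediate result and builds the next step from it. The sample records, the field names (`user`, `url`, `nbytes`), and the variable names are all hypothetical, and the Pig Latin lines in the comments are only rough analogues.

```python
from collections import defaultdict

# Hypothetical input: (user, url, nbytes) tuples standing in for a loaded relation.
records = [
    ("alice", "/home", 120),
    ("bob", "/home", 0),
    ("alice", "/docs", 300),
    ("carol", "/docs", 80),
]

# Step 1: keep only the rows that actually transferred data.
# Rough Pig Latin analogue: filtered = FILTER records BY nbytes > 0;
filtered = [row for row in records if row[2] > 0]

# Step 2: collect the surviving rows under their user key.
# Rough Pig Latin analogue: grouped = GROUP filtered BY user;
grouped = defaultdict(list)
for row in filtered:
    grouped[row[0]].append(row)

# Step 3: reduce each group to a single total per user.
# Rough Pig Latin analogue:
#   totals = FOREACH grouped GENERATE group, SUM(filtered.nbytes);
totals = {user: sum(row[2] for row in rows) for user, rows in grouped.items()}

for user, total in sorted(totals.items()):
    print(user, total)
```

Each step consumes the result of the previous one and gives it a name, which is the procedural, step-by-step style that sets a data flow language apart from a single declarative query.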

Pig makes the analysis of structured and semi-structured data much easier. It was developed around a philosophy with four tenets: pigs can eat anything, pigs can live anywhere, pigs can be easily controlled and modified by the user, and data should be processed quickly.

Pig data types

Pig has a collection of primitive data types, as well as complex data types. Inputs and outputs to Pig's relational operators are specified using these data types:

  • Primitive: int, long, float, double, chararray, and bytearray

  • Map: Map is...
