Tuesday 24 July 2018

Execute Pig Script | Apache Pig Running Scripts


1. Execute Pig Script

In this article, we walk through how to execute a Pig script. We cover the comment syntax available when writing a script in a file, how to execute a Pig script in batch mode, and how to execute a Pig script stored in HDFS, with steps and examples.

2. Introduction to Apache Pig Running Scripts

We use Pig scripts to place Pig Latin statements and Pig commands in a single file. It is good practice to identify the file with the .pig extension, even though it is not required.
We can run Pig scripts from the command line and from the Grunt shell.
Pig scripts also accept values for parameters through parameter substitution.
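As a minimal sketch of parameter substitution, the snippet below writes a small script that refers to two parameters and then passes their values on the command line with Pig's -param flag. The file name params_demo.pig and the parameter names input and n are assumptions made for this illustration, not names from the article.

```shell
# Hypothetical example: params_demo.pig, $input and $n are illustrative names.
# The quoted 'EOF' stops the shell from expanding $input and $n,
# so they reach Pig untouched and Pig substitutes them at run time.
cat > params_demo.pig <<'EOF'
-- $input and $n are replaced by Pig parameter substitution
Employee = LOAD '$input' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Employee_limit = LIMIT Employee $n;
DUMP Employee_limit;
EOF

# Each -param name=value supplies one parameter.
if command -v pig >/dev/null 2>&1; then
    pig -x local -param input=Employee_details.txt -param n=4 params_demo.pig
else
    echo "pig not on PATH; would run: pig -x local -param input=Employee_details.txt -param n=4 params_demo.pig"
fi
```

Keeping the input path and the limit outside the script lets the same file run against different data without editing it.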

3. Comments in Pig Script

We can include comments when writing a Pig script in a file:
a. Multi-line comments
Multi-line comments begin with ‘/*’ and end with ‘*/’.
  /* These are the multi-line comments
     in the Pig script */
b. Single-line comments
Single-line comments begin with ‘--’.
  -- We can write single-line comments like this.

4. Executing Pig Script in Batch mode

Follow these steps to execute a Pig script in batch mode.
Step 1
First, write all the required Pig Latin statements and commands in a single file, and save it as a .pig file.
Step 2
Then execute the Pig script. To execute it from the Linux shell:
  • Local mode
  $ pig -x local Sample_script.pig
  • MapReduce mode
  $ pig -x mapreduce Sample_script.pig
We can also execute it from the Grunt shell using the exec command.
  grunt> exec /sample_script.pig
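The two command-line invocations above differ only in the value given to -x, so they can be wrapped in a small helper. This is a sketch only: run_pig_script is a hypothetical function, not part of Pig, and it accepts just the two modes covered in this article.

```shell
# Hypothetical helper (not part of Pig): checks the mode, then calls `pig -x`.
run_pig_script() {
    mode=$1
    script=$2
    case "$mode" in
        local|mapreduce) ;;   # the two execution modes covered in this article
        *) echo "unsupported mode: $mode" >&2; return 2 ;;
    esac
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig not on PATH; would run: pig -x $mode $script" >&2
        return 127
    fi
    pig -x "$mode" "$script"
}

# Example call, using the script name from the article:
# run_pig_script local Sample_script.pig
```

Rejecting an unknown mode before calling pig gives a short, early error instead of a failure from Pig itself.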

5. Executing a Pig Script from HDFS

We can also execute a Pig script that resides in HDFS. Suppose there is a Pig script named Sample_script.pig in the HDFS directory /pig_data/. To execute it:
  $ pig -x mapreduce hdfs://localhost:9000/pig_data/Sample_script.pig
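Before a script can be run this way, it has to be copied into HDFS. The sketch below shows one way to do that with the Hadoop file system shell, followed by the same pig command. The NameNode address and directory are the ones used in this article; DRY_RUN and the run helper are hypothetical additions for this sketch, and by default the commands are only printed, not executed.

```shell
# Sketch: copy a local Pig script into HDFS, then run it by its HDFS URI.
# DRY_RUN is a hypothetical switch for this sketch: 1 = print only, 0 = execute.
DRY_RUN=${DRY_RUN:-1}
namenode="hdfs://localhost:9000"   # address used in the article
script="Sample_script.pig"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run hdfs dfs -mkdir -p /pig_data                    # create the target directory if missing
run hdfs dfs -put -f "$script" /pig_data/           # upload, overwriting an older copy
run pig -x mapreduce "$namenode/pig_data/$script"   # execute the script from HDFS
```

Setting DRY_RUN=0 runs the commands for real, which assumes a running HDFS at the address above and a local copy of the script.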
  • Pig Script Example
Suppose we have a file Employee_details.txt in HDFS with the following content.
  001,mehul,chourey,21,9848022337,Hyderabad
  002,Ankur,Dutta,22,9848022338,Kolkata
  003,Shubham,Sengar,22,9848022339,Delhi
  004,Prerna,Tripathi,21,9848022330,Pune
  005,Sagar,Joshi,23,9848022336,Bhuwaneshwar
  006,Monika,sharma,23,9848022335,Chennai
  007,pulkit,pawar,24,9848022334,trivendram
  008,Roshan,Shaikh,24,9848022333,Chennai
We also have a sample script named sample_script.pig in the same HDFS directory. It contains statements that perform operations and transformations on the Employee relation.
  Employee = LOAD 'hdfs://localhost:9000/pig_data/Employee_details.txt' USING PigStorage(',')
      as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
  Employee_order = ORDER Employee BY age DESC;
  Employee_limit = LIMIT Employee_order 4;
  Dump Employee_limit;
  • The first statement loads the data in the file Employee_details.txt as a relation named Employee.
  • The second statement arranges the tuples of the relation in descending order of age and stores the result as Employee_order.
  • The third statement stores the first 4 tuples of Employee_order as Employee_limit.
  • The fourth and last statement dumps the content of the relation Employee_limit.
Now, let’s execute sample_script.pig.
  $ pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
Pig executes the script and produces output like:
  (7,pulkit,pawar,24,9848022334,trivendram)
  (8,Roshan,Shaikh,24,9848022333,Chennai)
  (5,Sagar,Joshi,23,9848022336,Bhuwaneshwar)
  (6,Monika,sharma,23,9848022335,Chennai)
  2015-10-19 10:31:27,446 [main] INFO org.apache.pig.Main - Pig script completed in 12 minutes, 32 seconds and 751 milliseconds (752751 ms)
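The rows to expect can be sanity-checked without a cluster. The following is a plain-shell approximation of the ORDER ... BY age DESC and LIMIT 4 steps using sort and head; it is not Pig and only mimics the logic on a local copy of the sample data. It assumes a sort that supports the -s (stable) option, as GNU and BSD sort do.

```shell
# Not Pig: a shell approximation of ORDER BY age DESC followed by LIMIT 4.
# Writes a local copy of the article's sample data to the current directory.
cat > Employee_details.txt <<'EOF'
001,mehul,chourey,21,9848022337,Hyderabad
002,Ankur,Dutta,22,9848022338,Kolkata
003,Shubham,Sengar,22,9848022339,Delhi
004,Prerna,Tripathi,21,9848022330,Pune
005,Sagar,Joshi,23,9848022336,Bhuwaneshwar
006,Monika,sharma,23,9848022335,Chennai
007,pulkit,pawar,24,9848022334,trivendram
008,Roshan,Shaikh,24,9848022333,Chennai
EOF

# -t,      fields are comma-separated
# -k4,4nr  sort on the 4th field (age), numerically, descending
# -s       stable: rows with equal age keep their order from the file
top4=$(sort -t, -k4,4nr -s Employee_details.txt | head -n 4)
echo "$top4"
```

This selects employees 007, 008, 005 and 006, the same four as in the Pig output. Pig prints each row as a tuple in parentheses and, because id is declared as int, shows 007 as 7; rows with equal age may also come out in a different order in Pig.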
This was all about how to execute a Pig script. We hope you liked our explanation.

6. Conclusion

We have seen how to run Apache Pig scripts, both in batch mode and from HDFS, along with the comment syntax that helps keep a script readable. If you have any doubts, feel free to ask in the comment section.
