Pipe in Spark
Spark is a distributed parallel processing framework written in Scala. It also provides Java and Python APIs for writing Spark jobs. But there may be cases where you want to use another language for RDD processing, e.g. a shell script or R, because not every piece of functionality is available in the three supported languages.
Spark provides a pipe method on RDDs. Spark’s pipe lets us write parts of jobs using any language we want as long as it can read and write to Unix standard streams.
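Each element of the RDD is written to the external process's standard input as one line, and every line the process writes to standard output becomes a String element of the resulting RDD.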
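To get a feel for this contract outside Spark, here is a minimal local sketch. It is only an analogy, not Spark code: the `records` variable stands in for the RDD elements and `tr` stands in for the external command (both are assumptions chosen for illustration). Any program that reads lines from stdin and writes lines to stdout would fit in the same slot.

```shell
# Stand-in for the RDD elements: one record per line.
records=$(printf '%s\n' "this is file" "file is this" "this is hadoop")

# Stand-in for the external command: a filter that reads stdin and writes stdout.
# Here it simply upper-cases every record.
upper=$(printf '%s\n' "$records" | tr '[:lower:]' '[:upper:]')

# Each output line would become one element of the resulting RDD.
printf '%s\n' "$upper"
```

Because the only interface is a pair of text streams, the command on the right-hand side of the pipe can be swapped for a script in any language.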
Today we will perform a word count on a parallelized collection, but in between we will use a shell script to split each sentence into words.
Sample Code:
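// Path to the splitting script; it must be executable and available on every worker node
val scriptLocation = "/root/user/shashi/spark/code/pipe/splittingScript.sh"
val input = sc.parallelize(List("this is file", "file is this", "this is hadoop"))
// pipe sends each sentence to the script and returns one RDD element per word
val output = input.pipe(scriptLocation).map(word => (word, 1)).reduceByKey(_ + _)
output.collect()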
In the line that defines output, the pipe method invokes the shell script. The script splits each sentence into words and writes one word per line to standard output. The map function then reads those lines and turns each word into a (word, 1) pair, and reduceByKey adds up the counts.
Sample Shell Script:
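#!/bin/sh
# Read one sentence per line from standard input
while read -r input
do
    # $input is left unquoted on purpose so the shell splits it on whitespace
    for word in $input
    do
        echo "$word"
    done
done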
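Before submitting the job, it can be useful to sanity-check the splitting logic on its own. The sketch below is a local stand-in, not part of the Spark job: `split_words` is a hypothetical function that mirrors the script so the example is self-contained, and `sort | uniq -c` plays the role that map and reduceByKey play in Spark.

```shell
# Same logic as the splitting script, wrapped in a function for a self-contained check.
split_words() {
  while read -r input; do
    # Unquoted on purpose: the shell splits the sentence on whitespace.
    for word in $input; do
      printf '%s\n' "$word"
    done
  done
}

# Feed the three sample sentences on stdin, one per line,
# then count the words locally and print them as "word count".
counts=$(printf '%s\n' "this is file" "file is this" "this is hadoop" \
  | split_words \
  | sort \
  | uniq -c \
  | awk '{ print $2, $1 }')

printf '%s\n' "$counts"
```

If the script misbehaves here (for example, if it is not executable or emits extra blank lines), it will misbehave the same way inside pipe, so this kind of check tends to save a round trip to the cluster.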
Output of output.collect (the order of the pairs may vary):
(this,3), (is,3), (file,2), (hadoop,1)
Happy Sparking.....!!!!

 
 
 