Pipe in Spark
Spark is a distributed parallel processing framework developed in Scala. Spark also supports Java and Python APIs for writing Spark jobs. But there might be cases where you want to use some other language for RDD data processing, e.g. shell script or R, because not every piece of functionality is available in the three supported languages.
Spark provides a pipe method on RDDs. Pipe lets us write parts of a job in any language we want, as long as that language can read from and write to Unix standard streams.
It allows us to perform a transformation on an RDD and write the result to standard output.
Today we will perform a word count by parallelizing a collection, but in between we will use a shell script to split the sentences.
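Before looking at the Spark code, the stdin/stdout contract is easy to see with any ordinary Unix filter. Here `tr` (standing in for the shell script used below, purely as an illustration) splits space-separated words onto separate lines:

```shell
# pipe sends each RDD element as one line on the command's stdin;
# every line the command writes to stdout becomes an element of the new RDD.
# tr is just an example filter that honors this stdin/stdout contract.
printf 'this is file\nfile is this\n' | tr ' ' '\n'
# prints one word per line: this, is, file, file, is, this
```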
Sample Code :
val scriptLocation = "/root/user/shashi/spark/code/pipe/splittingScript.sh"
val input = sc.parallelize(List("this is file", "file is this", "this is hadoop"))
// pipe feeds each element to the script's stdin; each line the script
// prints becomes one element of the resulting RDD
val output = input.pipe(scriptLocation).map(word => (word, 1)).reduceByKey(_ + _)
output.collect
Look at the third line of the code: the pipe method invokes the shell script, the script splits each sentence and returns the words on standard output, and the map function then reads those words and performs the subsequent operations.
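For comparison, the same split-and-count logic could be done entirely inside Spark with flatMap instead of pipe. The sketch below mirrors that idea on plain Scala collections (so it runs without a cluster); `groupBy` plus size stands in for map + reduceByKey:

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Same sample sentences as the pipe example above
    val input = List("this is file", "file is this", "this is hadoop")

    // flatMap splits each sentence into words, mirroring the shell script;
    // groupBy + size mirrors the map + reduceByKey step
    val counts = input
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

    // counts: this -> 3, is -> 3, file -> 2, hadoop -> 1 (map order unspecified)
    println(counts)
  }
}
```

The pipe version is worth the extra moving parts only when the per-element logic cannot be expressed in Scala, Java, or Python.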
Sample Shell Script :
#!/bin/sh
# Read one RDD element (a sentence) per line from standard input
while read input
do
  # Word-splitting on the unquoted $input breaks the sentence into words;
  # each word printed here becomes one element of the piped RDD
  for word in $input
  do
    echo "$word"
  done
done
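You can exercise the same loop locally, outside Spark, by feeding sample lines on standard input; this is exactly what Spark does with each partition's elements:

```shell
# Simulate Spark's behavior: one RDD element per line on stdin.
printf 'this is file\nfile is this\n' | while read input
do
  for word in $input
  do
    echo "$word"
  done
done
# prints each word on its own line: this, is, file, file, is, this
```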
Output :
(this,3), (is,3), (file,2), (hadoop,1) — the order of the pairs returned by collect may vary.
Happy Sparking.....!!!!