Posts

Showing posts from November, 2015

Joining Spark RDDs

Hi friends, today I will demonstrate how to perform joins on Spark RDDs. We are going to focus on three basic join operations:

1. Join (inner)
2. Left outer join
3. Right outer join

Let's take the standard Employee + Department example and create two RDDs: one holding employee data and another holding department data.

Employee table:

Eid   EName   LName
101   Sam     Flam
102   Scot    Rut
103   Jas     Tez

val EmpRDD = sc.parallelize(Seq((101,"Sam","Flam"),(102,"Scot","Rut"),(103,"Jas","Tez")))
Array[(Int, String, String)] = Array((101,Sam,Flam), (102,Scot,Rut), (103,Jas,Tez)) // output of EmpRDD.collect()

Department table:

DeptId   DepartmentName   Eid
D01      Computer         101
D02      Electronic       104
D03      Civil            102

val DeptRDD = sc.parallelize(Seq(("D01","Computer",101),("D02","Electronic",104),("D03","Civil",102)))
Array[(String, S
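
As a rough sketch of the three joins named above, the snippet below keys both datasets by Eid and calls `join`, `leftOuterJoin` and `rightOuterJoin` from Spark's pair-RDD API. The object name `RddJoinSketch`, the helper `joins`, the type aliases and the choice to key by Eid are my own assumptions for illustration; the post's own join code is cut off in this preview, so it may well do this differently. It also builds its own local `SparkContext` instead of relying on the `sc` that spark-shell provides.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch, not the post's original code.
object RddJoinSketch {
  type Name = (String, String) // (EName, LName)
  type Dept = (String, String) // (DeptId, DepartmentName)

  // Runs the three joins and returns each result as a Map keyed by Eid.
  def joins(sc: SparkContext): (Map[Int, (Name, Dept)],
                                Map[Int, (Name, Option[Dept])],
                                Map[Int, (Option[Name], Dept)]) = {
    val empRDD = sc.parallelize(
      Seq((101, "Sam", "Flam"), (102, "Scot", "Rut"), (103, "Jas", "Tez")))
    val deptRDD = sc.parallelize(
      Seq(("D01", "Computer", 101), ("D02", "Electronic", 104), ("D03", "Civil", 102)))

    // Joins are defined on RDDs of (key, value) pairs, so key both sides by Eid.
    val empByEid = empRDD.map { case (eid, first, last) => (eid, (first, last)) }
    val deptByEid = deptRDD.map { case (deptId, name, eid) => (eid, (deptId, name)) }

    // Inner join: only Eids present on both sides (101 and 102).
    val inner = empByEid.join(deptByEid)
    // Left outer join: every employee; the department side is an Option.
    val left = empByEid.leftOuterJoin(deptByEid)
    // Right outer join: every department; the employee side is an Option.
    val right = empByEid.rightOuterJoin(deptByEid)

    // Eids are unique in this sample data, so toMap loses nothing.
    (inner.collect().toMap, left.collect().toMap, right.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddJoinSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (inner, left, right) = joins(sc)
      println(s"inner keys: ${inner.keys.toSeq.sorted}")
      println(s"left keys:  ${left.keys.toSeq.sorted}")
      println(s"right keys: ${right.keys.toSeq.sorted}")
    } finally {
      sc.stop()
    }
  }
}
```

With this sample data, employee 103 has no department, so it appears only in the left outer join with `None` on the department side. Department D02 points at Eid 104, which has no employee, so it appears only in the right outer join with `None` on the employee side.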