RHadoop - Integration of R and Hadoop using rhdfs
Hi Friends,
For the last few days I have been trying to integrate R and Hadoop. I found that there are a couple of packages available for R and Hadoop integration, for example RHIPE, RHadoop, BigR, etc. Of these, the easiest one I found is RHadoop, since I ran into a lot of dependency issues while using RHIPE. Here is a small demonstration of how to get started with rhdfs. If you want to go further, you can try rmr for writing MapReduce jobs. Below are some of the packages available with RHadoop:
1. rmr
2. rhdfs
3. rhbase
4. plyrmr
Note: I installed everything on CentOS 6.x.
The first thing you need is R itself. Use the commands below to install it.
sudo su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'
sudo yum install R
Once R is installed, just type R at the shell prompt and you should land in the R shell.
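If you would rather check this without opening the R shell, a small script along these lines should do. It is only a sketch: `have_cmd` is a helper name made up for illustration, and the message text is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd R; then
    # Print only the first line of R's version banner.
    R --version | head -n 1
else
    echo "R not found on PATH; re-check the yum install step" >&2
fi
```

If the script prints a version line, R is installed and reachable from your shell.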
The next step is to install the dependencies for the RHadoop packages.
Download rhdfs from the following link and place it in some directory.
https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads
Now start the R shell and install rJava, which rhdfs needs:
install.packages("rJava")
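Once rJava is installed, set the HADOOP_CMD environment variable to the path of your hadoop binary. The path below is from my installation, so adjust it for yours.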
Sys.setenv(HADOOP_CMD="/usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop")
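R inherits environment variables from the shell that starts it, so an alternative is to export HADOOP_CMD in the shell before launching R. The sketch below assumes the same path as above and first checks that it points at an executable file; `is_executable_file` is a made-up helper name.

```shell
#!/bin/sh
# Path taken from the post; adjust it for your own Hadoop installation.
HADOOP_CMD="/usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop"

# Hypothetical helper: succeeds only if the path is an executable regular file.
is_executable_file() {
    [ -f "$1" ] && [ -x "$1" ]
}

if is_executable_file "$HADOOP_CMD"; then
    # Make the variable visible to child processes such as R.
    export HADOOP_CMD
    echo "HADOOP_CMD=$HADOOP_CMD"
else
    echo "No executable found at $HADOOP_CMD" >&2
fi
```

For the export to reach R, these lines need to run in the same shell session that starts R, for example by sourcing the file or adding them to your shell profile.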
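Now install rhdfs from the archive you downloaded. Run this from the directory where you placed the file, or give the full path to it.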
install.packages("rhdfs_1.0.8.tar.gz", repos=NULL, type="source")
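The same source install can also be done from the shell with `R CMD INSTALL`. This is a sketch under a few assumptions: the archive name is the one used above, it sits in the current directory, and HADOOP_CMD is already set in the environment. `install_r_source_pkg` is a hypothetical helper name.

```shell
#!/bin/sh
# File name taken from the post; assumed to be in the current directory.
PKG="rhdfs_1.0.8.tar.gz"

# Hypothetical helper: install a source package only if the archive exists.
install_r_source_pkg() {
    if [ ! -f "$1" ]; then
        echo "Archive $1 not found" >&2
        return 1
    fi
    R CMD INSTALL "$1"
}

install_r_source_pkg "$PKG" || echo "rhdfs was not installed" >&2
```

Checking for the archive first gives a clearer message than the error R prints when the file is missing.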
Once everything is completed, you can try the following commands to check that the integration is working:
library(rhdfs)
hdfs.init()
hdfs.ls('/')
The last command should list everything present at the "/" location in HDFS.
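To cross-check the result outside R, you can list the same location with the hadoop command line. The sketch below uses HADOOP_CMD when it is set and otherwise assumes `hadoop` is on PATH; `resolve_hadoop_bin` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: prefer HADOOP_CMD, fall back to `hadoop` on PATH.
resolve_hadoop_bin() {
    echo "${HADOOP_CMD:-hadoop}"
}

HADOOP_BIN="$(resolve_hadoop_bin)"

if command -v "$HADOOP_BIN" >/dev/null 2>&1; then
    # List the HDFS root; it should show the same entries as hdfs.ls('/') in R.
    "$HADOOP_BIN" fs -ls /
else
    echo "hadoop binary not found: $HADOOP_BIN" >&2
fi
```

If the two listings disagree, R is probably pointing at a different Hadoop installation than your shell.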