RHadoop - Integration of R and Hadoop using rhdfs

Hi Friends,

For the last few days I have been trying to integrate R and Hadoop. I found that there are a couple of packages available for R and Hadoop integration, for example RHIPE, RHadoop, and BigR. Of these, the easiest one I found is RHadoop, as I faced a lot of dependency issues while using RHIPE. Here is a small demonstration of how to get started with rhdfs. If you want to go further, you can try rmr for writing MapReduce jobs. Below are the packages available with RHadoop:

1. rmr - for writing MapReduce jobs in R
2. rhdfs - for managing HDFS files from R
3. rhbase - for accessing HBase tables from R
4. plyrmr - for plyr-style data manipulation on Hadoop

Note : I have installed everything on CentOS 6.x

First, you need R in place. Use the commands below to install R:

sudo su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'

sudo yum install R

Once R is installed, just type R from the shell and you should enter the R prompt.




The next thing is to install the dependencies for the RHadoop packages.

Download rhdfs from the following link and place it in a directory:

https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads

Now start the R shell and install rJava, which rhdfs depends on:

install.packages("rJava")
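If the rJava installation fails with a JNI or compiler error, a common fix is to reconfigure R's Java support from the shell. This is a sketch of that step; it assumes a JDK is already installed and JAVA_HOME is set on your machine:

```shell
# Re-detect Java and rebuild R's Java configuration so rJava can compile
sudo R CMD javareconf
```

After this completes, retry install.packages("rJava") in the R shell.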

Once rJava is installed, set HADOOP_CMD so that rhdfs can find the hadoop binary (adjust the path to match your Hadoop installation):

Sys.setenv(HADOOP_CMD="/usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop")
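Note that Sys.setenv only lasts for the current R session. To avoid setting HADOOP_CMD every time, you can put it in ~/.Renviron, which R reads at startup. The path below is just the example from this HDP install; use your own:

```
HADOOP_CMD=/usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop
```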

Now install rhdfs from the downloaded tarball (run this from the directory where you placed it):

install.packages("rhdfs_1.0.8.tar.gz", repos=NULL, type="source")

Once everything is installed, try the following commands to see if the integration is working:

library(rhdfs)
hdfs.init()
hdfs.ls('/')

The commands above should list all the directories present at the "/" location in HDFS.
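To go one step beyond hdfs.ls, you can round-trip a file between the local filesystem and HDFS. The sketch below uses the rhdfs functions hdfs.put, hdfs.exists, and hdfs.get; the /tmp path is only an example, so adjust it for your cluster:

```r
library(rhdfs)
hdfs.init()

# Create a small local file to upload
local.src <- tempfile()
writeLines(c("hello", "hadoop"), local.src)

# Copy it into HDFS (example destination path)
hdfs.put(local.src, "/tmp/rhdfs-demo.txt")
hdfs.exists("/tmp/rhdfs-demo.txt")   # should return TRUE

# Copy it back out of HDFS and read it locally
local.dst <- tempfile()
hdfs.get("/tmp/rhdfs-demo.txt", local.dst)
readLines(local.dst)
```

If the readLines output matches what you wrote, rhdfs is talking to your cluster correctly.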



Cheeeerrrrssss.....!!!!! Have fun.....

