Posts

Showing posts from May, 2015

Custom Source in Flume

Flume lets you write your own source. It ships with default source types such as exec, spooldir, and Twitter. Here is a small demonstration of a custom Flume source. In this example I have written a MySource Java class that reads the input one line at a time, appends each line to everything read so far, and passes the result to the channel.

Example:

Sample input file: 20 50 50 04 17 59 18 43 28 58 27 81

Sample output file: 20 2050 205050 20505004 2050500417 205050041759 20505004175918 2050500417591843 205050041759184328 20505004175918432858

The first line is concatenated with the next, that result with the line after it, and the process continues in this way.

Here is my Java code.

MySource.java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryExcep
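To make the concatenation step concrete, here is a minimal sketch of just that logic in plain Java, with no Flume dependency. The class name RunningConcat and the method concatenate are hypothetical names chosen for this illustration; they are not taken from MySource, and a real source would wrap each result in a Flume event instead of collecting it in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: illustrates the running concatenation only,
// not the Flume source lifecycle.
final class RunningConcat {

    // For each input line, append it to everything read so far
    // and record the accumulated string.
    static List<String> concatenate(List<String> lines) {
        List<String> out = new ArrayList<>();
        StringBuilder acc = new StringBuilder();
        for (String line : lines) {
            acc.append(line.trim());
            out.add(acc.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // First four values of the sample input file.
        List<String> input = Arrays.asList("20", "50", "50", "04");
        for (String s : concatenate(input)) {
            System.out.println(s);
        }
    }
}
```

Run on the first four sample values, this prints 20, 2050, 205050 and 20505004, matching the start of the sample output file.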

Rhadoop - Integration of R and Hadoop using rhdfs

Hi friends, for the last few days I have been trying to integrate R and Hadoop. I found a couple of packages that can be used for R and Hadoop integration, for example Rhipe, Rhadoop, and BigR. Of these, the easiest one I found is Rhadoop, since I ran into a lot of dependency issues while using Rhipe. Here is a small demonstration of how to get started with rhdfs. If you want more, you can try rmr for writing MapReduce jobs.

Below are some of the packages available with Rhadoop:
1. rmr
2. rhdfs
3. rhbase
4. plyrmr

Note: I have installed everything on CentOS 6.x.

The first thing you need is R itself. Use the commands below to install R.

sudo su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'
sudo yum install R

Once R is installed, just type R at the shell and you should enter the R shell. The next step is to install the dependencies for the Rhadoop packages. Download rhdf