Posts

Showing posts from October, 2016

Hello World with Apache Sentry

Apache Sentry is an Apache project for role-based authorization in Hadoop, and it works well with Apache Hive. In this blog we will talk about creating a policy in Sentry using the Beeline (HiveServer2) shell.

Prerequisite: I am using a Cloudera VM with Sentry installed on it.

Hive authorization is done by creating policies in Sentry. Only Sentry admins can create Sentry policies, so we need to create a Sentry admin group and add that group to the Sentry admin list using Cloudera Manager (in sentry-site.xml).

Let's create a user sentryAdmin whose group is also sentryAdmin. Run the command below on Linux (on distributions that use user private groups, such as CentOS, it also creates a group with the same name as the user):

    useradd sentryAdmin

Now let's add this group to the Sentry admin list. Go to Cloudera Manager - Sentry - Configuration. Select Sentry (Service-Wide) from Scope and Main from Category. Add sentryAdmin to Admin Groups (sentry.service.admin.group), then restart the Sentry service.

It's time to create a policy for a user. Let's say that I have a database in Hive and I want to give read p

Developing Custom Processor in Apache Nifi

Apache NiFi was developed to automate the flow of data between different systems. It is based on technology previously called “Niagara Files”, which was developed and used at scale within the NSA for about eight years before it was made available to the Apache Software Foundation through the NSA Technology Transfer Program.

FlowFiles are at the heart of NiFi. A FlowFile is a data record that consists of a pointer to its content (the payload) and attributes that describe the content, and it is associated with one or more provenance events. The attributes are key/value pairs that act as the metadata for the FlowFile, such as its filename. The content is the actual data, or payload, of the file. Provenance is a record of what has happened to the FlowFile. Each of these three parts has its own repository (repo) for storage.

Each FlowFile is processed by a FlowFile processor. Processors have access to the attributes of a given FlowFile and to its content stream. Processo