Showing posts from August, 2014

Hadoop - Sequence and Map File

Hadoop Sequence Files:
- A sequence file is a flat file of binary key/value pairs.
- There are three sequence file formats:
  1. Uncompressed key/value records.
  2. Record-compressed key/value records: only the values are compressed.
  3. Block-compressed key/value records: keys and values are collected into blocks separately, and each block is compressed.

Small File Problem: Many small files can be packed into a single sequence file, with each file stored as the value and, for example, its file name as the key. This reduces the load on the NameNode, which otherwise has to keep metadata in memory for every individual file.

MapFile: A MapFile is actually a directory containing two files, "index" and "data". The data file is a sequence file that holds the keys, in sorted order, with their associated values. The index file is smaller: it holds key/value pairs in which the key is an actual key from the data file and the value is the byte offset of that record in the data file. By default the index holds only a fraction of the keys, not every one, which keeps it small enough to load into memory.
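
To make the index/data idea concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API and does not reproduce Hadoop's on-disk format. The record layout (two 4-byte lengths followed by the key and value bytes), the function names, and the `INDEX_INTERVAL` value are all assumptions chosen for illustration. It packs several small "files" into one data stream and keeps a sparse index of key to byte offset, which is roughly the role the "index" file plays next to the "data" file.

```python
import bisect
import io
import struct

INDEX_INTERVAL = 2  # hypothetical: index every 2nd record


def write_records(data, items, index_interval=INDEX_INTERVAL):
    """Write sorted (key, value) byte pairs to `data`; return a sparse index."""
    index = []  # (key, byte offset) pairs, in key order
    previous = None
    for n, (key, value) in enumerate(items):
        if previous is not None and key <= previous:
            raise ValueError("keys must be added in strictly increasing order")
        previous = key
        if n % index_interval == 0:
            index.append((key, data.tell()))
        # Assumed record layout: key length, value length, key, value.
        data.write(struct.pack(">II", len(key), len(value)))
        data.write(key)
        data.write(value)
    return index


def read_record(data):
    """Read one record at the current position, or None at end of data."""
    header = data.read(8)
    if len(header) < 8:
        return None
    key_len, value_len = struct.unpack(">II", header)
    return data.read(key_len), data.read(value_len)


def lookup(data, index, key):
    """Seek to the nearest indexed key at or before `key`, then scan forward."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first record
    data.seek(index[pos][1])
    while True:
        record = read_record(data)
        if record is None:
            return None
        found_key, value = record
        if found_key == key:
            return value
        if found_key > key:
            return None  # passed the spot where the key would be


# Three small "files" packed into one data stream, keyed by file name.
small_files = {b"a.txt": b"alpha", b"b.txt": b"bravo", b"c.txt": b"charlie"}
data = io.BytesIO()
index = write_records(data, sorted(small_files.items()))
print(lookup(data, index, b"b.txt"))
```

Because only every second key is indexed here, looking up `b.txt` seeks to the offset recorded for `a.txt` and scans forward one record. The trade-off is a smaller index in exchange for a short sequential read.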