Streaming data into Hadoop using Apache Flume

Flume: Flume is Hadoop ecosystem software used to stream log files from applications into HDFS.

In this post, let's discuss the following topics:
  • Overview of Flume
  • Streaming log files data into HDFS
  • Streaming Twitter App logs into HDFS

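Flume has 3 components:

1) Source --> where the data comes from; it receives or generates events
2) Channel --> the route/pipe that buffers events between the source and the sink
3) Sink --> delivers events to their final destination (here, HDFS)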
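To see how the three components wire together, here is a minimal sketch modeled on the netcat-to-logger example in the Apache Flume User Guide. The agent name a1, the component names r1, c1, k1 and port 44444 are arbitrary, and the netcat source and logger sink are used only because they need no Hadoop services; they are not part of this post's setup. The script writes the configuration to example.conf in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal single-agent Flume config showing source -> channel -> sink.
# Names (a1, r1, c1, k1) and the port are illustrative assumptions.
set -euo pipefail

cat > example.conf <<'EOF'
# Declare the components of agent "a1"
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for newline-terminated text on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# A source can feed one or more channels (note the plural key)
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: logs each event; a sink drains exactly one channel (singular key)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF

echo "Wrote example.conf"
```

To try it, run `flume-ng agent -n a1 -c conf -f example.conf -Dflume.root.logger=INFO,console`, then type a few lines into `nc localhost 44444` from another terminal; each line should show up as an event in the agent's console log.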

Flume installed location: /usr/lib/flume-ng


(Optional) If HDFS is in safe mode, take it out of safe mode first:

[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave


STEP 1: Download flume-1.0.SNAPSHOT.jar and copy it into /usr/lib/flume-ng/lib.


STEP 2: Create/update the file with the following entry:

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

STEP 3: Note the location of the log file you want to stream.
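If you do not have an application log handy for STEP 3, a small script can produce one to test with. This is a hypothetical helper: the path /tmp/flume-demo/app.log, the line format and the line count are assumptions, so substitute your application's real log file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append timestamped sample lines to a log file so that a
# tailing Flume source has data to pick up. Path and count are assumptions.
set -euo pipefail

LOG_FILE="${LOG_FILE:-/tmp/flume-demo/app.log}"
NUM_LINES="${NUM_LINES:-5}"

# Create the parent directory if it does not exist yet
mkdir -p "$(dirname "$LOG_FILE")"

for i in $(seq 1 "$NUM_LINES"); do
  # One record per line; an exec source running tail -F emits one event per line
  echo "$(date '+%Y-%m-%d %H:%M:%S') INFO sample event $i" >> "$LOG_FILE"
done

echo "Appended $NUM_LINES lines to $LOG_FILE"
```

Running it again while the agent is up appends new lines, which makes it easy to watch fresh data arrive in HDFS.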

STEP 4: Create/update the test1.conf file in the /usr/lib/flume-ng/conf directory with the agent's source, channel and sink details.
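A minimal sketch of what test1.conf could contain, assuming an exec source that tails the log file from STEP 3, a memory channel, and an HDFS sink. The agent name `agent` matches the `-n agent` flag used in STEP 5. The component names, the log path /tmp/flume-demo/app.log and the HDFS URI hdfs://quickstart.cloudera:8020/user/cloudera/flume/logs are assumptions for the Cloudera QuickStart VM, so adjust them. The script writes the file to the current directory because /usr/lib/flume-ng/conf is normally owned by root.

```shell
#!/usr/bin/env bash
# Sketch of test1.conf: exec source (tail) -> memory channel -> HDFS sink.
# Log path, HDFS URI, and component names are assumptions; adjust them.
set -euo pipefail

cat > test1.conf <<'EOF'
# Agent name "agent" must match the -n flag passed to flume-ng
agent.sources = tailSrc
agent.channels = memCh
agent.sinks = hdfsSink

# Source: run tail -F on the application log; each output line becomes an event
agent.sources.tailSrc.type = exec
agent.sources.tailSrc.command = tail -F /tmp/flume-demo/app.log
agent.sources.tailSrc.channels = memCh

# Channel: in-memory buffer (buffered events are lost if the agent process dies)
agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 1000
agent.channels.memCh.transactionCapacity = 100

# Sink: write plain-text files into HDFS
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/flume/logs
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.writeFormat = Text
agent.sinks.hdfsSink.channel = memCh
EOF

echo "Wrote test1.conf"
```

Copy it into place with `sudo cp test1.conf /usr/lib/flume-ng/conf/`. Once the agent is running, `hdfs dfs -ls /user/cloudera/flume/logs` should list files that start with the HDFS sink's default `FlumeData` prefix.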

STEP 5: Start the agent with test1.conf (the -n value must match the agent name used in the file):

[cloudera@quickstart conf]$ flume-ng agent -n agent -c conf -f /usr/lib/flume-ng/conf/test1.conf

Twitter use case: Streaming Twitter app log data into HDFS

Steps to be performed in the Twitter app:

Create a new app in Twitter --> specify the application details --> create the app

Get the following values:

Consumer Key (API Key): hMYQRNcTfeoMBHetW6j9IkZvy
Consumer Secret (API Secret): GDTjndxqpg70MH0Cgr1xqwxMvho5gsihCbovT8fCFRCjG7ZWxJ
Access Token: 1290729055-4d0tVDRyzvIOIeB6q8Li3S7rXBIgDh6vMm5GrmX
Access Token Secret: 7iLmyvTnwjZU82CJBxvHUHo2KtXnKmOFO5RjSThxtCSGx

STEP 1 and STEP 2 are the same as in the first example above.
STEP 3 is not required, since the data comes directly from the Twitter app.
STEP 4: Create/update the twitter.conf file in the /usr/lib/flume-ng/conf directory with the agent's source, channel and sink details.
STEP 5: Start the agent with twitter.conf:

[cloudera@quickstart conf]$ flume-ng agent -n TwitterAgent -c conf -f /usr/lib/flume-ng/conf/twitter.conf
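A sketch of the twitter.conf used in STEP 4, assuming the jar from STEP 1 provides the Cloudera example source class `com.cloudera.flume.source.TwitterSource`. Flume also ships its own `org.apache.flume.source.twitter.TwitterSource`, which takes the same four credential properties but emits Avro records. The agent name `TwitterAgent` matches the `-n TwitterAgent` flag above. The angle-bracket values are placeholders for the four credentials obtained from the Twitter app; the keyword list and the HDFS URI are assumptions. Create the file before running the STEP 5 command.

```shell
#!/usr/bin/env bash
# Sketch of twitter.conf: Twitter source -> memory channel -> HDFS sink.
# Source class, keywords, and HDFS URI are assumptions; replace <...> placeholders.
set -euo pipefail

cat > twitter.conf <<'EOF'
# Agent name "TwitterAgent" must match the -n flag passed to flume-ng
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: custom Twitter source (class assumed to come from the STEP 1 jar)
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <your-consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <your-consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <your-access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your-access-token-secret>
# Hypothetical filter terms
TwitterAgent.sources.Twitter.keywords = hadoop, flume, big data

# Channel: larger in-memory buffer for bursty tweet traffic
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# Sink: write events as plain-text files into HDFS
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
# rollSize = 0 disables size-based rolling; rollCount rolls after 10000 events
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
EOF

echo "Wrote twitter.conf"
```

Replace the placeholders with the values obtained from the Twitter app, then copy the file into place with `sudo cp twitter.conf /usr/lib/flume-ng/conf/`. Keeping the credentials out of the shell script body and in one clearly marked block makes them easy to rotate.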