SDET-QA Blog: Streaming data into Hadoop using Apache Flume

Flume: Apache Flume is a Hadoop ecosystem tool used for streaming log files from applications into HDFS.

In this post, let's discuss the following topics.

Flume has 3 components:

1) Source --> receives or generates the data

2) Channel --> the route/pipe that carries events from the source to the sink

3) Sink --> the final destination
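To see how the three components fit together, here is a minimal sketch of an agent definition written to a file from the shell. The agent name a1, the component names r1, c1, and k1, the file name example.conf, and port 44444 are illustrative assumptions, not values from this post; netcat, memory, and logger are standard Flume component types.

```shell
# Write a minimal Flume agent definition (names a1/r1/c1/k1 are illustrative).
cat > example.conf <<'EOF'
# Declare one source, one channel, and one sink for agent "a1".
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each received line into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: in-memory buffer between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: writes events to the agent's own log (handy for a first test).
a1.sinks.k1.type = logger

# Wiring: a source can feed several channels; a sink drains exactly one.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```

Note the asymmetry in the last two lines: the source uses the plural key "channels", while the sink uses the singular key "channel".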

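Flume installation location:

/usr/lib/flume-ng

Configuration directory:

/usr/lib/flume-ng/conf

It contains flume-env.sh (environment settings) and the agent configuration files (xxxx.conf).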

Optional: if HDFS is in safe mode, take it out of safe mode before running the examples:

[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave
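Since leaving safe mode is only needed when safe mode is actually on, a small guard can check first. The sketch below assumes that "hdfs dfsadmin -safemode get" reports the state as text containing "Safe mode is ON" or "Safe mode is OFF"; the helper name is_safemode_on is hypothetical.

```shell
# Return success (0) when the given status text reports safe mode as ON.
# (is_safemode_on is a hypothetical helper name.)
is_safemode_on() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only touch HDFS when the hdfs client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
  # Capture the current state; do not abort if the NameNode is unreachable.
  status="$(hdfs dfsadmin -safemode get 2>&1 || true)"
  if is_safemode_on "$status"; then
    hdfs dfsadmin -safemode leave
  else
    echo "Safe mode is not on; nothing to do."
  fi
fi
```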

Example-1

STEP 1: Download flume-sources-1.0-SNAPSHOT.jar and copy it into /usr/lib/flume-ng/lib

Link: https://drive.google.com/file/d/0Bw0lnp1j6kxDMDY4N2xZcjJXcWM/view

STEP 2: Create/update the flume-env.sh file with the following entries

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"

STEP 3: Note the location of the log file to be streamed

/home/cloudera/training/flumedata/sample.log
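If no application is writing to this file yet, some test lines can be generated by hand. The following is a small sketch; the function name append_sample_lines and the line format are made up for illustration, and any text lines would work just as well.

```shell
# Append COUNT timestamped lines to FILE, creating its directory if needed.
# The function name and the line format are illustrative only.
append_sample_lines() {
  file="$1"
  count="$2"
  mkdir -p "$(dirname "$file")"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') INFO sample event $i" >> "$file"
    i=$((i + 1))
  done
}
```

For example, running append_sample_lines /home/cloudera/training/flumedata/sample.log 10 appends ten lines to the log file used in this example.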

STEP 4: Create/update the test1.conf file in the /usr/lib/flume-ng/conf directory with the agent's source, channel, and sink details
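The exact contents of test1.conf are not reproduced here, so the following is only a sketch of what such a file might look like, not the original configuration. It assumes an exec source that tails the log file from STEP 3, a memory channel, and an HDFS sink. The agent name "agent" matches the -n argument used in STEP 5; the component names (logsrc, memch, hdfssink) and the HDFS path, including the NameNode address quickstart.cloudera:8020, are assumptions to adapt to your cluster.

```shell
# Write a sketch of test1.conf into the current directory;
# copy it to /usr/lib/flume-ng/conf afterwards.
cat > test1.conf <<'EOF'
# Component names are illustrative; the agent name must match "-n agent".
agent.sources = logsrc
agent.channels = memch
agent.sinks = hdfssink

# Source: follow the log file and emit one event per new line.
agent.sources.logsrc.type = exec
agent.sources.logsrc.command = tail -F /home/cloudera/training/flumedata/sample.log
agent.sources.logsrc.channels = memch

# Channel: in-memory buffer between source and sink.
agent.channels.memch.type = memory
agent.channels.memch.capacity = 1000
agent.channels.memch.transactionCapacity = 100

# Sink: write events as plain text files into HDFS (path is an assumption).
agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/flume_demo
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.hdfs.writeFormat = Text
agent.sinks.hdfssink.channel = memch
EOF
```

Setting hdfs.fileType to DataStream keeps the output as readable text; the sink's default is SequenceFile.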

STEP 5: Start the Flume agent with test1.conf

[cloudera@quickstart conf]$ flume-ng agent -n agent -c /usr/lib/flume-ng/conf -f /usr/lib/flume-ng/conf/test1.conf

Twitter use case: Streaming Twitter app data into HDFS

Steps to be performed in the Twitter app:

URL: https://apps.twitter.com/

Create a new app in Twitter --> specify the application details --> create the app

Then get the following values:

Consumer Key (API Key) hMYQRNcTfeoMBHetW6j9IkZvy

Consumer Secret (API Secret) GDTjndxqpg70MH0Cgr1xqwxMvho5gsihCbovT8fCFRCjG7ZWxJ

Access Token 1290729055-4d0tVDRyzvIOIeB6q8Li3S7rXBIgDh6vMm5GrmX

Access Token Secret 7iLmyvTnwjZU82CJBxvHUHo2KtXnKmOFO5RjSThxtCSGx

Step 1 and Step 2 are the same as in Example-1 above.

Step 3 is not required, since the data comes directly from the Twitter app.

Step 4: Create/update the twitter.conf file in the /usr/lib/flume-ng/conf directory with the agent details and the four Twitter values above
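As with Example-1, the contents of twitter.conf are not reproduced here, so the following is only a sketch under stated assumptions. It assumes the jar from Step 1 is the Cloudera Twitter example, which provides the source class com.cloudera.flume.source.TwitterSource with the properties consumerKey, consumerSecret, accessToken, accessTokenSecret, and keywords. The component names, the keyword list, and the HDFS path are illustrative, and the YOUR_... placeholders must be replaced with the values from your own Twitter app.

```shell
# Write a sketch of twitter.conf into the current directory;
# copy it to /usr/lib/flume-ng/conf afterwards.
cat > twitter.conf <<'EOF'
# The agent name must match "-n TwitterAgent"; component names are illustrative.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: assumed to be the custom class shipped in flume-sources-1.0-SNAPSHOT.jar.
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
# Keywords to filter tweets on (example values).
TwitterAgent.sources.Twitter.keywords = hadoop, big data, flume

# Channel: in-memory buffer between source and sink.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# Sink: write the tweets as text into HDFS (path is an assumption).
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/cloudera/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
EOF
```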

Step 5: Start the Flume agent with twitter.conf

[cloudera@quickstart conf]$ flume-ng agent -n TwitterAgent -c /usr/lib/flume-ng/conf -f /usr/lib/flume-ng/conf/twitter.conf
