Flume: Flume is a Hadoop ecosystem tool used for streaming log files from applications into HDFS.
In this post let's discuss the following topics.
A Flume agent has 3 components:
1) Source --> ingests/generates the event data
2) Channel --> the route/pipe that buffers events between source and sink
3) Sink --> writes events to the final destination (e.g. HDFS)
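These three components are wired together in the agent's .conf file. A minimal sketch of the naming convention (the agent name a1 and the component names s1/c1/k1 are hypothetical placeholders):

# declare the components of agent "a1"
a1.sources  = s1
a1.channels = c1
a1.sinks    = k1
# connect them: the source writes into the channel, the sink drains it
a1.sources.s1.channels = c1
a1.sinks.k1.channel    = c1

Note the plural/singular difference: a source can fan out to several channels, but a sink reads from exactly one channel.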
Flume installed locations (on the Cloudera QuickStart VM):
/usr/lib/flume-ng           --> installation directory
/usr/lib/flume-ng/conf      --> configuration directory, containing
    flume-env.sh            --> environment settings (JAVA_HOME, classpath)
    xxxx.conf               --> agent configuration files
[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave (optional; only needed if the NameNode is in safe mode and HDFS rejects writes)
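To check first whether the NameNode is actually in safe mode before running the command above:

[cloudera@quickstart ~]$ hdfs dfsadmin -safemode get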
Example-1
STEP 1: download flume-sources-1.0-SNAPSHOT.jar and copy it into /usr/lib/flume-ng/lib
Link: https://drive.google.com/file/d/0Bw0lnp1j6kxDMDY4N2xZcjJXcWM/view
STEP 2: Create/update the flume-env.sh file with the following entries, so the agent can find Java and the custom source jar:
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"
STEP 3: log file location
/home/cloudera/training/flumedata/sample.log
STEP 4: Create/update the test1.conf file in the /usr/lib/flume-ng/conf directory with the source, channel and sink definitions (a sketch follows below)
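The post doesn't show the file contents, so here is a minimal sketch of what test1.conf could look like for this example: an exec source tailing sample.log, a memory channel, and an HDFS sink. The component names (src1, ch1, sink1) and the HDFS output path are assumptions.

# test1.conf -- agent name "agent" must match the -n flag in STEP 5
agent.sources  = src1
agent.channels = ch1
agent.sinks    = sink1

# exec source: tail the local log file from STEP 3
agent.sources.src1.type = exec
agent.sources.src1.command = tail -F /home/cloudera/training/flumedata/sample.log
agent.sources.src1.channels = ch1

# memory channel: buffers events in RAM between source and sink
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000
agent.channels.ch1.transactionCapacity = 100

# HDFS sink: write events as plain text (output path is an assumption)
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = /user/cloudera/flumedata
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = Text
agent.sinks.sink1.channel = ch1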
STEP 5: execute test1.conf (-n is the agent name used as the prefix inside the file, -c points at the conf directory, -f at the agent configuration file)
[cloudera@quickstart conf]$ flume-ng agent -n agent -c conf -f /usr/lib/flume-ng/conf/test1.conf
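Once the agent is running, new lines appended to sample.log should start appearing in HDFS. Assuming the output path from the sketch above:

[cloudera@quickstart ~]$ hdfs dfs -ls /user/cloudera/flumedata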
Twitter use case: streaming Twitter app log data into HDFS
Steps to be performed in the Twitter app:
URL: https://apps.twitter.com/
create a new app in Twitter --> specify the application details --> create the app
then collect the following values:
Consumer Key (API Key): hMYQRNcTfeoMBHetW6j9IkZvy
Consumer Secret (API Secret): GDTjndxqpg70MH0Cgr1xqwxMvho5gsihCbovT8fCFRCjG7ZWxJ
Access Token: 1290729055-4d0tVDRyzvIOIeB6q8Li3S7rXBIgDh6vMm5GrmX
Access Token Secret: 7iLmyvTnwjZU82CJBxvHUHo2KtXnKmOFO5RjSThxtCSGx
Steps 1 and 2 are the same as in Example-1 above.
Step 3 is not required, since the log data comes directly from the Twitter app.
Step 4: Create/update the twitter.conf file in the /usr/lib/flume-ng/conf directory with the agent definition (a sketch follows below)
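Again the post doesn't include the file contents; the sketch below follows the well-known Cloudera Twitter example that flume-sources-1.0-SNAPSHOT.jar was built for. The channel/sink sizing, the keywords and the HDFS path are assumptions; the four credential values are the ones collected from the app page above.

# twitter.conf -- agent name "TwitterAgent" must match the -n flag in Step 5
TwitterAgent.sources  = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks    = HDFS

# custom source from flume-sources-1.0-SNAPSHOT.jar
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = hMYQRNcTfeoMBHetW6j9IkZvy
TwitterAgent.sources.Twitter.consumerSecret = GDTjndxqpg70MH0Cgr1xqwxMvho5gsihCbovT8fCFRCjG7ZWxJ
TwitterAgent.sources.Twitter.accessToken = 1290729055-4d0tVDRyzvIOIeB6q8Li3S7rXBIgDh6vMm5GrmX
TwitterAgent.sources.Twitter.accessTokenSecret = 7iLmyvTnwjZU82CJBxvHUHo2KtXnKmOFO5RjSThxtCSGx
# filter keywords (assumed; pick your own)
TwitterAgent.sources.Twitter.keywords = hadoop, bigdata

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# HDFS sink: write the tweet stream as text (output path is an assumption)
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = /user/cloudera/twitter
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000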
Step 5: execute twitter.conf (note that the -n value TwitterAgent matches the prefix used in the file)
[cloudera@quickstart conf]$ flume-ng agent -n TwitterAgent -c conf -f /usr/lib/flume-ng/conf/twitter.conf
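The tweets land in HDFS as JSON, one status per event. To peek at the first record (assuming the output path from the sketch above and Flume's default FlumeData file prefix):

[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/twitter/FlumeData.* | head -n 1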