Published October 31, 2016 by

Hadoop PIG Notes

Basics

  • Pig is a scripting platform for processing and analysing large data sets.
  • It is very useful for people who do not have Java knowledge.
  • It is used for high-level data flow and for processing the data available on HDFS.
  • Pig gets its name because, like the animal, it can consume and process any type of data; it is widely used for data cleansing.
  • Whatever you write in Pig is converted internally to MapReduce (MR) jobs.
  • Pig is a client-side installation; it need not sit on the Hadoop cluster.
  • A Pig script executes a set of commands, which are converted to MapReduce (MR) jobs and submitted to Hadoop running locally or remotely.
  • A Hadoop cluster does not care whether a job was submitted from Pig or from some other environment.
  • MapReduce programs are executed only when the DUMP or STORE command is called (more on this later).
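
The deferred execution described in the last bullet can be illustrated by analogy. The sketch below is not Pig code: it uses Java streams, which are also lazy, to show a data flow that is declared first and only runs when an output step asks for results. The class name LazyPipelineDemo and the helper definePipeline are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    // Declares the data flow (comparable to LOAD + FILTER) without running it.
    // The counter records how many records the filter has actually processed.
    static Stream<String> definePipeline(List<String> records, AtomicInteger counter) {
        return records.stream()
                .filter(r -> {
                    counter.incrementAndGet();
                    return !r.isEmpty();
                });
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        Stream<String> pipeline = definePipeline(Arrays.asList("a", "", "b"), counter);

        // Nothing has been processed yet: the pipeline is only a plan.
        System.out.println("processed before output step: " + counter.get()); // 0

        // The terminal operation plays the role of DUMP/STORE and triggers the work.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("processed after output step: " + counter.get()); // 3
        System.out.println(result); // [a, b]
    }
}
```

In Pig the same idea applies at a larger scale: the statements build up a plan, and the MapReduce jobs are launched only once DUMP or STORE requests output.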
Published October 31, 2016 by

Hadoop Notes

Technologies in Hadoop Ecosystem

  • Hadoop => a framework for distributed processing of large datasets across clusters of computers. It can scale up to thousands of machines, where each machine offers computation and data storage.
  • HBase => a scalable, distributed database that supports structured data storage for large tables.
  • Hive => a data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout => a scalable machine learning and data mining library.
  • Pig => a data flow language and execution framework for parallel computation.
  • Spark => a fast, general compute engine for Hadoop data.
  • Storm => a scalable and reliable engine for processing data as it flows into the system.
  • Zookeeper => a high-performance coordination service for distributed applications.
Published October 08, 2016 by

Top Java Frequently Asked Questions- 2

51. Are true and false keywords? 
No. The values true and false are not keywords; they are boolean literals. They are still reserved, so they cannot be used as identifiers.

52. What is a void return type? 
A void return type indicates that a method does not return a value after its execution. 
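
As a minimal sketch of the difference, the example below contrasts a void method, which only performs an action, with a method that returns a value. The class and method names (VoidReturnDemo, addGreeting, countMessages) are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class VoidReturnDemo {
    // void: performs an action (appends to the list) and returns nothing.
    static void addGreeting(List<String> messages, String name) {
        messages.add("Hello, " + name);
    }

    // Non-void, for contrast: computes a value and returns it to the caller.
    static int countMessages(List<String> messages) {
        return messages.size();
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        addGreeting(messages, "Java");          // the call has no value to assign
        int count = countMessages(messages);    // the call yields an int
        System.out.println(count + ": " + messages.get(0)); // 1: Hello, Java
    }
}
```

Writing `int x = addGreeting(messages, "Java");` would not compile, because a void method produces no value.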

53. What is the difference between the File and RandomAccessFile classes? 
The File class encapsulates the files and directories of the local file system. The RandomAccessFile class provides the methods needed to directly access data contained in any part of a file. 
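
The following sketch shows the two classes side by side, assuming a temporary file is acceptable for the demonstration. File is used to create and inspect the file system entry, while RandomAccessFile jumps straight to a byte offset. The class name FileVsRandomAccessFileDemo and the helper readCharAt are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class FileVsRandomAccessFileDemo {
    // Overwrites the file with "0123456789", then reads the single byte at the given offset.
    static char readCharAt(File file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);             // start from an empty file
            raf.writeBytes("0123456789"); // one byte per character
            raf.seek(offset);             // move the file pointer directly to the offset
            return (char) raf.readByte();
        }
    }

    public static void main(String[] args) throws IOException {
        // File describes the entry in the file system; it does not read or write contents.
        File file = File.createTempFile("raf-demo", ".txt");
        file.deleteOnExit();
        System.out.println("exists: " + file.exists() + ", directory: " + file.isDirectory());

        // RandomAccessFile reads from the middle of the file without scanning from the start.
        System.out.println(readCharAt(file, 7)); // 7
    }
}
```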

Published October 08, 2016 by

Top Java Frequently Asked Questions- 1

1. Can you write a Java class that could be used both as an applet as well as an application? 
Yes. Just add a main() method to the applet.

2. Explain the usage of Java packages. 
Packages are a way to organize files when a project consists of multiple modules. They also help resolve naming conflicts when different packages contain classes with the same name. Package-level access also lets you protect data from being used by unauthorized classes.

Published October 08, 2016 by

Skills Required for Software Tester

Important Activities in IT (Information Technology) Industry: 

• Project Management
• Business Analysis 
• Software Development (Front-end and Back-end) 
• Software Testing (Manual Testing and Automated Testing)  
• Technical Support (Network Administration/System Administration) 
• DBA (Database Administration)
• Software Maintenance

Published October 08, 2016 by

Banking Software Projects Info for Software Professionals

Banking is one of the important areas in the BFSI (Banking, Financial Services and Insurance) domain.
In banking, several business operations require different types of software applications.

Important banking software applications are:

1) Core Banking System 
2) ATM Banking
3) Internet Banking System (Online Banking)
4) Mobile Banking System
5) Forex Management
6) Treasury Management System (Treasury Management for Banks)
7) Asset Liability Management System
8) Financial Management System

Published October 08, 2016 by

Continuous integration with Jenkins

Using the Jenkins build server
Continuous integration is a process in which all development work is integrated as early as possible. The resulting artifacts are automatically created and tested. This process should identify errors very early in development.
Jenkins is an open source tool for continuous integration and build automation. The basic functionality of Jenkins is to execute a predefined list of steps. The trigger for this execution can be time based or event based, for example every 20 minutes or after a new commit in a Git repository.
The list of steps can, for example, include:
  • Perform a software build with Apache Maven or Gradle
  • Run a shell script
  • Archive the build result
  • Afterwards, start the integration tests
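
The idea of executing a predefined list of steps can be sketched in a few lines of Java. This is not Jenkins code or its API; it is a toy model in which each step is a named action that reports success or failure. The class name BuildStepsDemo and the method runSteps are hypothetical, and the sketch assumes the run stops at the first failing step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class BuildStepsDemo {
    // Runs the steps in insertion order and returns the names of the steps that were started.
    // Assumption for this sketch: the run stops at the first step that reports failure.
    static List<String> runSteps(Map<String, BooleanSupplier> steps) {
        List<String> started = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            started.add(step.getKey());
            if (!step.getValue().getAsBoolean()) {
                break; // later steps are skipped after a failure
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the steps in the order they were defined.
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("build", () -> true);
        steps.put("shell script", () -> true);
        steps.put("archive", () -> true);
        steps.put("integration tests", () -> true);
        System.out.println(runSteps(steps)); // [build, shell script, archive, integration tests]
    }
}
```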