NoSQL


What is NoSQL

NoSQL stands for “Not Only SQL”. It provides mechanisms for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases.
Features of NoSQL databases
·        Non-Relational
·        Distributed
·        Open-Source
·        Horizontally Scalable
·        Relaxes the ACID properties of an RDBMS and adheres to BASE instead.

What NoSQL is not suited for

NoSQL is not a replacement for SQL. It is generally a poor fit when:
·        The application needs strong consistency
·        Correctness of data is more important than availability of data
·        Transactional processing is more important than analytical processing
·        Data is structured and maintained through an object-relational hierarchy
·        A legacy database needs to be supported

Brewer’s CAP Theorem

A distributed (shared-data) system can guarantee at most two of the following three characteristics at a time; it cannot guarantee all three.
·        Consistency
o   All nodes in the cluster will have the same data at the same time
o   Client perceives that a set of operations has occurred all at once
o   Similar to atomicity in the ACID transaction properties of an RDBMS.
·        Availability
o   Node failures do not prevent services from continuing to operate
o   Despite the failure of some node(s), data remains available to clients.
·        Partition tolerance
o   Operations will complete, even if individual components are unavailable
o   System continues to operate despite arbitrary message loss
Traditional RDBMSs favor consistency (C) over availability (A) and partition tolerance (P), whereas web applications tend to favor availability (A).
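
As a rough illustration of this trade-off, the Python sketch below simulates two replicas and a network partition. The names Replica and TinyCluster and the "CP"/"AP" mode strings are hypothetical, made up for this example; real databases handle partitions in far more involved ways.

```python
class Replica:
    """One node holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas plus a flag that simulates a network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: update both copies together.
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write so the copies cannot diverge.
            return False
        # Availability first: accept locally; the copies now disagree.
        replica.data[key] = value
        return True


cp = TinyCluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x", 2)  # refused: consistent but not available

ap = TinyCluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x", 2)  # accepted: available but inconsistent
```

In the CP cluster both replicas still hold the old value after the partition, while in the AP cluster replica a has moved ahead of replica b.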

BASE Theorem

·        Basically Available – Faults may occur, but not across the entire cluster, so basic availability of data is preserved.
·        Soft State – Copies of a data item may be inconsistent for a time.
·        Eventually Consistent – Copies become consistent after some time if there are no more updates to that item.
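
Soft state and eventual consistency can be sketched in a few lines of Python. This is a minimal illustration, assuming each replica stores a (logical timestamp, value) pair per key and that the newest timestamp wins; the sync function and the replica names are hypothetical, and real systems use more careful conflict resolution.

```python
def sync(replica_a, replica_b):
    """Anti-entropy pass: for each key, both replicas keep the newest copy."""
    missing = (0, None)  # placeholder that is older than any real write
    for key in set(replica_a) | set(replica_b):
        newest = max(
            replica_a.get(key, missing),
            replica_b.get(key, missing),
            key=lambda entry: entry[0],  # compare timestamps only
        )
        replica_a[key] = newest
        replica_b[key] = newest


# Each replica maps key -> (logical timestamp, value).
replica_1 = {"cart": (1, ["book"])}
replica_2 = {"cart": (1, ["book"])}

# A write reaches replica_1 only: the copies now disagree (soft state).
replica_1["cart"] = (2, ["book", "pen"])
stale_read = replica_2["cart"][1]

# With no further updates, one sync pass makes the copies agree again.
sync(replica_1, replica_2)
```

Before the sync, a client reading from replica_2 sees the old cart; after it, both replicas return the same value.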

Features of BASE theorem

·        Availability
·        Weak Consistency
·        Simpler and Faster
·        Approximate answers OK

Types of NoSQL

NoSQL Databases can be categorized as follows.
·        Key Value – MemcacheDB, Berkeley DB
o   Hash table of Keys
o   Two-column table consisting of a key and a value associated with the key.
o   The key acts as the index, and the value can be referenced as a look up.
·        Document Oriented – MongoDB
o   Stores documents made up of tagged elements
o   Schema-less
o   Uses a JSON-like format (BSON in MongoDB) to store documents
o   Indexed semi-structured documents
·        Column Oriented – Cassandra, HBase
o   Each storage block contains data from only one column (family).
o   Allows key-value pairs to be stored (and retrieved on key) in a massively parallel system
§  Data model: families of attributes defined in a schema, new attributes can be added
§  Storing principle: big hashed distributed tables
§  Properties: partitioning (horizontally), high availability etc.
§  Completely transparent to application
§  Enables compression over column
·        Graph Based – Neo4j
o   Focuses on relationships, not only on entities
·        Object Databases
·        Grid and Cloud
·        Multidimensional
·        XML Databases
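
To make the first four categories more concrete, here is a small Python sketch that models each one with plain dictionaries. All keys, field names and values are made up for illustration, and none of this reflects how any particular product stores data internally.

```python
import json

# Key-value: an opaque value looked up by its key.
key_value = {"session:42": "alice"}

# Document: a nested, schema-less record that serializes naturally to JSON.
document = {
    "_id": 1,
    "name": "Alice",
    "emails": ["alice@example.com"],
    "address": {"city": "Pune"},
}
document_as_json = json.dumps(document)

# Column family: row key -> column name -> value.
# Rows in the same family may carry different sets of columns.
column_family = {
    "user:1": {"name": "Alice", "email": "alice@example.com"},
    "user:2": {"name": "Bob"},
}

# Graph: nodes plus explicitly stored, typed relationships.
graph = {
    "alice": {"FOLLOWS": ["bob"]},
    "bob": {"FOLLOWS": []},
}
```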

SQL vs NoSQL:

SQL                                        | NoSQL
-------------------------------------------+-------------------------------------------------
Predefined Schema                          | No Predefined Schema
Supports ACID Properties                   | Doesn’t support ACID properties but follows BASE
Standard definition and interface language | Per-product definition and interface language
Highly Consistent                          | Weak Consistency

With NoSQL, getting an answer quickly is more important than getting a correct answer.
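
The schema difference can be sketched with Python's standard library. This is only an illustration: SQLite (through the sqlite3 module) stands in for a relational database, and a plain list of dictionaries stands in for a schema-less store; the table and field names are made up.

```python
import sqlite3

# Relational side: the table's columns are fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")

try:
    # The schema has no 'nickname' column, so the database rejects this row.
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (2, 'Bob', 'bobby')"
    )
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Schema-less side: each record can carry its own set of fields.
collection = []
collection.append({"id": 1, "name": "Alice"})
collection.append({"id": 2, "name": "Bob", "nickname": "bobby"})
```

The relational table ends up with one row because the second insert is refused, while the schema-less collection accepts both records.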

Cassandra Characteristics:

·        A column (or cell) is a tuple containing a name, a value and a timestamp.
·        A column must have a name; the name can be a static label (such as name or email) or can be set dynamically when the application creates the column.
·        A column is not required to have a value.
·        A column family is similar to a table in that it is a container for columns and rows. Each column family should be designed to contain a single type of data.
·        Column families should define metadata about the columns, but the actual columns that make up a row are determined by the client application.
·        Each row can have a different set of columns.
Cassandra Advantages:
·        Tunable consistency.
·        Writes are faster than reads.
·        No single point of failure.
·        Incremental scalability.
·        Uses consistent hashing (logical partitioning) when clustered.
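
The consistent hashing mentioned in the last bullet can be sketched as follows. This is a minimal, generic hash ring and not the partitioner of any specific database: the HashRing class, the node names, the use of MD5 and the number of virtual nodes are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place several virtual nodes for this node around the ring."""
        for i in range(self._vnodes):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def node_for(self, key):
        """Return the owner of the first ring position clockwise from the key."""
        idx = bisect.bisect_right(self._positions, ring_position(key))
        idx %= len(self._positions)  # wrap around the end of the ring
        return self._owner[self._positions[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]

before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
```

Only the keys that now fall to the new node change owner; every other key stays where it was, which is what makes incremental scaling cheap.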

MongoDB Characteristics:

·        Document-oriented database
·        JSON-style documents: Lists, Maps, primitives
·        Documents organized into collections (analogous to tables)
·        Full or partial document updates
·        Transactional update in place on one document
·        Atomic Modifiers
·        Rich query language for dynamic queries
·        Index support (secondary and compound)
·        GridFS for efficiently storing large files
·        Map/Reduce
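
Partial updates with atomic modifiers can be pictured with the toy Python function below. It is an in-memory stand-in, not the MongoDB driver API: apply_update is a hypothetical helper that mimics a small subset of the $set, $inc and $push update operators on a plain dictionary. It ignores dotted field paths, and it does not provide the per-document atomicity that the real server gives.

```python
def apply_update(document, update):
    """Apply a small subset of MongoDB-style modifiers to a dict, in place."""
    # $set: overwrite or add individual fields without replacing the document.
    for field, value in update.get("$set", {}).items():
        document[field] = value
    # $inc: add to a numeric field, treating a missing field as 0.
    for field, amount in update.get("$inc", {}).items():
        document[field] = document.get(field, 0) + amount
    # $push: append to a list field, creating the list if needed.
    for field, item in update.get("$push", {}).items():
        document.setdefault(field, []).append(item)
    return document


post = {"_id": 1, "title": "NoSQL", "views": 10, "tags": ["db"]}

apply_update(
    post,
    {
        "$inc": {"views": 1},
        "$push": {"tags": "mongodb"},
        "$set": {"draft": False},
    },
)
```

Only the named fields change; the rest of the document (here _id and title) is left untouched, which is the point of a partial update.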

Features of Neo4j

A high-level overview of Neo4j is given below.
·        A graph database (based on mathematical graph modeling) for representing relationships across multiple entities. Developed by Neo Technology.
·        It is built on the concepts of nodes, relationships, properties (key/value pairs), and labels
·        Its own query language, “Cypher”, for performing CRUD operations
·        A very high-performing NoSQL database for storing and retrieving connected data
·        A graph is a way to maintain multi-dimensional relationships among entities
·        Highly applicable in social networking applications such as social graphs, recommendations etc.
·        The Neo4j site provides a rich REPL (read-eval-print loop) web interface for running queries as well as performing administrative work.
·        It is ACID compliant and provides high availability and master-slave replication across multiple nodes
·        Provides an easy client interface through REST and Gremlin
·        Provides fast lookup through Lucene
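
The kind of connected-data query a graph database is built for, such as friend recommendations in a social graph, can be sketched in plain Python. This is only an illustration of the traversal idea, not Cypher and not a Neo4j client: the FRIENDS data and the recommend function are made up for this example.

```python
from collections import Counter

# Relationship store: person -> set of people they are friends with.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def recommend(person):
    """Rank friends-of-friends by the number of mutual friends."""
    direct = FRIENDS[person]
    counts = Counter(
        candidate
        for friend in direct
        for candidate in FRIENDS[friend]
        # Skip the person themselves and anyone already a direct friend.
        if candidate != person and candidate not in direct
    )
    return counts.most_common()


suggestions = recommend("alice")  # [('dave', 2), ('erin', 1)]
```

Here dave is ranked first because he shares two friends with alice (bob and carol), while erin shares only one.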
