What is NoSQL
NoSQL stands for "Not Only SQL". It provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases.
Features of NoSQL databases
· Non-Relational
· Distributed
· Open-Source
· Horizontally Scalable
What NoSQL is not suited for
NoSQL is not a replacement for SQL. Generally, it is not a good fit when:
· The application needs strong consistency
· Correctness of data is more important than availability of data
· Transactional context is more important than analytical processing
· Data is structured and maintained through an object-relational hierarchy
· A legacy database needs to be supported
Brewer’s CAP Theorem
A distributed system can guarantee at most two of the following three characteristics at the same time. No distributed shared-data system can guarantee all three.
· Consistency
  o All nodes in the cluster see the same data at the same time
  o Clients perceive that a set of operations has occurred all at once
  o Similar to atomicity in the ACID transaction properties of an RDBMS
· Availability
  o Node failures do not prevent the service from continuing to operate
  o Despite the failure of some nodes, data remains available to clients
· Partition tolerance
  o Operations complete even if individual components are unavailable
  o The system continues to operate despite arbitrary message loss
Traditional RDBMS systems favor consistency (C) over availability (A) and partition tolerance (P), whereas many web applications favor availability.
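To make the trade-off concrete, here is a minimal sketch in Python. It is not a real database: it models two in-memory replicas and a flag that simulates a network partition. The names `Replica` and `Cluster`, and the mode values "CP" and "AP", are hypothetical and exist only for this illustration.

```python
class Replica:
    """One node holding a full copy of the data (hypothetical toy model)."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas that either refuse writes (CP) or accept them locally (AP)
    while the link between them is down."""
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject the write rather than let copies diverge.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept locally; the other copy is now stale.
            replica.data[key] = value
            return
        # No partition: the write reaches both copies.
        replica.data[key] = value
        other.data[key] = value


cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False

ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)

print(cp_write_ok, cp.a.data == cp.b.data)  # False True
print(ap.a.data, ap.b.data)                 # {'x': 2} {'x': 1}
```

The CP cluster stays consistent but rejects the write; the AP cluster accepts the write but its two copies disagree until the partition heals.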
BASE Model
· Basically Available: faults may occur, but not across the entire cluster, so basic availability of data is preserved.
· Soft State: copies of a data item may be inconsistent for a while.
· Eventually Consistent: copies become consistent after some time if there are no more updates to that item.
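Soft state and eventual consistency can be sketched with two replicas that exchange their contents. The example assumes a last-write-wins rule based on a logical timestamp, which is only one of several possible reconciliation strategies; the function name `merge` and the sample data are invented for illustration.

```python
def merge(local, remote):
    """Last-write-wins merge: for each key keep the entry with the newer timestamp."""
    for key, (value, ts) in remote.items():
        if key not in local or local[key][1] < ts:
            local[key] = (value, ts)


# Each replica maps key -> (value, logical timestamp).
replica_1 = {"cart": (["book"], 1)}
replica_2 = {"cart": (["book"], 1)}

# An update reaches replica_1 only: the copies are now inconsistent (soft state).
replica_1["cart"] = (["book", "pen"], 2)
consistent_before = replica_1 == replica_2

# With no further updates, one exchange in each direction makes the copies converge.
merge(replica_2, replica_1)
merge(replica_1, replica_2)
consistent_after = replica_1 == replica_2

print(consistent_before, consistent_after)  # False True
```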
Features of BASE
· Availability
· Weak Consistency
· Simpler and Faster
· Approximate answers are acceptable
Types of NoSQL
NoSQL databases can be categorized as follows.
· Key-Value: MemcacheDB, Berkeley DB
  o A hash table of keys
  o A two-column table consisting of a key and the value associated with that key
  o The key acts as the index, and the value is retrieved by lookup on the key
· Document-Oriented: MongoDB
  o Stores documents made up of tagged elements
  o Schema-less
  o Uses a JSON-like format to store documents
  o Indexed semi-structured documents
· Column-Oriented: Cassandra, HBase
  o Each storage block contains data from only one column (family)
  o Allows key-value pairs to be stored (and retrieved by key) in a massively parallel system
    § Data model: families of attributes defined in a schema; new attributes can be added
    § Storing principle: big hashed distributed tables
    § Properties: horizontal partitioning, high availability, etc.
    § Completely transparent to the application
    § Enables compression per column
· Graph-Based: Neo4j. Focuses on relationships, not only on entities.
· Object Databases
· Grid and Cloud
· Multidimensional
· XML Databases
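As a rough illustration of how the first four categories differ, the same small data set can be laid out in each model using plain Python structures. This is only a sketch of the data shapes, not of any product's storage format; all keys, field names, and values are made up.

```python
# Key-value: an opaque value looked up by its key.
key_value = {"user:42": b'{"name": "Ann"}'}

# Document: a nested, schema-less record addressed by an id.
documents = {
    "42": {"name": "Ann", "emails": ["ann@example.com"], "address": {"city": "Oslo"}},
    "43": {"name": "Bob"},  # different fields in the same collection
}

# Column family: row key -> column name -> value; rows may have different columns.
column_family = {
    "42": {"name": "Ann", "email": "ann@example.com"},
    "43": {"name": "Bob", "phone": "555-0100"},
}

# Graph: nodes plus explicitly stored relationships.
nodes = {"42": {"name": "Ann"}, "43": {"name": "Bob"}}
edges = [("42", "FOLLOWS", "43")]

# "Who does Ann follow?" is a direct traversal in the graph model.
followed = [nodes[dst]["name"] for src, rel, dst in edges if src == "42" and rel == "FOLLOWS"]
print(followed)  # ['Bob']
```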
SQL vs NoSQL:
SQL | NoSQL
Predefined schema | No predefined schema
Supports ACID properties | Doesn’t support ACID properties but follows BASE
Standard definition and interface language | Per-product definition and interface language
Highly consistent | Weak consistency
 | Getting an answer quickly is more important than getting a correct answer
HBase Characteristics:
· A column (or cell) is a tuple containing a name, a value, and a timestamp.
· A column must have a name. The name can be a static label (such as name or email), or it can be set dynamically by the application when the column is created.
· A column is not required to have a value.
· A column family is similar to a table in that it is a container for columns and rows. Each column family should be designed to contain a single type of data.
· Column families define metadata about the columns, but the actual columns that make up a row are determined by the client application.
· Each row can have a different set of columns.
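The column and column family model described above can be sketched as a small in-memory structure. This is a toy model under stated assumptions, not a client API: the class names `Column` and `ColumnFamily` and their methods are hypothetical, and the rule that the newest timestamp wins for a given column is an assumption made for the example.

```python
import time
from typing import Any, NamedTuple, Optional


class Column(NamedTuple):
    """A cell: name, optional value, and the timestamp of the write."""
    name: str
    value: Optional[Any]
    timestamp: float


class ColumnFamily:
    """Container mapping row key -> {column name -> Column} (toy model)."""
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, row_key, column_name, value=None, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        row = self.rows.setdefault(row_key, {})
        current = row.get(column_name)
        # Assumption: keep the newest write for a given column.
        if current is None or current.timestamp <= ts:
            row[column_name] = Column(column_name, value, ts)

    def get(self, row_key):
        return {name: col.value for name, col in self.rows.get(row_key, {}).items()}


users = ColumnFamily("users")
users.put("row1", "name", "Ann", timestamp=1)
users.put("row1", "email", "ann@example.com", timestamp=1)
users.put("row2", "name", "Bob", timestamp=1)
users.put("row2", "is_admin", timestamp=1)       # a column with no value
users.put("row1", "name", "Anna", timestamp=2)   # newer timestamp wins

print(users.get("row1"))  # {'name': 'Anna', 'email': 'ann@example.com'}
print(users.get("row2"))  # {'name': 'Bob', 'is_admin': None}
```

Note that `row1` and `row2` end up with different sets of columns, and that the set of columns is decided by the calling code rather than by the column family definition.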
Cassandra Advantages:
· Tunable consistency.
· Writes are faster than reads.
· No single point of failure.
· Incremental scalability.
· Uses consistent hashing (logical partitioning) when clustered.
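Consistent hashing, mentioned in the last bullet, is what makes incremental scaling cheap: when a node joins, only the keys that fall to the new node move. The following is a minimal sketch, not any product's partitioner; the class `HashRing`, the use of MD5, and the choice of 100 virtual nodes per physical node are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 used only for even spread)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every key that moves lands on the new node; all other keys stay where they were.
print(len(moved), all(after.node_for(k) == "node-d" for k in moved))
```

With a naive `hash(key) % number_of_nodes` scheme, adding a fourth node would instead remap most keys.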
MongoDB Characteristics:
· Document-oriented database
· JSON-style documents: lists, maps, primitives
· Documents organized into collections (analogous to tables)
· Full or partial document updates
· Transactional in-place update of a single document
· Atomic modifiers
· Rich query language for dynamic queries
· Index support (secondary and compound)
· GridFS for efficiently storing large files
· Map/Reduce
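To give a feel for partial updates with atomic modifiers, the sketch below mimics the effect of the `$set`, `$inc`, and `$push` update operators on a single document. It is a pure-Python simulation, not the MongoDB driver: the function `apply_update` is hypothetical, handles only top-level fields, and exists only to show the idea of changing part of one document as a single all-or-nothing step.

```python
import copy


def apply_update(document, update):
    """Return a new document with $set, $inc and $push modifiers applied.

    Toy model: only top-level field names are handled, and the input is
    left unchanged so the update is all-or-nothing for this one document.
    """
    result = copy.deepcopy(document)
    for operator, fields in update.items():
        for field, arg in fields.items():
            if operator == "$set":
                result[field] = arg
            elif operator == "$inc":
                result[field] = result.get(field, 0) + arg
            elif operator == "$push":
                result.setdefault(field, []).append(arg)
            else:
                raise ValueError(f"unsupported operator: {operator}")
    return result


post = {"_id": 1, "title": "NoSQL intro", "views": 10, "tags": ["db"]}
updated = apply_update(
    post,
    {"$set": {"title": "NoSQL overview"}, "$inc": {"views": 1}, "$push": {"tags": "nosql"}},
)
print(updated)
# {'_id': 1, 'title': 'NoSQL overview', 'views': 11, 'tags': ['db', 'nosql']}
```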
Features of Neo4j
A high-level overview of Neo4j is given below.
· A graph database (based on mathematical graph modeling) that supports relationship information across multiple entities. Developed by Neo Technology.
· It is built on the concepts of nodes, relationships, properties (key/value pairs), and labels
· Its own query language, named “Cypher”, for performing CRUD operations
· A very high-performing NoSQL database for storing and retrieving connected data
· A graph is a way to maintain multi-dimensional relationships among entities
· Highly applicable to social networking use cases such as social graphs and recommendations
· The Neo4j site provides a rich REPL (read-eval-print loop) web interface for running queries as well as performing administrative work.
· It is ACID compliant and provides high availability and master-slave replication across multiple nodes
· Provides an easy client interface through REST and Gremlin
· Provides fast lookup through Lucene
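The recommendation use case mentioned above comes down to a short traversal over connected data. The sketch below uses a plain Python adjacency structure rather than Neo4j or Cypher; the names in `friends` and the function `recommend` are invented for illustration, and ranking by number of mutual friends is an assumed heuristic.

```python
from collections import Counter

# FRIEND relationships stored as adjacency sets (node -> neighbours).
friends = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "dan"},
    "cara": {"ann", "dan", "eve"},
    "dan": {"bob", "cara"},
    "eve": {"cara"},
}


def recommend(person):
    """Rank friends-of-friends by the number of mutual friends."""
    counts = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(recommend("ann"))  # [('dan', 2), ('eve', 1)]
```

In a relational schema the same question typically needs a self-join on a friendship table for every hop, whereas a graph store follows the stored relationships directly.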