Wednesday, August 10, 2016

Using NoSQL databases


Databases plays a vital role when it comes to managing data in applications. RDBMS (Relational Database Management Systems) are commonly use to store/manage data/transactions in application programming.
As per the design of RDBMS, there are some limitations when applying RDBMS to manage Big/dynamic/unstructured data.
  • RDBMS use tables, join operations, references/foreign keys to make connections among tables. It will be costly to handle complex operations that involve multiple tables.
  • It is hard to restructure a table. (eg: each entry/row in the table has similar set of fields). If the data structure changed, the table has to be changed
In contrast, there are applications that process large scale, dynamic data (eg: geospatial data, data used in social networks). Due to the limitations above, the RDBMS may not be the ideal choice. 

What is No-SQL?

No-SQL (Not only SQL) is a non-relational database management system, that has some significant differences than RDBMSs. No-SQL as the name suggest does not use a SQL as the querying language and uses javascript(commonly used) instead. JSON is frequently used when storing records. 

No-SQL databases some key features that make it more flexible than RDBMS,
  1. The database, tables, fields need not to be pre-defined when inserting records. If the data structure is not present database will create it automatically when inserting data. 
  2. Each record/entry (or row in terms of RDBMS tables) need not to have the same set of fields. We can create fields when creating the records.
  3. Allows nested data structures (eg: arrays, documents)
Different types of No-SQL data:

  1. Key-Value:
    1. A simple way of storing records with a key(from which we can lookup the data) and a value (can be a simple string or a JSON value)
    1234Nipuni
    1345"{Name: Nipuni, Surname: Perera, Occupation: Software Engineer}"

  2. Graph:
    1. Used when data can be represented as interconnected nodes.     
  3. Column:
    1. Uses a similar flat table structure used in RDBMSs, but keys are used in columns rather than in rows. 
    ID234345456567
    NameNipuniJohnSmithBob

  4. Document:
    1. Stored in a format like JNSON, XML.
    2. Each document can have a unique structure. (Document type is used when storing objects and support OOP)
    3. Each document usually has a specific key, which can use to retrieve the document quickly.
    4. Users can query data by the tagged elements. The result can be a String, array, object etc. (I have highlighted some of the tags in the sample document below.)
    5. A sample document data that stores personal details may look like below:
      1. {
Id”:”133”
Name”: “Nipuni”
Education”: [
{ “secondary-education”:”University of Moratuwa”}
, { “primary-education”: ”St.Pauls Girsl School”}
]
}

Possible application for No-SQL
  1. No-SQL commonly used in web applications, that involves dynamic data. As per the data type description above, No-SQL is capable of storing unstructured data. No-SQL can be a powerful candidate for handling big data. 
  2. There are many implementations available for No-SQL (eg:  CouchDB, MongoDB) that serve different types of data structures.
  3. No-SQL can use to retrieve full list (that may involve multiple tables when using RDBMS). Eg: Retrieving details of a customer in a financial company may have different levels of information about the customer (eg: personal details, transaction details, tax/income details). No-SQL can save all this data in a single entry with a nested data type (eg: document), which then can retrieve complete data set without any complex join operation. 
The decision on which scheme to use depend on the requirement of the application. Generally, 

  1. Structured, predictable data can be handled with →  RDBMS
  2. Unstructured, bid data, complex and rapidly changing data can manage with → No SQL (But there are different implementations for No-SQL that provide different capabilities. No-SQL is just a concept for database management systems.)


No-SQL with ACID properties



Relational databases usually guarantee ACID properties. ACID provides a rule set that guarantees to handle transactions keeping its data safe. It depend on which No-SQL implementation you choose, and how much the database implementation guarantee the ACID properties.



  • Atomicity - when you do something to change a database the change should work or fail as a whole. Atomicity is guaranteed in document wide transactions. Writes cannot be partially applies to an inserted document.
  • Consistency-  the database should remain consistent. This feature support depend on your chosen No-SQL implementation. As No-SQL databases mainly support distributed systems, consistency and availability may not compatible.

  • Isolation - If multiple transactions are processing at the same time they shouldn't be able to see mid-status. There are No-SQL implementations that support read/write locks to to support isolation mechanism. But this too depends on the implementation.
  • Durability - If there is a failure (hardware or software) the database needs to be able to pick itself back up. No-SQL implementations support different mechanisms (eg: MongoDB supports journaling. With journaling when you do an insert operation in mongoDB it keeps that in memory and insert into a journal. )

Limitations of No-SQL


  1. There are different DBs available that uses No-SQL, you need to evaluate and find out which fits your requirements the most.
  2. Possibility of duplication of data.
  3. ACID properties may not support for all the implementations.

I have mainly worked with RDBMS, and have a general idea about the No-SQL concept. There is are significant differences between RDBMS and No-SQL database management systems. The choice depends on the requirements of the application and the No-SQL implementation to use. IMHO the decision should take after a proper evaluation of the requirement, and the limitation that the system can afford.

1 comment:

  1. Good article Nipuni. Add my two cents to the story I have worked with Apache Jena Fuseki, which uses RDF for query, couple of years back. It was very fast even with phonetic search even for locale languages. It supported complex data structures and properties like inheritance. But only problem is solid HA solution like standard RDBMS offers. We came around with custom python script to achieve this but later on after I left the project got to know it failed to replicate the data between two instances after exceeding certain amount of records. This is where Postgres and Oracle have solid solution like data guard.

    So I 100% agree with you. Before you select we select NoSQL vs RDBMS we better do a proper evaluation on long run.

    Thanks for sharing. It is a good read.

    ReplyDelete