Friday, September 20, 2013

Relational or non relational databases? That is the question.

We’ve all been there. Every system designer comes across one question…and every database (DB) admin is faced with consequences from bad decisions. Namely, how do we decide on using databases, which types should we go with, and how can we make sure we make the best of what we have by optimization? Here, we’ll review and answer these questions, and invite your feedback at the end. Let us know what works FOR YOU in terms of database programming and share your tips or experience in choosing DB structures in the comments section.

WHAT IS A DATABASE?

Put in simple terms, databases are nothing more than data containers. They organize data in datasets of rows and columns. They also give the user the ability to search for particular data based on different properties of data. Databases come in two major denominations: relational and non-relational. Going into definition details for each of these is way beyond the scope of this blog post, so we will only mention the primary differences.

1. Relational databases. A relational database manipulates only tables and the result of all operations are also tables. The tables are sets, which are themselves sets of rows and columns. You can view the database itself as a set of tables.

2. Non-relational databases. The non-relational flavor of databases, on the other hand, are based on a “navigational” model: a hierarchy, a linked list, a B-Tree, etc. It’s common to refer to these as ISAM (Indexed Sequential Access Method) Databases.

YOU NEED A DATABASE. SO, WHAT NEXT?

So, having concluded a definite need for a database how do we chose the best one for our applications? Well, first of all it mostly depends on the nature of the information you will store, as well as how it will be accessed.

Store inventories work great as relational databases. For example, managing a store inventory list would be a perfect (and a very simple) example of when to use a relational model DB. You would have a table of items offered by the store, a second table of stores available in the area and a final table which would keep info on the (in-store) quantity of items in each store. Essentially, this is a normalized database design, meaning that each piece of unique data is stored only once and then referenced as needed via an index. The important thing to keep in mind here (in this type of setup) is that the data is limited to how much it can grow, meaning that there is a limited number of stores as well as a limited number of items these stores can offer their customers. Hence, the choice of a relational database is quite a sound choice.

Websites do better with non-relational databases. Now when it comes to the internet, the web introduces a new scale for applications in terms of:


  • Concurrent users (millions of reqs/second)
  • Data (petabytes generated daily ( think google, facebook, etc.)
  • Processing (all this data needs processing)
  • Exponential growth (surging unpredictable demands)


Namely, web sites with very large traffic have no way to deal with these issues using existing relational database systems, even using those high end query crunchers like Oracle, MS SQL, Sybase, etc. The reason for this is simple. Related data is distributed amongst different tables which need to be joined in order to fetch that data. These joins perform more slowly the larger the datasets grow. Furthermore, distributing this data to multiple servers makes the joining of related data even slower to fetch. Sure, you can throw all your data to a single table and repeat if need be, but that kinda defeats the point of a relational database system. Popular web sites that have faced these sorts of issues include Google, Yahoo, Facebook, Twitter, Amazon, etc. Basically, web sites dealing with high traffic, massive data, large user base and user generated content.

Compared to other large scale systems such as telco applications, the above mentioned are FREE apps and can therefore compromise on data integrity and consistency. To be blunt, they wont be sued if someone hasn’t received:

his friends status update on time (FB, Twitter) or
the desired result of a search (google)
The solution for handling this type of web scaling lies in non relational database systems. Although there are a number of systems available freely for download (Redis, CouchDB, MongoDB) the top tier companies like Google, Facebook and Amazon went ahead and built their own custom systems (optimized for their own services) from scratch namely, BigTable, Cassandra and Dynamo respectively. The benefits of these systems include the following:


  • They are massively scalable
  • Highly available, decentralized and don’t have a single point of failure
  • They employ transparent sharding
  • Parallel processing
  • Built in mechanisms for automatic conflict resolution

However, these systems do come with a price, which was proven by scientists at MIT in 2002. To wit, non relational systems must compromise on at least one of the following optimizations: Consistency, Availability and/or Partition tolerance. Under no circumstances can non-relational databases be optimized for all three of the above benefits.

On one of my next "programming tips of the week", I will go deeper into real world examples of specific optimization methods given various types of problems requireg databases are their foundations.

Tuesday, September 10, 2013

"Curiosity killed the cat" - yeah right!

This is a proverb which has been buzzing in and out of my mind for the past two and something weeks. I mean, who on earth did curiosity actually kill? And additionally which cat did it kill so as to make the proverb so popular?

I was trying to find answers to these questions on the internet and came across the earliest recorded references (apparently) of the proverb. Believe it or not the proverb is initially attributed to a guy called Ben Jonson, a British playwright, who wrote something similar in his play from 1598 called Every Man In His Humour.

The problem I have with curiosity killing a cat, is the fact that the proverb sort of sends out an incorrect message. Namely, being too curious can be detrimental and dangerous to your health. Let's think about that for second. How can that be true, when curiosity has brought mankind to where it is now in terms of evolution? Curiosity led us to create the first tools. It showed us that the planet was round and revolving around its axis. Curiosity brings solutions to problems. It brings enlightenment, knowledge and power. Curiosity elevated us to the top of the food chain and on 26th of November 2011, Curiosity landed on our neighboring planet Mars on a crater called Aeolis Palus, just 2.4KM from its specifically predefined target (which considering that it made a 563,000,000KM journey is a pretty good approximation).

So fuck no! Saying that curiosity killed the cat, is analogous to saying something stupid like "global warming is a result of the sun being too near to our planet" or “cancer kills people because we are not praying hard enough”. In fact, curiosity is probably the very reason why our fury feline friends have made it so far as to guarantee the survival of their species, by stirring up human love and compassion towards them.

There are three morals to this story.

The first, is that I believe we (meaning us, humans) are lacking (or have dismissed as unimportant) curiosity! We can never have enough of it. I will try to quantify this reasoning using numbers: lets say, that when man decided to make his first journey to the moon (or for those pompous skeptics who don't believe that ever happened: when we discovered the earth was round), we had a curiosity quotient of 5 for example (I’m just throwing these numbers out of my beautiful and humble cerebral cortex) . For the advancements we have today and those that we imagine for tomorrow we would need a cutiosity quotient of 500. Let's try to always keep this in mind. So, by all means, as they say, question EVERYTHING! Indeed, be curious!

My second point (and a discussion for my next blog post ) is that curiosity is probably a prerequisite for intelligence! Think about it when you try to answer the question of what is intelligence? Well, to begin with its a function of curiosity. Being able to represent curiosity as a mathematical structure would go a long way to solving a lot of roadblocks in artificial intelligence. For those tech savy minds, here is something to think about: can you imagine a software application, which is curious about the output its internal functions produce with regards to either their input or some other sensory input from the environment?

Finally a thought for conclusion. Curiosity did NOT kill the cat! If anything did, it was probably incomprehention or unawareness of the given situation. This is something which evidently goes on par with the LACK of curiosity!