Some Database Concepts

Yogesh Bhople
3 min readSep 6, 2017

CAP Theorem

For better understanding consider distributed system.

  • Consistency — One data can have multiple copies across the servers. In this case user will get same copy of data regardless of which server is responding. Across the cluster, user will get the same data.
  • Availability — User will get the data for request regardless of latest. User will get the data even if a node in cluster is goes down.
  • Partition Tolerance — System/Cluster continues to work despite of communication break between nodes.

One can not give the guarantee for network failure so network failure has to be handled (Partition Tolerance)

Now decision has to make that whether we have to give preference to Availability or Consistency.

CA — This is impossible to achieve.

CP — System will return error or time out when any node goes down. However, data will be consistent across the nodes.

AP — System will process the request and will provide most recent data available despite of nodes can not communicate with each other.

ACID

For better understanding consider RDBMS.

  • Atomicity — This ensures that there is no partial data changes. If any transition have multiple data changes and one fails then whole transition will fail. We can say it is zero or one in binary. There is no partial commit.
  • Consistency — This ensures any changes in database from one valid state to another. This is achieved by applying constraints on transition like precondition, transformation, post-condition e.g. triggers, constraints etc.
  • Isolation — This property ensures the multiple transition can perform concurrently without inconsistent data in database. Other transition on data will no be visible to current transition.
  • Durability — This property ensures after successful commit data will persist and safe against errors, crashes or any other

Network partitioning

The network between server get disconnected due to network failure. For more detail you go through Network Partitioning

Relational database Vs Non Relational database

Relational Database

Pros

  • These databases support ACID property.
  • Relational databases stores the structured data.
  • In relationship of data, there are constraints.
  • These database have data integrity.

Cons

  • As it stores structured data, these database have problem with semi structured data.
  • Scaling is challenging in Relational Databases. For scaling, you use sharding.
  • Normalization is affect the speed.

Non Relational Databases

Pros

  • The main advantage of this database is scaling.
  • Schema is not needed.
  • Open Source
  • This database works with unstructured and semi-structured data.
  • High availability.

Cons

  • Eventual consistency (BASE).
  • Data is not normalized
  • Data integrity

Sharding

Shard means small part of big. In database, it means breaking your big database into small databases(manageable) which can be located on different servers.It is the horizontal partitioning in case of RDBMS.

Partitioning Methods in RDBMS

Horizontal Partitioning in database(RDBMS)

Storing rows in different tables. For example, list of product records having price 0–100 in one table and 100–200 in another table , so on

Vertical Partitioning in database(RDBMS)

It is storing some column in one and some (or least used) in another tables which can be reside on another physical location.

Normalization in RDBMS

High availability means what ? How?

In traditional RDBMS, there is master and replica architecture. If master fails then replica of master starts responding. However, in practice there are chances of single point failure.

NoSQL type databases are made by node and cluster which is master-less architecture. Any server from cluster can respond to request.

--

--