Row Oriented DB vs Column Oriented DB — Read vs Write Performance

karthik loganathan
3 min readJul 7, 2020

Database is crucial part of any IT/ITES Infrastructure inspective of the domain. Data is the asset today , which falls under two main category. Structured and Unstructured. Structured Data had a predefined data model and schema.Eg Credit card number, Date, Phone number etc. Relational Databases are best suited for storing structured data. No SQL Databases are suited for Unstructured Data. Eg. Text , Image , Documents etc. No SQL Databases further divides into following four category ( Key Value , Columnar , Document , Graph). This blog talks mostly about columar DBs.

Row Oriented DBs:

Schema/Data model is defined in advance. Consider Car Purchase use case. 2 Tables describes customer and car info which is reffered in the Order table. Schema is pre-defined and Data insertion cannot violate schema rules.

CustomerInfo Table:
CustomerID
Name
Phone
City
State
Country.

CarInfo Table:
CarID
Make
Model
Cost

OrderInfo:
OrderID
CarID
Purchase Data

Releational Databases are good for OLTP ( Online Transaction Processing) workload , meaning frequent updation/writes happens on the DB. Lets see how it aids in updation. This is how the Data is stored in the disk.Each row occupies a block.

CarID1 Make1 Model1 Cost1, CarID2 Make2 Model2 Cost2
………….Block A……. ……………….Block B …………..

When a new row gets added , a new block is created and existing blocks are not disturbed. Hence row addition is quicker.

CarID1 — — Model1 , CarID2 — - Model2, CarID3 — -Model3
………….Block A……………Block B……. ……….Block C…………..

Downside:

Data compression is not easy becasue each Block stores different type of data. Here (int , year,string,Float) — (carid,make,model,cost)

Read operations are costlier. Eg. Get total cars made in year 2020. This query will have to access all the Blocks to get the results. If the tables have 2K records , 2K blocks are accessed to retrive data. Not all blocks are stored sequentially , data could span across different disks.

Column Oriented DB:

Column Oriented DBs falls under No-SQL DB. These are best suited for OLAP (Online Analytical Processing)workload. Gives faster results for read queries .Writes will take more time than read. Analytics and Data Ware housing workloads requires quicker read time and writes can be slower.

This is how car is stored in columnar model,

CarID1 CarID2, Make1 Make2, Model1 Model2
…..Block-A……….Block B…………Block C………

New record insertion will result in following.

CarID1 CarID2 CarID3, Make1 Make2 Make3, Model1 Model2 Model3
…..Block-A…………………….Block B……………..…Block C………

When a new record gets added , write happens on 3 Blocks , hence it is slower.
Whereas , reads will be quicker. Consdier same query to get the total cars made in 2020. It just needs to access one Block(Make — Block B )to get this info. In case query want to list the car Model in addition to aggregation(sum) of cars made in 2020. It would need to access only 2 Blocks ( Model -Block C and Make — Block B ). In case of row oriented DB , entire row got to be fetched , stored in memory and then proceessed.

Compression is easy here , becasue every blocks holds same data type.

Examples:

Row Oriented DB — MS SQL , My SQL , Postgres

Column Oriented DB — Redshift , SnowFlake , Cosmos

--

--