SQL vs NO-SQL

Posted by Yanchao MURONG on 2022-07-16

SQL vs NOSQL

SQL

  • ACID properties
    • atomicity
      • all or nothing !
    • consistency
      • data consistent (payment transaction, guaranteed by atomicity and business logic)
    • isolation
      • transactions are isolated between each other (isolation levels)
        • read uncommitted (dirty read, unrepeated read, phantom read)
        • read committed (unrepeated read, phantom read)
        • repeatable read (phantom read)
        • serializable (lowest performance)
      • possible problems
        • dirty read: transaction A reads uncommited transactions B's modification
        • unrepeated read: transaction A reads different results (changed by transaction B)
        • phantom read: transaction A changed a whole table while transaction B add one row, then transaction A finds that we had one unchanged added row
    • durability
      • once committed, in database forever
  • Referential Integrity (Normalization)
  • Transaction based

NOSQL

structured/semi-structure

Column Based (OLAP)

  • usecases: suitable for OLAP system, message history
  • examples: hbase, bigtable, duckdb

KV

  • really high performance
  • usecases: realtime ranking, fans followers, shopping cart, session
  • examples: redis, cassandra

Dictionary

  • usecases: documents, xml, json
  • examples: mongodb, couchdb
  • only one document atomicity, only read committed isolation

Index

  • inverted index: document searching/ranking
  • examples: elastic search, solr

Graph

  • advantage: full acid support
  • usecases: social network, recommendation
  • examples: neo4j, titan

Selection of SQL and NoSQL databases

  • Data volume
  • Concurrency
  • Real-time performance
  • Consistency requirements
  • Read and write distribution and type
  • Security
  • Operation and maintenance costs

Common software system database selection references are as follows.

Internal use of management-oriented systems

Such as operation system, data volume is small, concurrency is small, the first choice to consider relational

High traffic system

Such as e-commerce single product page, the back office to consider the choice of relational, the front office to consider the choice of memory type

Log type system

Consider columnar type for raw data, and inverted index for log search

Search system

For example, the site search, non-universal search, such as product search - the back office to consider the choice of relational - the front office to consider the choice of inverted index

Transaction-based systems

Such as inventory, transactions, bookkeeping, consider choosing relational + cache + consistency protocols

Off-line computing

Consider choosing columnar or relational type for large amount of data analysis

Real-time computing

Such as real-time monitoring, you can consider choosing in-memory or columnar database

In design practice, based on the requirements, business-driven architecture, regardless of the choice of RDB/NoSQL/DRDB, must be demand-driven, the final data storage solution is bound to be a comprehensive set of various trade-offs.