NoSQL Database
What is a NoSQL database?
NoSQL databases are a type of database system that provide a mechanism for data storage and retrieval, differing from traditional relational databases which are structured and require data to fit into predefined tables.
NoSQL databases emerged to address the scalability and flexibility issues present in relational databases, particularly in the context of large amounts of data and multi-user applications. They are specifically designed to handle unstructured data, and can be scaled across many servers easily.
While NoSQL databases provide solutions for scalability and flexibility, they are not a replacement for SQL databases but rather a complement, suited to particular types of projects. The choice between SQL and NoSQL databases depends on the specific data requirements of the application or project in question.
NoSQL database use cases
Content Management
Because they are schemaless, NoSQL databases are a good fit for CMS applications: the data model is flexible and easy to modify when new data formats appear.
Real-time analytics and big data
NoSQL databases are a good fit for big data because the majority of them are designed to scale horizontally out of the box. They are also a good choice for real-time analytics use cases because their design allows them to handle higher write throughput than relational databases.
IoT applications
IoT devices generate a massive amount of diverse, unstructured data. NoSQL databases can handle this kind of data efficiently, and their distributed nature suits the geographical spread of IoT deployments: shards can be located regionally to improve performance.
Social networks
Social networks have complex relationships that are best modeled with graph-based NoSQL databases. LinkedIn and Facebook both used graph databases heavily to power their applications.
NoSQL database benefits
Scalability
One of the primary benefits of using a NoSQL database is that almost all of them provide easy-to-use horizontal scaling features by default. This makes it easier to scale as your application grows compared to most relational databases, which tend to rely on vertical scaling.
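To make horizontal scaling more concrete, here is a minimal sketch of hash-based sharding, one common way to spread keys across servers. The `ShardedStore` class and `shard_for` function are hypothetical names used for illustration, and each "shard" is just an in-memory dict standing in for a separate server.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store that routes each key to one of several shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for user_id in ("user:1", "user:2", "user:3"):
    store.put(user_id, {"id": user_id})

print(store.get("user:2"))
```

Simple modulo hashing like this moves most keys when the number of shards changes, which is why production systems often use techniques such as consistent hashing instead.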
Performance
There are a variety of specialized NoSQL databases that are optimized for specific use cases, so you can choose the one that will give you the best performance for your workload. In general, NoSQL databases also excel at handling higher write volumes compared to relational databases.
Developer experience
Because of their flexible data models, it is easier to iterate and make changes to applications compared to working with a relational database. Specialized NoSQL databases like graph or time series databases also tend to provide built-in features, so you don’t have to reinvent the wheel with custom code for common tasks associated with certain types of data.
NoSQL database challenges
NoSQL databases offer a number of benefits, but that doesn’t mean they are magical. There are several potential downsides that you should keep in mind when choosing a database for your application.
Data model complexity
While the flexible data model of NoSQL databases is often a selling point, it can also become a major issue if an engineering team isn’t disciplined. It can result in issues coordinating on how the data model should look and where data should be located. It can also result in conflicting data or duplicate data.
Complex transaction support
For applications that need strong reliability guarantees for complex transactions, NoSQL might not make sense. Most NoSQL databases provide some form of simple transaction support, but many do not offer full ACID transactions. For the NoSQL databases that do offer ACID transactions, enabling them often costs some of the promised performance gains.
Maturity
While popular relational databases like MySQL and Postgres are battle tested due to having been around for decades, many NoSQL projects aren’t as mature. This can result in obscure bugs, less community support, and the tool ecosystem being less robust.
Data consistency
Distributed databases are subject to the CAP theorem, which states that a distributed data store cannot guarantee Consistency, Availability, and Partition tolerance all at the same time. In practice, when a network partition occurs the database has to give up either consistency or availability. Many NoSQL databases relax consistency to get better availability and performance, which means you should only use this type of database for use cases where eventual consistency is acceptable.
Types of NoSQL databases
Key-value database
Key-value databases are the simplest form of NoSQL database: each value is stored under a unique key that identifies it. Key-value databases are generally used for tasks where quick lookups on simple data are needed, like caching, session management, and user information.
Document database
Document databases are similar to key-value databases but provide JSON-like structure to the data being stored and more advanced querying capabilities. MongoDB and CouchDB are examples of document databases.
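As a rough illustration of the document model, the sketch below stores JSON-like documents with different shapes in one collection and queries them by field. It uses plain Python dicts and lists rather than a real database, and the `find` and `find_with_tag` helpers are hypothetical names, not the API of MongoDB or CouchDB.

```python
# Documents in the same collection do not need to share a schema.
documents = [
    {"_id": 1, "type": "article", "title": "Intro to NoSQL", "tags": ["database", "nosql"]},
    {"_id": 2, "type": "article", "title": "Scaling writes", "author": {"name": "Ana"}},
    {"_id": 3, "type": "video", "title": "Demo", "duration_s": 300, "tags": ["demo"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def find_with_tag(collection, tag):
    """Return documents whose optional 'tags' list contains the tag."""
    return [doc for doc in collection if tag in doc.get("tags", [])]


articles = find(documents, type="article")
tagged = find_with_tag(documents, "nosql")

print([doc["_id"] for doc in articles])
print([doc["_id"] for doc in tagged])
```

A real document database would back queries like these with indexes instead of scanning every document.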
Columnar database
Columnar databases are used for OLAP and other analytics workloads. As the name suggests, the data is laid out on disk by column rather than by row as in traditional relational databases. This results in better data compression and faster aggregations. Redshift and ClickHouse are examples of columnar databases.
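The difference between the two layouts can be sketched with plain Python lists. This is only an illustration of the idea, not how any particular columnar engine stores data, and the sample rows and the `run_length_encode` helper are made up for the example.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"host": "a", "region": "eu", "cpu": 0.50},
    {"host": "b", "region": "us", "cpu": 0.75},
    {"host": "c", "region": "eu", "cpu": 0.25},
]

# Column-oriented layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation only needs to read the one column it touches.
avg_cpu = sum(columns["cpu"]) / len(columns["cpu"])


def run_length_encode(values):
    """Compress consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Similar values sit next to each other in a column, so simple schemes
# such as run-length encoding can compress them well.
encoded_regions = run_length_encode(sorted(columns["region"]))

print(avg_cpu)
print(encoded_regions)
```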
Graph database
Graph databases represent data as nodes and edges to maintain relationships between data points. They often persist these relationships directly on disk, so the entire dataset does not need to be held in memory to traverse them. Queries for connected data points tend to be more efficient and easier to write than joining multiple tables in a SQL database. Graph databases are typically used for social networks and recommendation engines. Neo4j is an example of a graph database.
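A "friends of friends" lookup is a typical connected-data query. The sketch below runs it as a breadth-first traversal over an in-memory adjacency map; the graph data and the `within_hops` and `friends_of_friends` functions are hypothetical and only meant to show the shape of the traversal, not how a graph database implements it.

```python
from collections import deque

# Adjacency map: each person points to the set of people they are connected to.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return {node: hops} for nodes reachable from start in at most max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


def friends_of_friends(graph, person):
    """People exactly two hops away: connected to a friend, but not a direct friend."""
    reachable = within_hops(graph, person, 2)
    return sorted(name for name, hops in reachable.items() if hops == 2)


print(friends_of_friends(graph, "alice"))
```

In a relational schema the same question would typically need a self-join on a friendships table for each additional hop.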
Time series database
Time series databases are optimized for storing time series data like application metrics, IoT sensor data, or financial data. They specialize in handling high write volumes, indexing data quickly so it can be queried soon after ingestion, and running efficient queries over time ranges. An example of a TSDB is InfluxDB.
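One reason time range queries can be fast is that points are kept ordered by timestamp, so a range can be located with a binary search instead of a full scan. The following is a minimal sketch of that idea, assuming integer timestamps; the `Series` class is hypothetical and is not how InfluxDB or any other TSDB is implemented.

```python
import bisect


class Series:
    """Toy time series that keeps points sorted by timestamp."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Find the insert position that keeps timestamps sorted;
        # out-of-order points are placed where they belong.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def mean(self, start, end):
        """Average value over the time range, or None if it has no points."""
        points = self.range(start, end)
        if not points:
            return None
        return sum(value for _, value in points) / len(points)


cpu = Series()
for ts, value in [(10, 1.0), (20, 2.0), (15, 4.0), (30, 3.0)]:
    cpu.append(ts, value)

print(cpu.range(10, 20))
print(cpu.mean(15, 31))
```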
In-memory database
In-memory databases store all data in RAM for fast queries and writes. Keeping data in memory also lets them offer data structures that would be impractical for disk-based databases. The downside is that RAM is expensive for larger datasets. Memcached and Redis are examples of in-memory databases.
Search engine database
Search engine databases are designed for complex search queries and full text searches. They are able to index multiple types of data and provide different ranking and scoring algorithms to fine-tune how results are returned. Elasticsearch is an example of a search engine database.
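Full text search is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below shows a very small version with term-frequency scoring; the `tokenize`, `build_index`, and `search` functions and the sample documents are hypothetical, and real search engines use far more sophisticated analysis and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build term -> {doc_id: term frequency} from {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Rank documents by how often the query terms occur in them."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {
    1: "Time series databases store metrics",
    2: "Graph databases store relationships",
    3: "Metrics, metrics, and more metrics",
}
index = build_index(docs)
results = search(index, "store metrics")

print(results)
```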
Vector database
Vector databases are a newer type of database used to store and query vector data (embeddings) for AI applications. Common use cases include similarity search, image recognition, and generative AI. An example of a vector database is Milvus.
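Similarity search boils down to finding the stored vectors closest to a query vector. Here is a minimal brute-force sketch using cosine similarity; the tiny three-dimensional vectors and the `top_k` function are made up for illustration, and this is not the Milvus API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, vector), item_id) for item_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical embeddings; real ones usually have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], index, k=2)

print(nearest)
```

Comparing the query against every stored vector does not scale to large collections, so vector databases generally rely on approximate nearest neighbor indexes instead of a full scan.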
Wide column database
Wide column databases can be considered a subset of columnar databases. They store data in column families, which allows them to be more versatile than purely column-oriented databases. Cassandra and HBase are examples of wide column databases.
Multi-model database
Many NoSQL databases are technically multi-model, meaning they support several types of data models and indexes. Over time, many specialized databases add more functionality and are able to support more use cases as a single database.
SQL vs NoSQL databases for time series data
If you are working with time series data, one key decision to make when storing it is what type of database to use. In the past it often made more sense to go with a NoSQL database for the scalability advantages that most traditional relational databases didn’t have. NoSQL databases also make life easier for developers by allowing for data to be written without having to define a schema, allowing fields to be added or removed easily. There are also NoSQL databases that are explicitly designed to handle time series data, allowing them to have improved performance compared to a more general purpose relational database designed for OLTP workloads.
While the factors above are still somewhat true when it comes to deciding between a SQL and NoSQL database for time series data, in some cases you can get the best of both worlds. InfluxDB for example allows data to be queried using SQL while also providing things like scalability and time series data specific optimizations so your queries are fast and write throughput is superior to relational databases.
FAQs
NoSQL vs. SQL: What’s the difference?
Relational databases and NoSQL databases have a few key differences, with trade-offs made by both types. Which is better will depend on your specific use case.
- Data model - SQL databases use a relational model which organizes data into a table format with rows and columns. NoSQL databases have a large number of different data models to structure data.
- Scalability - Relational databases tend to scale vertically, meaning the hardware size of the server the database is running on is increased. This can limit scalability long term. NoSQL databases tend to be designed for horizontal scaling across commodity hardware.
- Data integrity - SQL databases support ACID transactions, which guarantee data integrity at the cost of performance. NoSQL databases typically follow the BASE model (Basically Available, Soft state, Eventually consistent), which loosens consistency guarantees in return for performance.
- Query language - Relational databases use SQL as a query language, which is powerful but can become complex for certain types of queries. NoSQL databases tend to provide their own query languages or APIs, which are often simpler and easier to use for people who are not database experts.
What are LSM trees and why do NoSQL databases use them?
LSM Trees, or Log-Structured Merge-Trees, are a type of data structure often used in NoSQL databases to support write-heavy workloads efficiently. They were designed with the goal of reducing the cost of random writes to disk, which are more time consuming than sequential writes.
An LSM Tree consists of two or more components: a memory component, often called a memtable, and one or more disk components. When a write operation occurs, the data is first written to the memtable. This write is typically fast because it’s performed in memory. Once the memtable fills up, its contents are written to disk sequentially as a sorted, immutable file, and a background process called compaction periodically merges these files to discard overwritten data and keep reads efficient.
The LSM Tree design is used in several NoSQL databases, such as Cassandra and LevelDB, because it provides high write throughput, efficient storage space utilization, and tunable read performance. It’s especially useful in write intensive applications or situations where large amounts of data need to be written to disk quickly and frequently.
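The write path of an LSM Tree can be sketched in a few lines. In LSM designs generally, a full memtable is flushed to disk as a sorted, immutable segment, and segments are later merged by compaction. The toy `LSMTree` class below keeps its "disk" segments in memory and leaves out write-ahead logging, deletes, and Bloom filters, so treat it as an illustration of the idea rather than how Cassandra or LevelDB work internally.

```python
import bisect


class LSMTree:
    """Toy LSM tree: an in-memory memtable plus sorted, immutable segments."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.segments = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the memtable, which is why they are fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted segment (a sequential write)."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key, default=None):
        # Check the newest data first: memtable, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return default

    def compact(self):
        """Merge all segments into one, keeping only the newest value per key."""
        merged = {}
        for segment in self.segments:  # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


tree = LSMTree(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    tree.put(key, value)

print(tree.get("a"), len(tree.segments))
tree.compact()
print(tree.get("a"), len(tree.segments))
```

Reads may have to check several segments before finding a key, which is the cost paid for fast writes; compaction keeps that cost bounded by reducing the number of segments.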