Cassandra is an open-source distributed database management system that was originally developed at Facebook and is now an Apache Software Foundation project. It implements a Dynamo-style replication model with no single point of failure. In short, Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database. It is used by some of the most popular sites on the web, such as Facebook.
Cassandra has the following features:
- Distributed And Decentralized
Cassandra is distributed, which means that it can run on multiple machines at the same time. It is designed not only to run across many different machines, but also to optimize performance across multiple data center racks. You can write data to any node in the cluster and Cassandra will get it.
Decentralized means that every node is the same, so there is no single point of failure. This has two advantages: first, it is simpler to use than a master/slave setup, which helps you avoid outages; second, it is easier to operate and maintain because all nodes are the same.
- Elastic Scalability
If you add a new node, you don't need to reconfigure the entire cluster, restart your process, or change your application queries. The cluster finds the new machine automatically, adds it, and starts sending work to it.
- High Availability and Fault Tolerance
You can replicate data to multiple data centers, so that if one data center fails for any reason the data is still available from the others. Replicating across data centers can also improve local performance.
- Tunable Consistency
Cassandra lets you write and read at a tunable level of consistency. There are two broad types of consistency, strong and eventual. With strong consistency, an update is guaranteed to reach all locations; with eventual consistency, the client is acknowledged as soon as part of the cluster acknowledges the write. Cassandra offers both choices: you can use one, and you can even mix them.
- Row-Oriented
Cassandra stores data in multidimensional hash tables, which means you don't have to decide ahead of time precisely what your data structure must look like, or what fields your records will need. This can be useful if you're in startup mode and are adding or changing features with some frequency. It is also attractive if you need to support an Agile development methodology and aren't free to take months for up-front analysis.
- High Performance
Cassandra is designed to take full advantage of multiprocessor/multicore machines and to run across many such machines in multiple data centers. You can add more servers and still keep all of Cassandra's desirable properties without sacrificing performance.
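To make the tunable consistency feature described above more concrete, here is a minimal Python sketch. It assumes the usual Dynamo-style rule of thumb: a read is guaranteed to overlap with the latest acknowledged write when the number of replicas read (R) plus the number of replicas that acknowledged the write (W) is greater than the replication factor (N). The level names ONE, QUORUM, and ALL mirror Cassandra's consistency levels, but the function names are hypothetical and the code does not call any Cassandra API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Return how many replicas must respond for a consistency level.

    Illustrative only: covers three levels, not Cassandra's full list.
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, writing and reading at QUORUM gives 2 + 2 > 3, so reads see the latest write. Writing and reading at ONE gives 1 + 1, which is not greater than 3, so the result is only eventually consistent. Mixing levels, such as writing at ALL and reading at ONE, also satisfies the rule.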
Who is using Cassandra?
- Facebook still uses it for inbox search, though they are using a proprietary fork.
- Digg uses it for its primary near-time data store.
- Rackspace uses it for its cloud service, monitoring, and logging.
- Reddit uses it as a persistent cache.
- Cloudkick uses it for monitoring statistics and analytics.
- Ooyala uses it to store and serve near real-time video analytics data.
- SimpleGeo uses it as the main data store for its real-time location infrastructure.
Many others use Cassandra as well.
Cassandra Data Model
The Cassandra data model is column-oriented and consists of columns, rows, column families, and keyspaces.
- Columns
A column is the basic unit in the Cassandra data model. A column has a name, a value, and a timestamp. In a relational database, the user specifies the structure of a table up front by naming all of its columns, and then writes data by supplying values for those predefined columns. In Cassandra you do not need to define columns up front: you only define the keyspace and the column families you need, and then you can write data without defining the columns anywhere.
- Rows
A row is a collection of columns labeled with a name. Cassandra also uses wide rows, whose column names are often automatically generated values such as UUIDs or timestamps.
- Column families
A column family is a collection of rows. Cassandra is schema-free: users are free to add any column to any column family at any time, according to their needs. A column family has two attributes, a name and a comparator. The comparator indicates how columns will be sorted when they are returned to you in a query: according to long, byte, UTF8, or other ordering.
- Keyspace
A keyspace is a set of column families. It also specifies the replication factor and the replica placement strategy.
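The four pieces of the data model above can be pictured as nested maps. The following Python sketch is an in-memory illustration only, not how Cassandra stores data on disk, and every class and method name in it is hypothetical. It shows a keyspace holding column families, a column family holding rows, and each row holding columns with a name, a value, and a timestamp. Columns are never declared up front, and the comparator decides the order in which a row's columns come back.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Column:
    """The basic unit: a name, a value, and a timestamp."""
    name: Any
    value: Any
    timestamp: int


class ColumnFamily:
    """A named collection of rows; each row maps column names to columns."""

    def __init__(self, name: str,
                 comparator: Optional[Callable[[Any], Any]] = None):
        self.name = name
        # The comparator is modeled here as a sort-key function on column
        # names. The default sorts names by their natural ordering.
        self.comparator = comparator or (lambda column_name: column_name)
        self.rows: Dict[str, Dict[Any, Column]] = {}

    def insert(self, row_key: str, column_name: Any, value: Any,
               timestamp: Optional[int] = None) -> None:
        """Write a column; no column definition is needed beforehand."""
        if timestamp is None:
            timestamp = time.time_ns() // 1000  # microseconds (assumed unit)
        row = self.rows.setdefault(row_key, {})
        row[column_name] = Column(column_name, value, timestamp)

    def get_row(self, row_key: str) -> List[Column]:
        """Return the row's columns in comparator order."""
        row = self.rows.get(row_key, {})
        return sorted(row.values(), key=lambda c: self.comparator(c.name))


class Keyspace:
    """A set of column families plus replication settings."""

    def __init__(self, name: str, replication_factor: int,
                 placement_strategy: str):
        self.name = name
        self.replication_factor = replication_factor
        self.placement_strategy = placement_strategy  # just a label here
        self.column_families: Dict[str, ColumnFamily] = {}

    def create_column_family(self, name: str,
                             comparator: Optional[Callable[[Any], Any]] = None
                             ) -> ColumnFamily:
        cf = ColumnFamily(name, comparator)
        self.column_families[name] = cf
        return cf


# Usage: two rows in the same column family with different columns.
keyspace = Keyspace("demo", replication_factor=3,
                    placement_strategy="SimpleStrategy")
users = keyspace.create_column_family("users")
users.insert("alice", "email", "alice@example.com")
users.insert("alice", "age", 30)
users.insert("bob", "city", "Paris")
```

Here the rows "alice" and "bob" hold different columns, which is what the schema-free description means in practice. Passing `comparator=int` when creating a column family would sort column names such as "9" and "10" numerically instead of as text, loosely mirroring the choice between a long and a UTF8 comparator.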