位置:首頁 > 大數據教學 > Cassandra教學 > Cassandra教學

Cassandra教學

Apache Cassandra是一個高度可擴展,高性能的分布式設計,在許多商用服務器來處理大量的數據,提供高可用性,無單點故障的數據庫。它是一種NoSQL數據庫。讓我們先了解什麼是NoSQL數據庫。

NoSQL數據庫

NoSQL數據庫(有時稱為未僅SQL)是它提供存儲和檢索數據而不是關係數據庫中所用的片狀關係的其他機製的數據庫。這些數據庫是無模式,支持簡單的複製,有簡單的API,最終一致,可以處理大量的數據。

NoSQL數據庫的主要目的是

  • 設計簡單,
  • 橫向擴展
  • 更好地控製可用性。

NoSQL數據庫使用不同的數據結構,相比於關係數據庫。這使得NoSQL的速度更快一些操作。一個給定的NoSQL數據庫的適用性取決於它必須解決的問題。

NoSQL VS關係型數據庫

下表列出了一個NoSQL數據庫與關係數據庫的區分點。

關係數據庫 NoSql數據庫
支持強大的查詢語言 支持非常簡單的查詢語言
具有固定的模式 冇有固定的模式
如下ACID(原子性,一致性,隔離性和持久性) 隻有“最終一致”
支持事務 不支持事務

此外Cassandra,以下的NoSQL數據庫是很受歡迎:

  • Apache HBase - HBase是一個開源,非關係型,分布式數據庫,以穀歌BigTable為藍本,並用Java編寫。它被開發為Apache Hadoop項目的一部分,並且運行在HDFS的頂部,為Hadoop提供了BigTable般的能力。

  • MongoDB - MongoDB是一個跨平台的麵向文檔的數據庫係統,主張使用動態架構使數據整合在某些類型的應用程序更容易和更快JSON類的文檔,避免使用傳統的基於表格的關係型數據庫結構。

Apache Cassandra 是什麼?

Apache Cassandra is an open source, distributed and decentralized/distributed storage system (database), for managing very large amounts of structured data spread out across the world. It provides highly available service with no single point of failure.

Listed below are some of the notable points of Apache Cassandra:

  • It is scalable, fault-tolerant, and consistent.

  • It is a column-oriented database.

  • Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.

  • Created at Facebook, it differs sharply from relational database management systems.

  • Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.

  • Cassandra is being used by some of the biggest companies such as Facebook, Twitter, Cisco, Rackspace, ebay, Twitter, Netflix, and more.

Features of Cassandra

Cassandra has become so popular because of its outstanding technical features. Given below are some of the features of Cassandra:

  • Elastic scalability - Cassandra is highly scalable; it allows to add more hardware to accommodate more customers and more data as per requirement.

  • Always on architecture - Cassandra has no single point of failure and it is continuously available for business-critical applications that cannot afford a failure.

  • Fast linear-scale performance - Cassandra is linearly scalable, i.e., it increases your throughput as you increase the number of nodes in the cluster. Therefore it maintains a quick response time.

  • Flexible data storage - Cassandra accommodates all possible data formats including: structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your need.

  • Easy data distribution - Cassandra provides the flexibility to distribute data where you need by replicating data across multiple data centers.

  • Transaction support - Cassandra supports properties like Atomicity, Consistency, Isolation, and Durability (ACID).

  • Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.

History of Cassandra

  • Cassandra was developed at Facebook for inbox search.
  • It was open-sourced by Facebook in July 2008.
  • Cassandra was accepted into Apache Incubator in March 2009.
  • It was made an Apache top-level project since February 2010.