- Learning Apache Cassandra
- Mat Brown
- 2021-07-23 20:34:50
Anatomy of a compound primary key
At this point, it's clear that there's some nuance in the compound primary key that we're missing. Both the username column and the id column affect the order in which rows are returned; however, while the actual ordering of username is opaque, the ordering of id is meaningfully related to the information encoded in the id column.
In the lexicon of Cassandra, username is a partition key. A table's partition key groups rows together into logically related bundles. In the case of our MyStatus application, each user's timeline is a self-contained data structure, so partitioning the table by user is a sound strategy.
Note
As a general rule, you should endeavor to only query one partition at a time for any core data access your application does. Cassandra stores the rows in each partition together, so queries within a partition are very efficient. Queries across multiple partitions, on the other hand, are expensive and should be avoided.
We call the id column a clustering column. The job of a clustering column is to determine the ordering of rows within a partition. This is why we observed that, within each user's status updates, the rows were returned in strictly ascending order of the timestamp encoded in the id. This is a very useful property, since our application will want to display status updates ordered by creation time.
Note
Is sorting by clustering column efficient?
Sorting any collection at read time is expensive for a non-trivial number of elements. Happily, Cassandra stores rows in clustering order, so when you retrieve them, it simply returns them in the order they're stored in. There's no expensive sorting operation at read time.
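To make the two roles concrete, here is a minimal in-memory sketch in Python. It is a toy model, not how Cassandra is implemented: the StatusUpdateTable class and its method names are hypothetical, plain integers stand in for the timestamp-bearing id values, and overwriting a row that has the same primary key is not modeled. It only illustrates the idea that rows are grouped by partition key and kept in clustering order at write time, so a read needs no sort.

```python
import bisect
from collections import defaultdict


class StatusUpdateTable:
    """Toy model of a table keyed by (username, id). Hypothetical, for illustration."""

    def __init__(self):
        # partition key (username) -> list of (id, body), always kept sorted by id
        self._partitions = defaultdict(list)

    def insert(self, username, id_, body):
        # The ordering cost is paid on write: the row is placed at its
        # sorted position within its partition.
        bisect.insort(self._partitions[username], (id_, body))

    def read_partition(self, username):
        # No sorting here: rows come back in the order they are stored.
        return list(self._partitions.get(username, []))


table = StatusUpdateTable()
table.insert("alice", 3, "third update")
table.insert("bob", 2, "bob's update")
table.insert("alice", 1, "first update")
table.insert("alice", 2, "second update")

alice_rows = table.read_partition("alice")
for id_, body in alice_rows:
    print(id_, body)
```

Even though alice's rows were written out of order and interleaved with bob's, reading her partition returns them in ascending id order without touching any other partition.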
All of the rows that share the same partition key are stored in a contiguous structure on disk. It's within this structure that rows are sorted by their clustering column values. Because each partition is tightly bound at the storage level, there is an upper bound on the number of rows that can share the same partition key. In theory, this limit is about 2 billion total column values. For instance, if you have a table with 10 data columns, your upper bound would be 200 million rows per partition key.
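The arithmetic behind that example can be written out as a short Python sketch. The constant uses the "about 2 billion" figure from the text as a round number, and the function name is hypothetical; treat the result as a theoretical ceiling rather than a practical sizing target.

```python
# Approximate theoretical limit on column values per partition (from the text).
MAX_COLUMN_VALUES_PER_PARTITION = 2_000_000_000


def max_rows_per_partition(data_columns):
    """Rough upper bound on rows sharing one partition key."""
    if data_columns < 1:
        raise ValueError("a table needs at least one data column")
    return MAX_COLUMN_VALUES_PER_PARTITION // data_columns


rows = max_rows_per_partition(10)
print(f"{rows:,}")  # 200,000,000
```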
Note
For further information on data modeling using compound primary keys, the DataStax CQL documentation has a good explanation at http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_compound_keys_c.html
Anatomy of a single-column primary key
Now that you understand the distinction between a partition key and a clustering column, you might be wondering which role the username column plays in the users table.
As it turns out, it's a partition key. All Cassandra tables must have a partition key; clustering columns are optional. In the users table, each row is its own tiny partition; no row is grouped with any other.
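The rule can be summarized with a small Python sketch. The split_primary_key helper is hypothetical and covers only the two key shapes discussed in this section, where the first primary key column is the partition key and any remaining columns are clustering columns; it does not model partition keys made up of more than one column.

```python
def split_primary_key(columns):
    """Split primary key columns into (partition_key, clustering_columns).

    Simplified assumption: the first column is the partition key and
    any columns after it are clustering columns.
    """
    if not columns:
        raise ValueError("every table must have a partition key")
    return columns[0], list(columns[1:])


# Compound primary key, as in the status updates table
print(split_primary_key(["username", "id"]))

# Single-column primary key, as in the users table
print(split_primary_key(["username"]))
```

For the users table the list of clustering columns is empty, which is another way of saying that each partition holds exactly one row.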