- Learning Apache Cassandra
- Mat Brown
- 1069字
- 2021-07-23 20:34:49
Working with status updates
Now that we've got our status updates table ready, let's create our first status update:
INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ( 'alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02, 'Learning Cassandra!' );
This will look pretty familiar; we specify the table we want to insert data into, the list of columns we're going to provide data for, and the values for these columns in the given order.
Let's give bob
a status update too, by inserting the following row in the user_status_updates
table:
INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ( 'bob', 97719c50-e797-11e3-90ce-5f98e903bf02, 'Eating a tasty sandwich.' );
Now we have two rows, each identified by the combination of the username
and id
columns. Let's take a look at the contents of our table using the following SELECT
statement:
SELECT * FROM "user_status_updates";
We'll be able to see the two rows that we inserted, as follows:

Note that, as we saw in the users
table, the rows are not returned in lexical order of username
; indeed, they're in the same order as the user records themselves are. Recall from the Paginating through results section of Chapter 2, The First Table, that the username
column is ordered by an internally generated token, which is deterministic but meaningless from the application's perspective.
Extracting timestamps
As we previously mentioned, Cassandra has special capabilities for the timeuuid
type, which includes extracting the timestamp that's encoded in these UUIDs. We can see this in action using the DATEOF
function:
SELECT "username", "id", "body", DATEOF("id") FROM "user_status_updates";
The DATEOF
function instructs Cassandra to return a result column containing the timestamp at which the given column's UUID value was created. We now have access to information encoded in the id
column that was previously obscure:

If you're following along in your CQL shell, you'll notice that the dates do not tell you when the rows were created, but rather when the UUIDs in the id
column were generated. Since we're using UUIDs that I generated when writing this book, we'll see that the creation timestamp shows an older time.
While the DATEOF
output is very readable, the timestamps only offer precision at the second level. For a more precise representation of the timestamp at which the UUIDs were generated, use the UNIXTIMESTAMPOF
function instead:
SELECT "username", "id", "body", UNIXTIMESTAMPOF("id") FROM "user_status_updates";
The UNIXTIMESTAMPOF
function returns the timestamp represented as the number of milliseconds since January 1, 1970 at midnight UTC:

Looking up a specific status update
In our status update application, we'll want to allow direct linking to status updates, in which case we'll need to be able to retrieve a specific status update from the table. So far, we've issued an open-ended SELECT
statement to see all the rows we've inserted in user_status_updates
, but we haven't seen how to return one particular row.
Since user_status_updates
has a compound primary key, we need new CQL syntax to allow us to specify values for multiple columns in order to form a unique identifier. CQL provides the AND
construct for this purpose:
SELECT * FROM "user_status_updates" WHERE "username" = 'alice' AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
By specifying both the username
and the id
, we identify exactly one row:

We now have all the tools we need to interact with the user_status_updates
table in the same way as we interact with the users
table; let's move on and explore some of the things that make a compound primary key table different.
Automatically generating UUIDs
In the status updates we've created so far, we've used predetermined UUIDs that were generated on May 29, 2014. For this reason, the timestamp encoded in the UUIDs doesn't really tell us anything useful about when the rows were created. However, in the general case, we would like to encode useful data in the UUID: in particular, the timestamp at which the row itself was created.
Libraries that generate Version 1 UUIDs are available for just about any programming language, but Cassandra also gives us a built-in CQL function to generate a UUID from the current time, the NOW
function. Let's use that function to insert a few new status updates:
INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ('alice', NOW(), 'Alice Update 1'); INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ('bob', NOW(), 'Bob Update 1'); INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ('alice', NOW(), 'Alice Update 2'); INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ('bob', NOW(), 'Bob Update 2'); INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ('alice', NOW(), 'Alice Update 3'); INSERT INTO "user_status_updates" ("username", "id", "body") VALUES ('bob', NOW(), 'Bob Update 3');
Instead of explicitly specifying a UUID constant for the id
field, we used the NOW
function to generate a new and unique UUID for each row we inserted.
Note
While the NOW
function is quite convenient, particularly in the CQL shell, it comes with one major downside. Since a CQL INSERT
does not give any feedback on the results of the operation, you won't actually know what UUID the NOW
function generates. Sometimes this is fine, but in many cases your application would want to perform further business logic that requires knowing the primary key of the record that was just created. In these cases, it's better to use a library to generate UUIDs at the application level and provide them explicitly as literals in the INSERT
statement.
Now that we've got a handful of rows in the user_status_updates
table, let's take a look at its contents:
SELECT "username", "id", "body", UNIXTIMESTAMPOF("id") FROM "user_status_updates";
As we did previously, we'll ask Cassandra to return the millisecond-precision timestamp of the UUIDs in the table along with the data in all columns. Let's take a look at the results:

We notice a couple of interesting things here. First, we can see that results are grouped by username: all the status updates of bob
appear first, followed by all the status updates of alice
. This is despite the fact that we interleaved the status updates of alice
and bob
when we inserted them.
Second, within each user's status updates, the updates are returned in ascending order of the timestamp of the row's id
column. Looking at the rightmost column in the results, we see that the timestamps monotonically increase for each user. This is no coincidence; the id
column determines the ordering of rows in the user_status_updates
table, and since it's a timeuuid
column, the timestamp encoded in the UUID determines the semantic ordering of the rows.
- Instant Node Package Manager
- Practical UX Design
- 數據結構與算法JavaScript描述
- PLC編程及應用實戰
- Mastering RStudio:Develop,Communicate,and Collaborate with R
- UML 基礎與 Rose 建模案例(第3版)
- Express Web Application Development
- Python全棧數據工程師養成攻略(視頻講解版)
- SQL Server實用教程(SQL Server 2008版)
- C語言程序設計實踐
- IBM RUP參考與認證指南
- Socket.IO Cookbook
- PHP從入門到精通(第7版)
- Python程序員面試算法寶典
- jBPM6 Developer Guide