JDB电子变脸

書名： Learning Apache Cassandra
作者名： Mat Brown
本章字數： 1069字
更新時間： 2021-07-23 20:34:49

Working with status updates

Now that we've got our status updates table ready, let's create our first status update:

INSERT INTO "user_status_updates"
("username", "id", "body")
VALUES (
  'alice',
  76e7a4d0-e796-11e3-90ce-5f98e903bf02,
  'Learning Cassandra!'
);

This will look pretty familiar; we specify the table we want to insert data into, the list of columns we're going to provide data for, and the values for these columns in the given order.

Let's give bob a status update too, by inserting the following row in the user_status_updates table:

INSERT INTO "user_status_updates"
("username", "id", "body")
VALUES (
  'bob',
  97719c50-e797-11e3-90ce-5f98e903bf02,
  'Eating a tasty sandwich.'
);

Now we have two rows, each identified by the combination of the username and id columns. Let's take a look at the contents of our table using the following SELECT statement:

SELECT * FROM "user_status_updates";

We'll be able to see the two rows that we inserted, as follows:

Note that, as we saw in the users table, the rows are not returned in lexical order of username; indeed, they're in the same order as the user records themselves are. Recall from the Paginating through results section of Chapter 2, The First Table, that the username column is ordered by an internally generated token, which is deterministic but meaningless from the application's perspective.

Extracting timestamps

As we previously mentioned, Cassandra has special capabilities for the timeuuid type, which includes extracting the timestamp that's encoded in these UUIDs. We can see this in action using the DATEOF function:

SELECT "username", "id", "body", DATEOF("id")
FROM "user_status_updates";

The DATEOF function instructs Cassandra to return a result column containing the timestamp at which the given column's UUID value was created. We now have access to information encoded in the id column that was previously obscure:

If you're following along in your CQL shell, you'll notice that the dates do not tell you when the rows were created, but rather when the UUIDs in the id column were generated. Since we're using UUIDs that I generated when writing this book, we'll see that the creation timestamp shows an older time.

While the DATEOF output is very readable, the timestamps only offer precision at the second level. For a more precise representation of the timestamp at which the UUIDs were generated, use the UNIXTIMESTAMPOF function instead:

SELECT "username", "id", "body", UNIXTIMESTAMPOF("id")
FROM "user_status_updates";

The UNIXTIMESTAMPOF function returns the timestamp represented as the number of milliseconds since January 1, 1970 at midnight UTC:

Looking up a specific status update

In our status update application, we'll want to allow direct linking to status updates, in which case we'll need to be able to retrieve a specific status update from the table. So far, we've issued an open-ended SELECT statement to see all the rows we've inserted in user_status_updates, but we haven't seen how to return one particular row.

Since user_status_updates has a compound primary key, we need new CQL syntax to allow us to specify values for multiple columns in order to form a unique identifier. CQL provides the AND construct for this purpose:

SELECT * FROM "user_status_updates"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;

By specifying both the username and the id, we identify exactly one row:

We now have all the tools we need to interact with the user_status_updates table in the same way as we interact with the users table; let's move on and explore some of the things that make a compound primary key table different.

Automatically generating UUIDs

In the status updates we've created so far, we've used predetermined UUIDs that were generated on May 29, 2014. For this reason, the timestamp encoded in the UUIDs doesn't really tell us anything useful about when the rows were created. However, in the general case, we would like to encode useful data in the UUID: in particular, the timestamp at which the row itself was created.

Libraries that generate Version 1 UUIDs are available for just about any programming language, but Cassandra also gives us a built-in CQL function to generate a UUID from the current time, the NOW function. Let's use that function to insert a few new status updates:

INSERT INTO "user_status_updates" ("username", "id", "body")
VALUES ('alice', NOW(), 'Alice Update 1');
INSERT INTO "user_status_updates" ("username", "id", "body")
VALUES ('bob', NOW(), 'Bob Update 1');
INSERT INTO "user_status_updates" ("username", "id", "body")
VALUES ('alice', NOW(), 'Alice Update 2');
INSERT INTO "user_status_updates" ("username", "id", "body")
VALUES ('bob', NOW(), 'Bob Update 2');
INSERT INTO "user_status_updates" ("username", "id", "body")
VALUES ('alice', NOW(), 'Alice Update 3');
INSERT INTO "user_status_updates" ("username", "id", "body")
VALUES ('bob', NOW(), 'Bob Update 3');

Instead of explicitly specifying a UUID constant for the id field, we used the NOW function to generate a new and unique UUID for each row we inserted.

Note

While the NOW function is quite convenient, particularly in the CQL shell, it comes with one major downside. Since a CQL INSERT does not give any feedback on the results of the operation, you won't actually know what UUID the NOW function generates. Sometimes this is fine, but in many cases your application would want to perform further business logic that requires knowing the primary key of the record that was just created. In these cases, it's better to use a library to generate UUIDs at the application level and provide them explicitly as literals in the INSERT statement.

Now that we've got a handful of rows in the user_status_updates table, let's take a look at its contents:

SELECT "username", "id", "body", UNIXTIMESTAMPOF("id")
FROM "user_status_updates";

As we did previously, we'll ask Cassandra to return the millisecond-precision timestamp of the UUIDs in the table along with the data in all columns. Let's take a look at the results:

We notice a couple of interesting things here. First, we can see that results are grouped by username: all the status updates of bob appear first, followed by all the status updates of alice. This is despite the fact that we interleaved the status updates of alice and bob when we inserted them.

Second, within each user's status updates, the updates are returned in ascending order of the timestamp of the row's id column. Looking at the rightmost column in the results, we see that the timestamps monotonically increase for each user. This is no coincidence; the id column determines the ordering of rows in the user_status_updates table, and since it's a timeuuid column, the timestamp encoded in the UUID determines the semantic ordering of the rows.

官术网_书友最值得收藏!

Learning Apache Cassandra

Working with status updates

Extracting timestamps

Looking up a specific status update

Automatically generating UUIDs

Note