官术网_书友最值得收藏!

Multiple clustering columns

Clustering columns are not limited to one field as specified before. Let's take a look at how multiple clustering columns work and facilitate data ordering. To illustrate this, we will recreate our status updates table so that it is clustered by the date and time when the user updated their status:

CREATE TABLE "user_status_updates_by_datetime" ( 
"username" text,
"status_date" date,
"status_time" time,
"body" text,
PRIMARY KEY ("username", "status_date", "status_time")
);

We have created a new table user_status_updates_by_datetime as shown next:

  • Partition key: username, which is a text field.
  • Clustering columns: status_date and status_time. Rows for a particular username are clustered by both columns, first by status_date and then by status_time.

Since our primary goal is to figure out how a primary key with multiple clustering columns behaves, we will be inserting updates for a single user, alice.

Let's insert a few records as follows:

INSERT INTO "user_status_updates_by_datetime" ("username",  
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-18', '08:30:55.123', 'Alice Update 1');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-18', '14:40:25.123456789', 'Alice Update 2');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-19', '08:25:25', 'Alice Update 3');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-21', '08:35:55.123456', 'Alice Update 4');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-21', '14:30:15.123', 'Alice Update 5');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-23', '14:50:45.123456', 'Alice Update 6');

You will notice that status_date and status_time, which are date and time data types respectively, are inserted as strings similar to text data type.

As mentioned earlier, date should be of the format YYYY-MM-DD, and time should be of the format HH:MM:SS.

The CQL protocol enforces strict rules on values you can enter for any field, be it year, month, day, hour, minute, or second. Time is specified with nanosecond precision. It isn't necessary to specify all nine digits of the nanoseconds as we saw in the preceding INSERT statements.

Let's try to insert a few records with erroneous input:

INSERT INTO "user_status_updates_by_datetime" ("username",  
"status_date", "status_time", "body")
VALUES ('alice', '2016-14-23', '14:50:45.123456', 'Alice Update 7');

The response is as follows:

INSERT INTO "user_status_updates_by_datetime" ("username",  
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-23', '14:65:45.123456', 'Alice Update 8');

The response is as follows:

We notice that both of the preceding queries failed. The first insert failed because of an invalid value for status_date, which has a wrong value for month. The second insert failed because of an invalid value for status_time, which has a wrong value for minutes. Underneath, CQL protocol converts both status_date and status_time to a long format and checks for correctness of different values within the data types, date, and time.

Now, if we want to fetch all the status updates by username alice regardless of status_date or status_time, we simply run the following query:

SELECT * FROM "user_status_updates_by_datetime";

The response is as follows:

You will notice that all the status updates for the user alice are ordered by status_date first and then by status_time for a particular status_date value. The order in which the inserts were performed doesn't matter. The results would still have been ordered by date and time.

Let's try to run some queries on clustering columns and see which queries are supported and which are not. First, we will fetch all status updates before 2016-11-20 regardless of the time:

SELECT * FROM "user_status_updates_by_datetime" 
WHERE "username" = 'alice' AND "status_date" < '2016-11-20';

The response is as follows:

You will see that the query returns three status updates, which were posted before 2016-11-20. Always make sure your query contains an equality relation on the partition key. Clustering columns support both equality and inequality relations although there are certain limitations.

Let's try to get status updates that were posted after 2016-11-20 and after noon or 12:00:00. This can be done by running the following query:

SELECT * FROM "user_status_updates_by_datetime" 
WHERE "username" = 'alice' AND "status_date" > '2016-11-20' AND
"status_time" > '12:00:00';

The response is as follows:

This throws a cryptic error saying it is an invalid query. This is one of the limitations when querying is based on multiple clustering columns. This simply means that if you are using an inequality relation on a clustering column, all the preceding clustering columns need to have an equality relation. Let's test this by fetching status updates that were posted on 2016-11-21 and after 12:00:00:

SELECT * FROM "user_status_updates_by_datetime" 
WHERE "username" = 'alice' AND "status_date" = '2016-11-21' AND
"status_time" > '12:00:00';

主站蜘蛛池模板: 宝应县| 文安县| 建湖县| 赤水市| 堆龙德庆县| 东乡| 甘肃省| 宕昌县| 志丹县| 临漳县| 石嘴山市| 咸丰县| 肥西县| 吉木萨尔县| 湘西| 波密县| 枝江市| 循化| 凉山| 四会市| 泗洪县| 五河县| 博罗县| 昌江| 通榆县| 鹤山市| 无极县| 肥东县| 惠东县| 泸定县| 沙湾县| 内丘县| 洛宁县| 那曲县| 金湖县| 临沂市| 新营市| 万全县| 鸡西市| 雷州市| 铜鼓县|