官术网_书友最值得收藏!

Multiple clustering columns

Clustering columns are not limited to one field as specified before. Let's take a look at how multiple clustering columns work and facilitate data ordering. To illustrate this, we will recreate our status updates table so that it is clustered by the date and time when the user updated their status:

CREATE TABLE "user_status_updates_by_datetime" ( 
"username" text,
"status_date" date,
"status_time" time,
"body" text,
PRIMARY KEY ("username", "status_date", "status_time")
);

We have created a new table user_status_updates_by_datetime as shown next:

  • Partition key: username, which is a text field.
  • Clustering columns: status_date and status_time. Rows for a particular username are clustered by both columns, first by status_date and then by status_time.

Since our primary goal is to figure out how a primary key with multiple clustering columns behaves, we will be inserting updates for a single user, alice.

Let's insert a few records as follows:

INSERT INTO "user_status_updates_by_datetime" ("username",  
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-18', '08:30:55.123', 'Alice Update 1');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-18', '14:40:25.123456789', 'Alice Update 2');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-19', '08:25:25', 'Alice Update 3');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-21', '08:35:55.123456', 'Alice Update 4');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-21', '14:30:15.123', 'Alice Update 5');

INSERT INTO "user_status_updates_by_datetime" ("username",
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-23', '14:50:45.123456', 'Alice Update 6');

You will notice that status_date and status_time, which are date and time data types respectively, are inserted as strings similar to text data type.

As mentioned earlier, date should be of the format YYYY-MM-DD, and time should be of the format HH:MM:SS.

The CQL protocol enforces strict rules on values you can enter for any field, be it year, month, day, hour, minute, or second. Time is specified with nanosecond precision. It isn't necessary to specify all nine digits of the nanoseconds as we saw in the preceding INSERT statements.

Let's try to insert a few records with erroneous input:

INSERT INTO "user_status_updates_by_datetime" ("username",  
"status_date", "status_time", "body")
VALUES ('alice', '2016-14-23', '14:50:45.123456', 'Alice Update 7');

The response is as follows:

INSERT INTO "user_status_updates_by_datetime" ("username",  
"status_date", "status_time", "body")
VALUES ('alice', '2016-11-23', '14:65:45.123456', 'Alice Update 8');

The response is as follows:

We notice that both of the preceding queries failed. The first insert failed because of an invalid value for status_date, which has a wrong value for month. The second insert failed because of an invalid value for status_time, which has a wrong value for minutes. Underneath, CQL protocol converts both status_date and status_time to a long format and checks for correctness of different values within the data types, date, and time.

Now, if we want to fetch all the status updates by username alice regardless of status_date or status_time, we simply run the following query:

SELECT * FROM "user_status_updates_by_datetime";

The response is as follows:

You will notice that all the status updates for the user alice are ordered by status_date first and then by status_time for a particular status_date value. The order in which the inserts were performed doesn't matter. The results would still have been ordered by date and time.

Let's try to run some queries on clustering columns and see which queries are supported and which are not. First, we will fetch all status updates before 2016-11-20 regardless of the time:

SELECT * FROM "user_status_updates_by_datetime" 
WHERE "username" = 'alice' AND "status_date" < '2016-11-20';

The response is as follows:

You will see that the query returns three status updates, which were posted before 2016-11-20. Always make sure your query contains an equality relation on the partition key. Clustering columns support both equality and inequality relations although there are certain limitations.

Let's try to get status updates that were posted after 2016-11-20 and after noon or 12:00:00. This can be done by running the following query:

SELECT * FROM "user_status_updates_by_datetime" 
WHERE "username" = 'alice' AND "status_date" > '2016-11-20' AND
"status_time" > '12:00:00';

The response is as follows:

This throws a cryptic error saying it is an invalid query. This is one of the limitations when querying is based on multiple clustering columns. This simply means that if you are using an inequality relation on a clustering column, all the preceding clustering columns need to have an equality relation. Let's test this by fetching status updates that were posted on 2016-11-21 and after 12:00:00:

SELECT * FROM "user_status_updates_by_datetime" 
WHERE "username" = 'alice' AND "status_date" = '2016-11-21' AND
"status_time" > '12:00:00';

主站蜘蛛池模板: 新和县| 疏附县| 南安市| 兖州市| 大庆市| 赤水市| 普兰店市| 丹巴县| 铁力市| 宣恩县| 比如县| 巴楚县| 深水埗区| 绩溪县| 常德市| 新津县| 临潭县| 随州市| 甘泉县| 滦平县| 巴林右旗| 安义县| 永新县| 平远县| 南漳县| 绥化市| 扎鲁特旗| 大城县| 马公市| 鹤庆县| 类乌齐县| 息烽县| 唐海县| 久治县| 洛宁县| 枣庄市| 龙胜| 卫辉市| 调兵山市| 瑞丽市| 宝清县|