書名： Learning PostgreSQL 11
作者名： Salahaldin Juba Andrey Volkov
本章字?jǐn)?shù)： 1498字
更新時(shí)間： 2021-07-02 13:11:51

The car web portal database

At this stage, you can convert the business model of the car web portal presented in Chapter 1, Relational Databases, into a logical model. To help the developer to create a table, follow this minimal checklist:

What is the primary key?
What is the default value for each column?
What is the type of each column?
What are the constraints on each column or set of columns?
Are permissions set correctly on tables, sequences, and schemas?
Are foreign keys specified with the proper actions?
What is the data life cycle?
What are the operations allowed on the data?

To create the car web portal schema, the formal relational model won't be applied strictly. Also, surrogate keys will be used instead of natural keys. The advantages of using surrogate keys are as follows:

Natural keys can change; you can change the current email address to another one. On the one hand, you could argue that if a natural key can change, what has been identified is, in fact, not an actual natural key. On the other hand, what is the natural key to identify a person? Using a surrogate key guarantees that if a row is referenced by another row, this reference isn't lost, because the surrogate key hasn't changed.
Consider some incorrect assumptions about natural keys. Let's take an email address as an example. The general assumption about an email address is that it identifies a person uniquely. This isn't true; some email service providers set policies, such as email expiration based on activity. Private companies might have general email addresses, such as contact@... or support@.... This is also applicable to phone numbers.
Surrogate keys can be used to support a temporal database design within the relational database world. For example, some companies have very strict security requirements, and data should be versioned for each operation.
Surrogate keys often use compact data types such as integers. This enables better performance than composite natural keys.
Surrogate keys can be used in PostgreSQL to eliminate the effects of cross-column statistic limitation. PostgreSQL collects statistics per single column by default. In some cases, this isn't convenient because columns might be correlated. In this case, PostgreSQL gives a wrong estimation to the planner, and thus, imperfect execution plans are generated. To overcome this limitation, the developer can educate PostgreSQL about cross-column statistics.
Surrogate keys are better supported than natural keys by object-relational mappers, such as Hibernate.

Despite all the advantages of surrogate keys, they also have a few disadvantages:

A surrogate key is auto-generated, and the generation of the value might give different results. For example, you insert data in a test database and a staging database, and after comparing the two, you see the data isn't identical.
A surrogate key isn't descriptive. From a communication point of view, it's easier to refer to a person by a name instead of an auto-generated number.
Surrogate keys can lead to data redundancy and can generate duplicate tuples if not handled carefully.

In the web car portal ER diagram, there's an entity with the name user. Since user is a reserved keyword, the name account will be used to create the table. Note that to create a database object using a PostgreSQL keyword, the name should be quoted. The following example shows how to create a user table:

postgres=# \set VERBOSITY 'verbose'
postgres=# CREATE TABLE user AS SELECT 1;
ERROR: 42601: syntax error at or near "user"
LINE 1: CREATE TABLE user AS SELECT 1;
                     ^
LOCATION: scanner_yyerror, scan.l:1086
postgres=# CREATE TABLE "user" AS SELECT 1;
SELECT 1

In the preceding example, the VERBOSITY setting for psql can be used to show error codes. Error codes are useful in detecting errors and trapping exceptions.

\set is a psql meta-command that's used to control the psql command-line tool's behavior; SET is a PostgreSQL SQL command to change a runtime parameter.

To create an account table, execute the following command:

CREATE TABLE account (
  account_id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  first_name TEXT NOT NULL,
  last_name TEXT NOT NULL,
  email TEXT NOT NULL UNIQUE,
  password TEXT NOT NULL,
  CHECK(first_name !~ '\s' AND last_name !~ '\s'),
  CHECK (email ~* '^\w+@\w+[.]\w+$'),
  CHECK (char_length(password)>=8)
);

The following is a summary of the user table:

account_id is defined as the primary key, and identity is used to auto-generate default values. account_id is naturally unique and not null.
The first_name, last_name, email, and password attributes aren't allowed to have null values.
password should be at least eight characters in length. In reality, the password length is handled in business logic, since passwords shouldn't be stored in a plain text format in the database. For more information about securing data, have a look at Chapter 11, PostgreSQL Security.
email should match a certain regex expression. Note that the email regular expression is really simplistic.

Behind the scenes, the following objects are created:

A sequence to emulate the identity columns.
Two indices, both unique. The first one is used to validate the primary key, account_id. The second is used to validate the email address.

To create seller_account, execute the following statement:

CREATE TABLE seller_account (
  seller_account_id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  account_id INT NOT NULL REFERENCES account(account_id),
  total_rank FLOAT,
  number_of_advertisement INT,
  street_name TEXT NOT NULL,
  street_number TEXT NOT NULL,
  zip_code TEXT NOT NULL,
  city TEXT NOT NULL
);

As we can see, the seller account has a one-to-one relationship with the account. This is enforced by account_id, which consists of NOT NULL and UNIQUE constraints. Also, in this case, you can model the seller account as follows, by marking account_id as the PRIMARY KEY:

CREATE TABLE seller_account (
  account_id INT PRIMARY KEY REFERENCES account(account_id)
  ...
);

The first design is more flexible and less ambiguous. First of all, the requirements might change, and the user account and the seller account relation might change from one-to-one to one-to-many. For example, the user concept might be generalized to handle companies where the company has several seller accounts.

To model the car table, we need to create a car model. This is quite important because it will help the application user, mainly the seller, to pick up a model instead of entering the model information. In addition to that, the seller might not have all the information about the car model. Finally, allowing the seller to enter the model information might lead to data inconsistency due to human error. In real applications, lookup information, such as currency, countries, and car models, can be downloaded from certain providers. For example, country information is provided by ISO: https://www.iso.org/iso-3166-country-codes.html.

To model a one-to-many relationship, let's have a look at the relation between the advertisement and seller account. The seller can put several advertisements online. Assuming the car table already exists, this can be modeled as follows:

CREATE TABLE advertisement(
 advertisement_id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
 advertisement_date TIMESTAMP WITH TIME ZONE NOT NULL,
 car_id INT NOT NULL REFERENCES car(car_id),
 seller_account_id INT NOT NULL REFERENCES seller_account (seller_account_id)
);

seller_account_id has a NOT NULL constraint, that means the advertisement has a full-participation relationship with a seller account. It's impossible to create an advertisement without specifying the seller account. Also, the foreign key action option is not specified. The default option, NO ACTION, will be applied implicitly. If you try to delete an account, an error will be raised if it's referenced in the advertisement table.

An example of many-to-many relationships is shown by the favorite_advertisement table, where the user marks a certain advertisement as a favorite. Also, it can be seen in advertisement_rating, where the user rates a certain advertisement.

The favorite_advertisement table is modeled as follows:

CREATE TABLE favorite_advertisement(
  PRIMARY KEY (account_id,advertisement_id),
  account_id INT NOT NULL REFERENCES account(account_id),
  advertisement_id INT NOT NULL REFERENCES advertisement(advertisement_id)
);

The favorite_advertisement table associates advertisements and users; the primary key is defined as a composite key of the foreign keys.

The car portal schema shows common patterns, such as one-to-one, one-to-many and many-to-many. In cases of ambiguous business requirements or specifications, you could work around this by using rich datatypes and then iterate and refactor the model after getting more input.

The data modeling explained in this chapter is suitable for online transaction processing (OLTP); data consistency has been handled implicitly. To get a better picture, it's advisable to have a look at database normalization.

Analytical systems and OLAP applications have different data-modeling techniques due to data size and data-access patterns. For more information, have a look at Chapter 8, OLAP and Data Warehousing.

A database is often the base tier in an application; it's important to take special care of the database model. Bad designs in the data tier affect the business and presentation tiers. Also, the bad design might affect developer productivity because developers can't use technologies such as object-relational mappers (ORMs) directly.

官术网_书友最值得收藏!

Learning PostgreSQL 11

The car web portal database