
Granulate nodes

The typical graph modeling pattern that we will discuss in this section is called the granulate pattern. In graph database modeling, we tend to build much more fine-grained data models, with a higher level of granularity, than we would be used to in a relational model.

In a relational model, we use a process called database normalization to come up with the granularity of our model. Wikipedia defines this process as follows:

"…the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships."

The reality of this process is that we create smaller and smaller table structures until we reach the third normal form. This is a convention that the IT industry seems to have agreed on: a database is considered normalized as soon as it achieves the third normal form. Visit http://en.wikipedia.org/wiki/Database_normalization#Normal_forms for more details.

As we discussed before, this model can be quite expensive, as it effectively introduces the need for join tables and join operations at query time. Database administrators tend to denormalize the data for this very reason, which introduces data duplication, another very tricky problem to manage.

In graph database modeling, however, normalization is much cheaper for the simple reason that these infamous join operations are much easier to perform. This is why we see a clear tendency in graph models to create thin nodes and relationships, that is, nodes and relationships with few properties on them. We say that such very granular nodes and relationships have been granulated.

Related to this pattern is a typical question that we ask ourselves in every modeling session: should I keep this as a property, or should the property become its own node? For example, should we model the alcohol percentage of a beer as a property on a beer brand? The following diagram shows the model with the alcohol percentage as a property:


A data model with fatter nodes

The alternative would be to split the alcohol percentage off as a different kind of node. The following diagram illustrates this:


A data model with a granulated node structure
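
To make the difference between the two diagrams concrete, here is a minimal Python sketch that represents both models as plain in-memory data. It is only an illustration and does not use any graph database API. The node identifiers, the brand names, the percentages, the `HAS_ALCOHOL_PERCENTAGE` relationship type, and the `granulate` helper are all hypothetical names chosen for this example.

```python
# Hypothetical sample data: "fatter" nodes that carry the alcohol
# percentage as a property. Names and values are illustrative only.
fat_nodes = {
    "brand:1": {"label": "BeerBrand", "name": "Brand A", "alcohol_percentage": 8.5},
    "brand:2": {"label": "BeerBrand", "name": "Brand B", "alcohol_percentage": 8.5},
}


def granulate(nodes, prop, label, rel_type):
    """Split `prop` off every node into shared value nodes.

    Returns (new_nodes, relationships), where each relationship is a
    (start_node_id, relationship_type, end_node_id) tuple.
    """
    new_nodes, relationships = {}, []
    for node_id, props in nodes.items():
        # Keep every property except the one being split off.
        new_nodes[node_id] = {k: v for k, v in props.items() if k != prop}
        if prop in props:
            value_id = f"{label}:{props[prop]}"
            # One value node per distinct value, shared by all brands.
            new_nodes.setdefault(value_id, {"label": label, "value": props[prop]})
            relationships.append((node_id, rel_type, value_id))
    return new_nodes, relationships


thin_nodes, thin_relationships = granulate(
    fat_nodes, "alcohol_percentage", "AlcoholPercentage", "HAS_ALCOHOL_PERCENTAGE"
)
```

In the granulated result, the two brand nodes no longer carry the percentage themselves. Instead, both point to a single shared percentage node, which is what makes that value reachable by traversal.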

Which one of these models is right? I would say both and neither. What really matters here is that we look at our queries to determine which version is appropriate. In general, I would present the following arguments:

  • If we don't need to evaluate the alcohol percentage during the course of a graph traversal, we are probably better off keeping it as a property of the end node of the traversal. After all, we keep our model a bit simpler when doing this, and everyone appreciates simplicity.
  • If we need to evaluate the alcohol percentage of a particular (set of) beer brands during the course of our graph traversal, then splitting it off into its own node category is probably a good idea. Traversing through a node is often easier and faster than evaluating properties for each and every path.
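
The two arguments above can be sketched with a small Python example that answers the same question, "which other brands have the same alcohol percentage as this one?", against both models. This is an in-memory illustration under assumed data, not a benchmark and not a graph database query. The brand names, the percentages, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: brand name -> alcohol percentage.
brands = {"Brand A": 8.5, "Brand B": 8.5, "Brand C": 5.2}


def same_strength_by_property(brands, start):
    """Property model: inspect the property of every brand."""
    target = brands[start]
    return sorted(
        name for name, abv in brands.items() if abv == target and name != start
    )


# Granulated model: each percentage is its own node, so we keep the
# relationships in both directions instead of a property.
brand_to_percentage = dict(brands)   # brand node -> percentage node
percentage_to_brands = defaultdict(set)  # percentage node -> brand nodes
for name, abv in brands.items():
    percentage_to_brands[abv].add(name)


def same_strength_by_traversal(brand_to_percentage, percentage_to_brands, start):
    """Granulated model: hop to the percentage node, then back out."""
    percentage_node = brand_to_percentage[start]
    return sorted(percentage_to_brands[percentage_node] - {start})


by_property = same_strength_by_property(brands, "Brand A")
by_traversal = same_strength_by_traversal(
    brand_to_percentage, percentage_to_brands, "Brand A"
)
```

Both functions return the same brands. The difference is in the work done: the property version evaluates every brand, while the traversal version only touches the nodes connected to the percentage node. How much this matters in a real database depends on the queries and the data, which is why the queries should drive the choice.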

As we will see in the next section, many people actually take this approach a step further by working with in-graph indexes.
