
  • Microservices with Azure
  • Namit Tanasseri Rahul Rai
  • 727 words
  • 2021-07-02 22:18:29

Architecture of cluster resource manager

Resource management is a complex process that depends on many highly dynamic parameters. To balance resources, the resource manager has to know about all the active services and the resources each of them is consuming at that point in time. It must also know the actual capacity of every node in the cluster in terms of memory, CPU, and disk space, as well as the aggregate amount of resources available in the cluster. The resources consumed by a service can change over time, depending on the load it is handling, and this needs to be accounted for before deciding to move a service from one node to another. To add to the complexity, the cluster resources are not static. The number of nodes in the cluster can increase or decrease at any time, which changes the load distribution. Scheduled or unscheduled upgrades can also roll through the cluster, causing temporary outages of nodes and services. Finally, because cloud resources run on commodity hardware, the resource manager itself has to be highly fault tolerant.

To achieve these tasks, the Service Fabric cluster resource manager uses two components. The first component is an agent that is installed on every node of a cluster. The agent is responsible for collecting information from its hosting node and relaying it to a centralized service. This information includes CPU utilization, memory utilization, remaining disk space, and so on. The agent is also responsible for heartbeat checks for the node.
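The agent's role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Service Fabric implementation: the names `LoadReport` and `NodeAgent`, the `read_metrics` and `send` callables, and the idea that a report doubles as a heartbeat are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple
import time


@dataclass(frozen=True)
class LoadReport:
    """One snapshot of a node's resource usage (hypothetical shape)."""
    node: str
    cpu_percent: float
    memory_percent: float
    free_disk_gb: float
    timestamp: float  # also serves as the heartbeat time in this sketch


class NodeAgent:
    """Hypothetical per-node agent: reads local metrics and relays them."""

    def __init__(
        self,
        node: str,
        read_metrics: Callable[[], Tuple[float, float, float]],
        send: Callable[[LoadReport], None],
    ) -> None:
        self.node = node
        self.read_metrics = read_metrics  # returns (cpu %, memory %, free disk GB)
        self.send = send                  # delivers the report to the central service

    def report_once(self, now: Optional[float] = None) -> LoadReport:
        """Collect one snapshot and relay it to the central service."""
        cpu, memory, disk = self.read_metrics()
        report = LoadReport(
            node=self.node,
            cpu_percent=cpu,
            memory_percent=memory,
            free_disk_gb=disk,
            timestamp=time.time() if now is None else now,
        )
        self.send(report)
        return report
```

Passing `read_metrics` and `send` in as callables keeps the sketch independent of any particular operating system or transport; a real agent would read these values from the host and send them over the cluster's own communication channel.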

The second component is a service. Service Fabric is itself a collection of services, and the cluster resource manager service is one of them. It is responsible for aggregating all of the information supplied by the agents and by other management services, and for reacting to changes based on the desired state configuration of the cluster and its services. The fault tolerance of the resource manager service is achieved through replication, in the same way as for the services hosted on Service Fabric. The resource manager service runs seven replicas to ensure high availability.

To understand the process of aggregation, let's take an example.

The following figure illustrates a Service Fabric cluster with six nodes. There are seven services deployed on this cluster, named A, B, C, D, E, F, and G. The diagram illustrates the initial distribution of the services on the cluster, based on the placement rules configured for the services. Services A, B, and C are placed on node 5 (N5), service D on node 6 (N6), service G on node 2 (N2), service F on node 3 (N3), and service E on node 4 (N4). The resource manager service itself is hosted on node 1 (N1). Every node runs a Service Fabric agent that communicates with the resource manager service hosted on N1:

General resource manager functions

At runtime, if the amount of resources consumed by services changes, if a service fails, or if a node joins or leaves the cluster, all the changes on a specific node are aggregated and periodically sent to the central resource manager service. This is indicated by steps 1 and 2 in the preceding figure. Once aggregated, the results are analyzed before they are persisted by the resource manager service. Periodically, a process within the cluster resource manager service looks at all of the changes and determines whether any corrective actions are required. This process is indicated by step 3 in the preceding figure.
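Steps 1 to 3 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each node reports a single CPU percentage and that a node counts as overloaded above a fixed limit; the class name `ResourceManagerService`, its methods, and the 80 percent limit are hypothetical and are not taken from Service Fabric.

```python
from typing import Dict, List


class ResourceManagerService:
    """Hypothetical central service that aggregates per-node load reports."""

    def __init__(self, cpu_limit: float = 80.0) -> None:
        self.cpu_limit = cpu_limit          # assumed overload threshold (percent)
        self.latest: Dict[str, float] = {}  # most recent CPU report per node

    def receive(self, node: str, cpu_percent: float) -> None:
        """Steps 1 and 2: store the latest report sent by a node's agent."""
        self.latest[node] = cpu_percent

    def find_overloaded(self) -> List[str]:
        """Step 3: periodic check for nodes that need corrective action."""
        return sorted(
            node for node, cpu in self.latest.items() if cpu > self.cpu_limit
        )


# Example: N5 reports twice; only its latest value is kept.
manager = ResourceManagerService()
manager.receive("N4", 20.0)
manager.receive("N5", 75.0)
manager.receive("N5", 95.0)
overloaded = manager.find_overloaded()  # ["N5"]
```

Keeping only the latest report per node is a simplification; the text above notes that the real service also analyzes and persists the aggregated results.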

To understand step 3 in detail, let's consider a scenario in which the cluster resource manager determines that N5 is overloaded, based on the load reported by the agent installed on N5. The resource manager service then checks the resources available on the other nodes of the cluster. Let's assume that N4 is underutilized, as reported by the agent installed on N4. The resource manager coordinates with other subsystems to move a service, service B in this instance, from N5 to N4. The following diagram illustrates this rebalancing process; the completed move is indicated by step 5:

The Resource Manager reconfigures the cluster
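The rebalancing decision in this scenario can be sketched as a simple greedy rule: take the most loaded node and, if it exceeds a limit, move its heaviest service to the least loaded node. This rule, the function names, the per-service load figures, and the limit of 80 are assumptions made for illustration; the text does not describe how Service Fabric actually selects the service or the target node.

```python
from typing import Dict, List, Optional, Tuple

Placement = Dict[str, List[str]]  # node name -> services hosted on it
Move = Tuple[str, str, str]       # (service, source node, target node)


def node_loads(placement: Placement, service_load: Dict[str, int]) -> Dict[str, int]:
    """Total load per node, summed over the services it hosts."""
    return {n: sum(service_load[s] for s in svcs) for n, svcs in placement.items()}


def plan_move(placement: Placement, service_load: Dict[str, int],
              limit: int = 80) -> Optional[Move]:
    """Greedy sketch: move the heaviest service off the most loaded node."""
    loads = node_loads(placement, service_load)
    source = max(loads, key=lambda n: loads[n])
    if loads[source] <= limit:
        return None  # no node is overloaded
    target = min(loads, key=lambda n: loads[n])
    service = max(placement[source], key=lambda s: service_load[s])
    if loads[target] + service_load[service] > limit:
        return None  # the move would only shift the overload
    return service, source, target


def apply_move(placement: Placement, move: Move) -> Placement:
    """Return a new placement with the service moved; the input is unchanged."""
    service, source, target = move
    updated = {n: list(svcs) for n, svcs in placement.items()}
    updated[source].remove(service)
    updated[target].append(service)
    return updated


# Initial placement from the figure; the load numbers are hypothetical.
placement = {"N2": ["G"], "N3": ["F"], "N4": ["E"], "N5": ["A", "B", "C"], "N6": ["D"]}
service_load = {"A": 30, "B": 40, "C": 25, "D": 50, "E": 20, "F": 35, "G": 45}

move = plan_move(placement, service_load)  # ("B", "N5", "N4")
rebalanced = apply_move(placement, move) if move else placement
```

With these made-up loads, N5 carries 95 units and N4 only 20, so the sketch selects the same move as the scenario above: service B goes from N5 to N4, after which no node exceeds the limit.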

This whole process is automated and its complexity is abstracted from the end user. This level of automation is what makes hyperscale deployments possible on Service Fabric.
