官术网_书友最值得收藏!

Storage Requirements

It's crucial to keep track of the requirements of your solution in all phases of the project. Since most projects follow the agile methodology, it's not an option to just define the requirements at the start of the project and then "get to work."

The agile methodology requires team members to continuously reflect on the initial plan and requirements provided in the Deming cycle, as shown in the following figure:

Figure 2.1: The Deming cycle

A list of requirements can be pided into functional and non-functional requirements. The functional requirements contain the user stories that explain how to interact with the system; these are not in the scope of this book since they are less technical and more concerned with UX design and customer journeys. The non-functional (or technical) requirements contain descriptions of the required workings of the system. The non-functional architecture requirements for an AI storage solution describe the technical aspects and have an impact on technology choices and their way of working. The major requirements of an AI system are as follows:

Figure 2.2: Requirements for AI systems

Since this a very extensive list and some requirements are more important for a certain architectural layer than others, we will list the most important requirements per architecture layer. Before we start with that deep pe, we'll give a brief overview of the architecture of an AI system or data lake.

Throughout this chapter, we'll provide you with an example of a use case that helps translate the abstract concepts in the requirements for data storage to real-world, hands-on content. Although the sample is fictional, it's built on some common projects that we came across in real life. Therefore, the situation, target architecture, and requirements are quite realistic for an AI project.

A bank in the UK (let's say it's called PacktBank) wanted to upgrade its data storage systems to create a better environment for data scientists for AI-related projects. Currently, the data is spread out in various source systems, ranging from an old ERP system to on-premise Oracle databases, to a SaaS solution in the cloud. The new data environment (data lake) must be secure, accessible, high-performing, scalable, and easy to use. The target infrastructure is Amazon Web Services (AWS), but in the future, the company might switch to other cloud vendors or take a multi-cloud strategy; therefore, the software components should be vendor-agnostic if possible.

The Three Stages of Digital Data

It's important to realize that data storage comes in three stages:

  • At rest: Data that is stored on a disk or in memory for long-term storage; for example, data on a hard disk or data in a database.
  • In motion: Data that is transferred across a network from one system to another. Sometimes, this is also called in transit; for example, HTTP traffic on the internet, or data that comes from a database and is "on its way" to an application.
  • In use: Data that is loaded in the RAM of an application for short-term usage. This data is only available in the context of the software that is loaded. It can be seen as a cache that is temporarily needed by the software that performs tasks on the data. The data is usually a copy of data at rest; for example, a piece of customer information (let's say, a changed home address) that has been pushed from a website to the server where an API processes the update.

These stages are important to keep in mind when reasoning about technology, security, scalability, and so on. We'll bring them up in this book in several places, so make sure that you understand the differences.

主站蜘蛛池模板: 石门县| 涪陵区| 江都市| 石家庄市| 大悟县| 彰化县| 开鲁县| 郧西县| 福贡县| 玉门市| 娄烦县| 宽甸| 娱乐| 斗六市| 宿迁市| 赣州市| 吉林市| 明水县| 常宁市| 孟州市| 阳信县| 淮滨县| 武平县| 巴林左旗| 治县。| 泰顺县| 同仁县| 醴陵市| 饶河县| 辛集市| 博湖县| 揭东县| 托克托县| 旅游| 简阳市| 库车县| 台东县| 元谋县| 齐齐哈尔市| 荥经县| 大石桥市|