The importance of SREs 

An SRE is responsible for ensuring the system availability, performance monitoring, and incident response of cloud IT platforms and services. SREs must make sure that all software applications entering production environments fully comply with a set of important requirements, such as diagrams, network topology illustrations, service dependency details, monitoring and logging plans, backups, and so on. A software application may fully comply with all of its functional requirements, but there are other sources of disruption and interruption. Hardware degradation, networking problems, high resource usage, or slow responses from applications and services could happen at any time. SREs therefore always need to be extremely sensitive and responsive. An SRE's effectiveness may be measured as a function of mean time to recover (MTTR) and mean time to failure (MTTF). In other words, the availability of system functions in the midst of failures and faults has to be guaranteed. Similarly, when the system load varies sharply, the system has to have the inherent ability to scale up and out.
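The link between MTTF, MTTR, and availability can be made concrete with a small calculation. The following is a minimal sketch, assuming the commonly used steady-state formula availability = MTTF / (MTTF + MTTR); the function name and the sample figures are illustrative assumptions and do not come from the text.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return the fraction of time a system is up.

    Assumes the steady-state formula MTTF / (MTTF + MTTR).
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR must be non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: one failure every 999 hours on average,
# and one hour on average to recover from it.
availability = steady_state_availability(mttf_hours=999.0, mttr_hours=1.0)
print(f"{availability:.3%}")  # prints 99.900%
```

Under this formula, halving the recovery time improves availability about as much as doubling the time between failures, which is one reason fast incident response matters.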

Software developers typically develop the business functionality of the application and write the necessary unit tests for the functionality they created from scratch or composed out of different, distributed, and decentralized services. But they don't always focus on creating and incorporating the code for achieving scalability, availability, reliability, and so on. System administrators, on the other hand, do everything needed to design, build, and maintain an organization's IT infrastructure (computing, storage, networking, and security). System administrators try to achieve these quality of service (QoS) attributes through infrastructure sizing and by provisioning additional infrastructural modules (bare metal (BM) servers, virtual machines (VMs), and containers) to tackle any sudden rush of users and bigger payloads. As described previously, the central goal of DevOps is to build a healthy working relationship between the operations and development teams. Any gaps and friction between developers and operators ought to be identified and eliminated as early as possible by SREs, so that any application can run on any machine or cluster without many twists and tweaks. The most critical challenge is how to ensure the non-functional requirements (NFRs), that is, the QoS attributes.

SREs solve a very basic yet important problem that administrators and DevOps professionals do not. The infrastructure's resiliency and elasticity, which safeguard application scalability and reliability, have to be ensured. Business continuity and productivity, along with other delights for customers, have to be guaranteed through minute monitoring of business applications and IT services. Meeting the identified NFRs through infrastructure optimization alone is neither viable nor sustainable. Rather, NFRs have to be realized by skillfully etching the relevant code snippets and segments into the application source code itself. In short, the source code of any application has to be made aware of, and capable of easily absorbing, the capacity and capability of the underlying infrastructure. That is, we are heading toward the era of infrastructure-aware applications and, on the other side, toward application-aware infrastructures.
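As one illustration of an NFR realized in application code rather than through infrastructure sizing, consider retrying a transient failure with exponential backoff. This is only a sketch under stated assumptions: `call_with_retries`, its parameters, and the default values are hypothetical and are not taken from the text.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles after every failed attempt.
            sleep(base_delay_seconds * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A production version would likely retry only on specific error types and add jitter to the delays.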

This is where SREs pitch in. These specially empowered professionals, with their education, experience, and expertise, assist both developers and system administrators in developing, deploying, and delivering highly reliable software systems via software-defined cloud environments. SREs spend half of their time with developers and the other half with the operations team to ensure much-needed reliability. SREs set clear and mathematically modeled service-level agreements (SLAs) that define thresholds for the stability and reliability of software applications.
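To show what a mathematically modeled threshold can look like, the sketch below converts an availability target into the downtime it permits over a period. The function name, the 99.9% target, and the 30-day period are illustrative assumptions, not values from the text.

```python
def allowed_downtime_minutes(availability_target: float, period_days: int) -> float:
    """Return the downtime, in minutes, that an availability target permits."""
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in the range (0, 1]")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total_minutes = period_days * 24 * 60
    return (1.0 - availability_target) * total_minutes


# Hypothetical SLA: 99.9% availability measured over a 30-day period.
budget = allowed_downtime_minutes(availability_target=0.999, period_days=30)
print(f"{budget:.1f} minutes")  # prints 43.2 minutes
```

Expressing the threshold as a downtime budget gives developers and operators a shared number to track.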

SREs have many skills:

  • They have a deep knowledge of complex software systems
  • They are experts in data structures
  • They are excellent at designing and analyzing computer algorithms
  • They have a broad understanding of emerging technologies, tools, and techniques
  • They are passionate when it comes to coding, debugging, and problem-solving
  • They have strong analytical skills and intuition
  • They learn quickly from mistakes and avoid repeating them in subsequent assignments
  • They are team players, willing to share the knowledge they have gained and gathered
  • They like the adrenaline rush of fast-paced work
  • They are good at reading technical books, blogs, and publications
  • They produce and publish technology papers, patents, and best practices

Furthermore, SREs learn and position themselves to be a single point of contact (SPOC) in the following areas:

  • They have a good understanding of code design, analysis, debugging, and optimization
  • They have a broad understanding of various IT systems, ranging from applications to appliances (servers, storage, and network components such as switches, routers, firewalls, load balancers, and intrusion detection and prevention systems)
  • They are competent in emerging technologies:
    • Software-defined clouds for highly optimized and organized IT infrastructures
    • Data analytics for extracting actionable insights in time
    • IoT for people-centric application design and delivery
    • Containerization-sponsored DevOps
    • FaaS for simplified IT operations
    • Enterprise mobility
    • Blockchain for IoT data and device security
    • AI (machine and deep-learning algorithms) for predictive and prescriptive insights
    • Cognitive computing for realizing smarter applications
    • Digital twins for performance improvement, failure detection, product productivity, and resilient infrastructures
  • They are conversant with a variety of automated tools
  • They are familiar with reliability engineering concepts
  • They are well-versed in key terms and buzzwords such as scalability, availability, maneuverability, extensibility, and dependability
  • They are good at IT systems operations, application performance management, and cybersecurity attacks and solution approaches
  • They practice insights-driven IT operations, administration, maintenance, and enhancement