[Study Notes] - Systems Performance

Systems Performance

Systems Performance studies the performance of an entire computer system

Goal: improve the end-user experience by reducing latency, and reduce computing cost.

  • latency
  • cost
    • eliminating inefficiencies
    • improving system throughput
    • general tuning

Roles

  • system administrators
  • site reliability engineers
  • application developers
  • network engineers
  • database administrators
  • web administrators
  • other support staff

Each role should focus on that role’s area of responsibility.

Activities

  • development
    • Setting performance objectives
    • Performance characterization of POC
    • dev performance analysis
    • Non-regression testing
    • Benchmarking
  • prod
    • prod proof-of-concept testing
    • prod performance tuning
    • prod monitoring
    • prod performance analysis
    • prod incident reviews
    • tool development to enhance prod analysis

Perspectives

There are two common perspectives for performance analysis: resource analysis and workload analysis.

Resource Analysis

Resource analysis begins with analysis of the system resources: CPUs, memory, disks, network interfaces, buses, and interconnects. It is most suitable for:

  • Performance issue investigations
  • Capacity planning

This is the typical perspective of SREs, working at the platform level. Common metrics (a measurement sketch follows this list):

  • IOPS
  • Throughput
  • Utilization
  • Saturation
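
A minimal, Linux-specific sketch (an illustration, not the book’s method) deriving time-based CPU utilization from two samples of /proc/stat:

    import time

    def cpu_times():
        # first line of /proc/stat: aggregate "cpu" jiffy counters
        with open("/proc/stat") as f:
            values = list(map(int, f.readline().split()[1:]))
        idle = values[3] + values[4]          # idle + iowait
        return idle, sum(values)

    idle1, total1 = cpu_times()
    time.sleep(1)
    idle2, total2 = cpu_times()

    busy = (total2 - total1) - (idle2 - idle1)
    print(f"CPU utilization over 1s: {100.0 * busy / (total2 - total1):.1f}%")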

Workload Analysis

Workload analysis examines the performance of applications: the workload
applied and how the application is responding

This is the typical perspective of application developers, working at the application level. The targets of study (a measurement sketch follows this list):

  • Requests: The workload applied
  • Latency: The response time of the application
  • Completion: Looking for errors
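
A minimal sketch of instrumenting these three targets; handle_request() and its ~2% failure rate are hypothetical stand-ins for a real application operation:

    import random
    import time

    requests, errors, latencies = 0, 0, []

    def handle_request():
        time.sleep(random.uniform(0.001, 0.01))   # stand-in for real work
        if random.random() < 0.02:                # ~2% simulated failures
            raise RuntimeError("request failed")

    for _ in range(100):
        requests += 1
        start = time.perf_counter()
        try:
            handle_request()
        except RuntimeError:
            errors += 1                           # completion: look for errors
        latencies.append(time.perf_counter() - start)

    avg_ms = 1000 * sum(latencies) / len(latencies)
    print(f"requests={requests} errors={errors} avg latency={avg_ms:.1f} ms")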

Why Performance Is Different

Subjectivity

Developing software can be objective, but performance is often subjective: there is always something to improve, and it can be unclear whether a given value is good enough.

Example: the latency is 100 ms.

It is hard to decide whether 100 ms is good or bad; the answer may depend on the performance expectations of the application developers and end users.

Subjective performance can be made objective by defining clear goals
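
For example, a minimal sketch (with made-up latency data and a made-up goal) of testing the objective goal “p99 latency < 100 ms”:

    import random

    random.seed(1)
    # hypothetical measured request latencies, in ms
    latencies_ms = sorted(abs(random.gauss(60, 20)) for _ in range(1000))

    p99 = latencies_ms[int(0.99 * len(latencies_ms)) - 1]
    GOAL_MS = 100
    verdict = "meets" if p99 < GOAL_MS else "misses"
    print(f"p99 = {p99:.1f} ms: {verdict} the {GOAL_MS} ms goal")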

Trade Offs

pick two

  • good/fast/cheap
  • high-performance/on-time/inexpensive

When to Stop

  • When you’ve explained the bulk of the performance problem
  • When the potential ROI is less than the cost of analysis
  • When there are bigger ROIs elsewhere

Complexity

  • performance suffers from the complexity of systems
    • in cloud computing, you may not even know which instance to look at first
  • cascading failure
    • one failed component causes performance issues in others
  • bottlenecks
  • complex characteristics of the production workload
  • multiple causes
  • multiple performance issues

Observability

understanding a system through observation

use tools to understand it:

  • counters
    • operation counts
  • metrics
    • a statistic that has been selected to evaluate or monitor a target
  • profiling
    • use of tools that perform sampling
  • tracing
    • event-based recording

Methodologies

Terminology

  • IOPS
  • Throughput
  • Response time
  • Latency
  • Utilization
    • describe device usage
    • Time-Based
    • Capacity-Based
  • Saturation
    • the degree to which more work is requested of a resource than it can process
  • Bottleneck
  • Workload
  • Cache
    • Cold
    • Warm
    • Hot
    • Warmth

Concepts

Latency

Latency is a measure of time spent waiting

  • Latency is a metric

Expressing an issue as latency allows the maximum speedup to be estimated, e.g., finding that disk reads are causing a query to run up to 5x more slowly.
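
A worked example with assumed numbers: if a 100 ms query spends 80 ms waiting on disk reads, eliminating the disk reads leaves 20 ms, so the estimated maximum speedup is 100/20 = 5x:

    query_ms, disk_ms = 100.0, 80.0
    max_speedup = query_ms / (query_ms - disk_ms)
    print(f"maximum speedup: {max_speedup:.0f}x")   # 5x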

Time Scales

It helps to have an instinct about time, and reasonable expectations for latency from different sources.

Tuning Efforts

Tuning is most effective when performed closest to where the work is done, ideally within the application itself. The table below lists example tuning targets by layer.

Layer        | Example Tuning Targets
-------------|--------------------------------------------------------------------
Application  | Application logic, request queue sizes, database queries performed
Database     | Database table layout, indexes, buffering
System calls | Memory-mapped or read/write, sync or async I/O flags
File system  | Record size, cache size, file system tunables, journaling
Storage      | RAID level, number and type of disks, storage tunables

Level of Appropriateness

Different organizations and environments have different requirements for performance.

Load vs Architecture

An application can perform badly due to an issue with the software configuration and hardware on which it is running. However, an application can also
perform badly simply due to too much load being applied, resulting in queueing and long latencies.

Salability

The performance of the system under increasing load is its scalability.

With nonlinear scalability, performance degrades as load increases, visible as rising average response time or latency.
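
As a sketch of why the degradation is nonlinear, the standard M/M/1 queueing result R = S / (1 - U) (mean response time R, mean service time S, utilization U) shows response time exploding as utilization approaches 100% (the 5 ms service time is assumed):

    S = 0.005   # mean service time: 5 ms per request (assumed)
    for U in (0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
        R = S / (1 - U)
        print(f"U = {U:4.0%} -> R = {R * 1000:6.1f} ms")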

Metrics

Performance metrics are selected statistics generated by the system.

  • Throughput
  • IOPS
  • Utilization
  • Latency

Utilization

  • Time-based utilization is formally defined in queueing theory
    • U = B/T, where U = utilization and B = total time the system was busy during the observation interval T
  • Capacity-based
    • 100% busy (time-based) does not necessarily mean 100% of capacity is in use
    • at 100% capacity-based utilization, a resource cannot accept any more work
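
A worked contrast of the two definitions with assumed numbers: a disk busy 0.9 s out of a 1 s interval is 90% utilized time-based, yet if it delivered 100 MB/s of a 200 MB/s maximum it is only 50% utilized capacity-based, and can still accept more work:

    # time-based: U = B/T
    B, T = 0.9, 1.0
    print(f"time-based U = {B / T:.0%}")                     # 90%

    # capacity-based: U = delivered throughput / maximum capacity
    delivered, capacity = 100, 200                           # MB/s (assumed)
    print(f"capacity-based U = {delivered / capacity:.0%}")  # 50%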

Saturation

The degree to which more work is requested of a resource than it can process is saturation. Saturation begins to occur at 100% utilization (capacity-based), as extra work cannot be processed and begins to queue.
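
A minimal simulation of this (arrival and service rates are assumed): once arrivals exceed what the resource can process per interval, the excess queues and the backlog grows:

    service_rate = 100                    # max requests processed per second
    queue = 0
    for t, arrivals in enumerate([80, 100, 150, 150, 90]):
        queue = max(0, queue + arrivals - service_rate)
        print(f"t={t}s arrivals={arrivals:3d} queue={queue}")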

Profiling

Profiling builds a picture of a target that can be studied and understood. It is typically performed by sampling the state of the system at timed intervals and then studying the set of samples.

CPUs are a common profiling target
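
A minimal sketch of timed-interval sampling (Unix-only, illustrative rather than how production profilers are built): a SIGPROF timer fires roughly every 10 ms of CPU time and records which function was running:

    import collections
    import signal

    samples = collections.Counter()

    def on_sample(signum, frame):
        # record the code object observed on-CPU at this sample
        samples[frame.f_code.co_name] += 1

    signal.signal(signal.SIGPROF, on_sample)
    signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)   # sample every 10 ms CPU

    def busy_work():
        return sum(i * i for i in range(5_000_000))

    busy_work()
    signal.setitimer(signal.ITIMER_PROF, 0)            # stop sampling
    print(samples.most_common(3))                      # hottest sampled code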

Known Unknowns

  • Known-knowns
  • Known-unknowns
  • Unknown-unknowns

Performance is a field where “the more you know, the more you don’t know.”

Methodology

Methodology                    | Type
-------------------------------|-------------------------------------------
Streetlight anti-method        | Observational analysis
Random change anti-method      | Experimental analysis
Blame-someone-else anti-method | Hypothetical analysis
Ad hoc checklist method        | Observational and experimental analysis
Problem statement              | Information gathering
Scientific method              | Observational analysis
Diagnosis cycle                | Analysis life cycle
Tools method                   | Observational analysis
USE method                     | Observational analysis
RED method                     | Observational analysis
Workload characterization      | Observational analysis, capacity planning
Drill-down analysis            | Observational analysis
Latency analysis               | Observational analysis
Method R                       | Observational analysis
Event tracing                  | Observational analysis
Baseline statistics            | Observational analysis
Static performance tuning      | Observational analysis, capacity planning
Cache tuning                   | Observational analysis, tuning
Micro-benchmarking             | Experimental analysis
Performance mantras            | Tuning
Queueing theory                | Statistical analysis, capacity planning
Capacity planning              | Capacity planning, tuning
Quantifying performance gains  | Statistical analysis
Performance monitoring         | Observational analysis, capacity planning

USE Method

The utilization, saturation, and errors (USE) method should be used early in a performance investigation to identify systemic bottlenecks (a checklist sketch follows the resource lists below).

  • Resources: All physical server functional components (CPUs, buses, . . .). Some software
    resources can also be examined, provided that the metrics make sense
  • Utilization: For a set time interval, the percentage of time that the resource was busy
    servicing work. While busy, the resource may still be able to accept more work; the degree
    to which it cannot do so is identified by saturation.
  • Saturation: The degree to which the resource has extra work that it can’t service, often
    waiting on a queue. Another term for this is pressure.
  • Errors: The count of error events
  • Resource List

    • CPUs
    • Main memory
    • Network interfaces
    • Storage devices
    • Accelerators
    • Controllers
    • Interconnects
  • Software Resource

    • Mutex lock
    • Thread pools
    • Process/thread capacity
    • File descriptor capacity
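
A sketch of the USE method as a mechanical checklist; all readings here are hypothetical (real values would come from tools such as vmstat, iostat, or sar):

    # resource: (utilization %, saturation (queue length), error count)
    readings = {
        "CPUs":               (95, 4, 0),
        "Main memory":        (60, 0, 0),
        "Network interfaces": (30, 0, 12),
        "Storage devices":    (99, 17, 0),
    }

    for resource, (util, sat, errs) in readings.items():
        findings = []
        if util > 90: findings.append(f"high utilization ({util}%)")
        if sat > 0:   findings.append(f"saturated (queue={sat})")
        if errs > 0:  findings.append(f"errors ({errs})")
        print(f"{resource}: {'; '.join(findings) or 'OK'}")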

RED Method

Typically used in cloud services in a microservice architecture.

For every service, check the request rate, errors, and duration

  • Request rate: The number of service requests per second
  • Errors: The number of requests that failed
  • Duration: The time for requests to complete

Draw a diagram of your microservice architecture and ensure that these three metrics are monitored for each service.
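
A minimal sketch of RED instrumentation for one service (lookup_user and its workload are hypothetical): a wrapper counts requests, counts failures, and records durations:

    import functools
    import time

    red = {"requests": 0, "errors": 0, "durations": []}

    def red_instrumented(service):
        @functools.wraps(service)
        def wrapper(*args, **kwargs):
            red["requests"] += 1
            start = time.perf_counter()
            try:
                return service(*args, **kwargs)
            except Exception:
                red["errors"] += 1
                raise
            finally:
                red["durations"].append(time.perf_counter() - start)
        return wrapper

    @red_instrumented
    def lookup_user(user_id):          # hypothetical service endpoint
        time.sleep(0.005)              # stand-in for real work
        return {"id": user_id}

    for uid in range(10):
        lookup_user(uid)

    avg_ms = 1000 * sum(red["durations"]) / len(red["durations"])
    print(f"requests={red['requests']} errors={red['errors']} avg duration={avg_ms:.1f} ms")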

Latency Analysis

Latency analysis takes the time of an operation and breaks it into smaller components, continuing to subdivide the components with the highest latency, so that the root cause can be identified and quantified.

Example: analysis of MySQL query latency (a drill-down sketch follows these questions)

  • Is there a query latency issue?
  • Is the query time largely spent on-CPU or waiting off-CPU?
  • What is the off-CPU time spent waiting for? (file system I/O)
  • Is the file system I/O time due to disk I/O or lock contention? (disk I/O)
  • Is the disk I/O time mostly spent queueing or servicing the I/O? (servicing)
  • Is the disk service time mostly I/O initialization or data transfer? (data transfer)
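
A sketch of this drill-down as data; the millisecond figures are made up to mirror the questions above. At each level, only the component with the highest latency is subdivided:

    # each node: (name, latency in ms, sub-components)
    breakdown = ("query", 100, [
        ("on-CPU", 20, []),
        ("off-CPU wait", 80, [
            ("lock contention", 10, []),
            ("file system I/O", 70, [
                ("disk queueing", 15, []),
                ("disk servicing", 55, []),
            ]),
        ]),
    ])

    def drill(node, depth=0):
        name, ms, children = node
        print("  " * depth + f"{name}: {ms} ms")
        if children:
            # follow only the largest contributor, as in drill-down analysis
            drill(max(children, key=lambda c: c[1]), depth + 1)

    drill(breakdown)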

Other Methods

  • Streetlight Anti-Method
  • Random Change Anti-Method
  • Blame Someone Else Anti-Method
  • Ad Hoc Checklist Method
  • Scientific Method
  • Tools Method
  • Method R

Models

System Under Test

be aware that perturbations (interference) can affect results

Queueing System

A resource can be modeled as a queueing system, allowing response time under load to be predicted.

Visual Identification

Scalability patterns (e.g., linear scalability, contention, a knee point) can be identified visually by plotting throughput as load increases.

Applications

Application Basics

  • Function
  • Operation
  • Performance requirements
  • CPU mode
  • Configuration
  • Host
  • Metrics
  • Logs
  • Version
  • Bugs
  • Source code
  • Community

Objectives

  • Latency
  • Throughput
  • Resource utilization
  • Price

Application Performance Techniques

  • Selecting an I/O size
  • Caching
  • Buffering
  • Polling
  • Concurrency & Parallelism
  • Non-Blocking I/O
  • Processor Binding
  • Performance Mantras (a sketch follows this list)
    1. Don’t do it.
    2. Do it, but don’t do it again.
    3. Do it less.
    4. Do it later.
    5. Do it when they’re not looking.
    6. Do it concurrently.
    7. Do it cheaper.
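
As a sketch of mantra 2 (“do it, but don’t do it again”), caching via memoization avoids repeating work that has already been done:

    import functools

    @functools.lru_cache(maxsize=None)
    def fib(n):
        # each distinct n is computed once; repeats are served from cache
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(200))   # fast: linear number of computations instead of exponential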

Refer to the book as needed

  • OS
  • CPU
  • Memory
  • File System
  • Disks
  • Network
  • Benchmarking
  • Ftrace
  • BPF

Reference

  • Systems Performance: Enterprise and the Cloud, 2nd Edition, Brendan Gregg, 2020