[Study Notes] - Systems Performance

Systems Performance

Systems performance studies the performance of an entire computer system, including all major software and hardware components.

Goal: improve the end-user experience by reducing latency, and reduce computing cost.

latency
cost
- eliminating inefficiencies
- improving system throughput
- general tuning

Role

system administrators
site reliability engineers
application developers
network engineers
database administrators
web administrators
other support staff

Each role should focus on its own area of responsibility.

Activities

development
- Setting performance objectives
- Performance characterization of POC
- dev Performance analysis
- Non-regression testing
- Benchmarking
prod
- prod proof-of-concept testing
- prod performance tuning
- prod monitoring
- prod performance analysis
- prod incident reviews
- tool development to enhance production analysis

Perspectives

There are two common perspectives for performance analysis

Resource Analysis

Resource analysis begins with analysis of the system resources: CPUs, memory, disks, network interfaces, buses, and interconnects.

Performance issue investigations
Capacity planning

Typical for SREs, at the platform level.

IOPS
Throughput
Utilization
Saturation

Workload Analysis

Workload analysis examines the performance of applications: the workload
applied and how the application is responding

Typical for application engineers, at the application level.

Requests: The workload applied
Latency: The response time of the application
Completion: Looking for errors

Why Performance Is Different

Subjectivity

Developing software can be objective, but performance is often subjective: there is always something to improve, and it is not obvious whether a given value is good enough.

Example: the latency is 100 ms.

It is hard to decide whether 100 ms is good or bad. It may depend on the performance expectations of the application developers and end users.

Subjective performance can be made objective by defining clear goals
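
A minimal sketch of how a clear goal makes the judgment objective, assuming a hypothetical goal of "99th percentile latency at or below 100 ms". The sample values and the helper names `percentile` and `meets_goal` are invented for illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

def meets_goal(latencies_ms, pct=99, target_ms=100.0):
    """True if the chosen percentile is at or below the target."""
    return percentile(latencies_ms, pct) <= target_ms

# Hypothetical measurements in milliseconds.
latencies_ms = [12, 15, 18, 22, 30, 45, 60, 80, 95, 140]
print("median:", percentile(latencies_ms, 50))
print("meets p99 goal:", meets_goal(latencies_ms))
```

With a stated goal, "is 100 ms good or bad?" becomes a yes/no check against the target instead of an opinion.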

Trade Offs

pick two

good/fast/cheap
high-performance/on-time/inexpensive

When to Stop

When you’ve explained the bulk of the performance problem
When the potential ROI is less than the cost of analysis
When there are bigger ROIs elsewhere

Complexity

performance is challenging because of the complexity of systems
- in cloud computing, you may not even know which instance to look at first
cascading failure
- one failed component causes performance issues in others
bottlenecks
complex characteristics of the production workload
multiple causes
multiple performance issues

Observability

understanding a system through observation

use tools to understand it:

counters
- operation counts
metrics
- a statistic that has been selected to evaluate or monitor a target
profiling
- use of tools that perform sampling
tracing
- event-based recording
…

Methodologies

Terminology

IOPS
Throughput
Response time
Latency
Utilization
- describe device usage
- Time-Based
- Capacity-Based
Saturation
- the degree to which more work is requested of a resource than it can process
Bottleneck
Workload
Cache
- Cold
- Warm
- Hot
- Warmth

Concepts

Latency

Latency is a measure of time spent waiting

Latency is a metric

Because latency is time-based, it allows the maximum speedup to be estimated. For example, if a 100 ms query spends 80 ms waiting on disk reads, then disk reads are causing the query to run up to 5x more slowly.
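
The 5x estimate can be reproduced with a small calculation. This is a sketch under the assumption of a hypothetical 100 ms query that spends 80 ms waiting on disk reads; `max_speedup` is an illustrative helper, not something from the book.

```python
def max_speedup(total_ms, component_ms):
    """Upper bound on speedup if one component's latency is fully removed."""
    if not 0 <= component_ms < total_ms:
        raise ValueError("component time must be in [0, total)")
    return total_ms / (total_ms - component_ms)

# Assumed example: 100 ms query, 80 ms of it blocked on disk reads.
print(max_speedup(100, 80))  # 100 / (100 - 80) = 5.0
```

The result is an upper bound: it assumes the disk wait is eliminated completely (for example by caching) and nothing else changes.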

Time Scales

It helps to have an instinct about time, and reasonable expectations for latency from different sources.
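
One way to build that instinct is to rescale latencies so that one CPU cycle lasts one second. The sketch below does this; the latency figures are rough, order-of-magnitude assumptions chosen for illustration, not measurements.

```python
CPU_CYCLE_NS = 0.3  # assumed: one cycle at roughly 3.3 GHz

# Rough, order-of-magnitude latencies in nanoseconds (illustrative assumptions).
LATENCIES_NS = {
    "CPU cycle": 0.3,
    "Main memory access": 100,
    "SSD I/O": 100_000,
    "Rotational disk I/O": 10_000_000,
}

def scaled_seconds(latency_ns):
    """Rescale a latency so that one CPU cycle lasts one second."""
    return latency_ns / CPU_CYCLE_NS

for name, ns in LATENCIES_NS.items():
    print(f"{name:22s} {ns:>14,.1f} ns  ->  {scaled_seconds(ns):>14,.0f} s")
```

On this scale a main memory access takes minutes and a rotational disk I/O takes roughly a year, which makes the gaps between layers easier to remember.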

Tuning Efforts

Tuning is most effective when done closest to where the work is performed.

Layer	Example Tuning Targets
Application	Application logic, request queue sizes, database queries performed
Database	Database table layout, indexes, buffering
System calls	Memory-mapped or read/write, sync or async I/O flags
File system	Record size, cache size, file system tunables, journaling
Storage	RAID level, number and type of disks, storage tunables

Level of Appropriateness

Different organizations and environments have different requirements for performance.

Load vs Architecture

An application can perform badly due to an issue with the software configuration and hardware on which it is running (its architecture and implementation). However, an application can also perform badly simply due to too much load being applied, resulting in queueing and long latencies.

Scalability

The performance of the system under increasing load is its scalability.

With nonlinear scalability, performance degrades as load increases, which shows up as a rising average response time or latency.
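
A minimal sketch of how response time can degrade nonlinearly with load. It assumes a simple M/M/1 queueing model, where response time is service time divided by (1 - utilization); that model is an assumption made here for illustration and is not stated in these notes.

```python
def mm1_response_time(service_time_ms, utilization):
    """Mean response time for an M/M/1 queue: R = S / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1 - utilization)

# Assumed 10 ms service time at increasing utilization.
for u in (0.1, 0.5, 0.8, 0.9, 0.95):
    print(f"U={u:.0%}  response time = {mm1_response_time(10, u):6.1f} ms")
```

Under this model, going from 50% to 90% utilization multiplies the response time by five, even though the load has not even doubled.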

Metrics

Performance metrics are selected statistics generated by the system.

Throughput
IOPS
Utilization
Latency

Utilization

Time-based utilization is formally defined in queueing theory
- $U=B/T$ : U = utilization, B = total time the system was busy during T, T = the observation period
- 100% busy does not mean 100% capacity: the resource may still be able to accept more work
Capacity-Based
- at 100% utilization, the resource cannot accept any more work
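
A minimal sketch of the time-based definition U = B/T. The busy intervals are hypothetical, and the helper assumes they do not overlap each other.

```python
def time_based_utilization(busy_intervals, period):
    """U = B / T, where B is the total busy time observed during period T.

    Assumes the busy intervals do not overlap each other.
    """
    start, end = period
    busy = 0.0
    for b_start, b_end in busy_intervals:
        # Count only the part of each busy interval that falls inside T.
        overlap = min(b_end, end) - max(b_start, start)
        busy += max(overlap, 0.0)
    return busy / (end - start)

# Hypothetical busy intervals (seconds) for a disk, observed over T = [0, 10).
intervals = [(0.0, 2.0), (3.0, 4.5), (9.0, 12.0)]
utilization = time_based_utilization(intervals, (0.0, 10.0))
print(f"{utilization:.0%}")  # B = 2.0 + 1.5 + 1.0 = 4.5 s, so U = 45%
```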

Saturation

The degree to which more work is requested of a resource than it can process is saturation. Saturation begins to occur at 100% utilization (capacity-based), as extra work cannot be processed and begins to queue.

Profiling

Profiling builds a picture of a target that can be studied and understood. It is typically performed by sampling the state of the system at timed intervals and then studying the set of samples.

CPUs are a common profiling target
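
A toy sketch of timed sampling. Instead of sampling a real CPU, it samples a hypothetical timeline of which function is running, then studies the set of samples as percentages. The timeline and function names are invented.

```python
from collections import Counter

def sample_profile(timeline, interval):
    """Sample which function is running every `interval` time units."""
    counts = Counter()
    end = max(stop for _, stop, _ in timeline)
    t = 0.0
    while t < end:
        for start, stop, name in timeline:
            if start <= t < stop:
                counts[name] += 1
                break
        t += interval
    return counts

# Hypothetical on-CPU timeline: (start_ms, stop_ms, function name).
timeline = [(0, 30, "parse"), (30, 90, "compute"), (90, 100, "write")]
profile = sample_profile(timeline, interval=10)
total = sum(profile.values())
for name, hits in profile.most_common():
    print(f"{name:8s} {hits / total:.0%}")
```

Sampling is cheaper than recording every event, at the cost of resolution: anything shorter than the sampling interval may be missed entirely.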

Known Unknowns

Known-knowns
Known-unknowns
Unknown-unknowns

Performance is a field where “the more you know, the more you don’t know.”

Methodology

Methodology	Type
Streetlight anti-method	Observational analysis
Random change anti-method	Experimental analysis
Blame-someone-else anti-method	Hypothetical analysis
Ad hoc checklist method	Observational and experimental analysis
Problem statement	Information gathering
Scientific method	Observational analysis
Diagnosis cycle	Analysis life cycle
Tools method	Observational analysis
USE method	Observational analysis
RED method	Observational analysis
Workload characterization	Observational analysis, capacity planning
Drill-down analysis	Observational analysis
Latency analysis	Observational analysis
Method R	Observational analysis
Event tracing	Observational analysis
Baseline statistics	Observational analysis
Static performance tuning	Observational analysis, capacity planning
Cache tuning	Observational analysis, tuning
Micro-benchmarking	Experimental analysis
Performance mantras	Tuning
Queueing theory	Statistical analysis, capacity planning
Capacity planning	Capacity planning, tuning
Quantifying performance gains	Statistical analysis
Performance monitoring	Observational analysis, capacity planning

USE Method

The utilization, saturation, and errors (USE) method should be used early in a performance
investigation to identify systemic bottlenecks.

Resources: All physical server functional components (CPUs, buses, ...). Some software
resources can also be examined, provided that the metrics make sense.
Utilization: For a set time interval, the percentage of time that the resource was busy
servicing work. While busy, the resource may still be able to accept more work; the degree
to which it cannot do so is identified by saturation.
Saturation: The degree to which the resource has extra work that it can’t service, often
waiting on a queue. Another term for this is pressure.
Errors: The count of error events

Resource List
- CPUs
- Main memory
- Network interfaces
- Storage devices
- Accelerators
- Controllers
- Interconnects
Software Resource
- Mutex lock
- Thread pools
- Process/thread capacity
- File descriptor capacity
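
The USE method can be sketched as a loop over the resource list, checking the three metrics for each resource. The 70% utilization threshold, the sample numbers, and the names `ResourceStats` and `use_check` are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ResourceStats:
    name: str
    utilization: float  # fraction of the interval the resource was busy
    saturation: int     # e.g. queue length or number of waiting threads
    errors: int         # count of error events

def use_check(stats, util_threshold=0.7):
    """Return findings for each resource: errors, saturation, utilization."""
    findings = []
    for r in stats:
        if r.errors > 0:
            findings.append(f"{r.name}: {r.errors} errors")
        if r.saturation > 0:
            findings.append(f"{r.name}: saturated (queue={r.saturation})")
        if r.utilization >= util_threshold:
            findings.append(f"{r.name}: utilization {r.utilization:.0%}")
    return findings

# Hypothetical measurements for three resources.
stats = [
    ResourceStats("CPU", utilization=0.95, saturation=4, errors=0),
    ResourceStats("Disk", utilization=0.30, saturation=0, errors=0),
    ResourceStats("NIC", utilization=0.10, saturation=0, errors=12),
]
findings = use_check(stats)
for finding in findings:
    print(finding)
```

Iterating over a complete resource list is the point of the method: it avoids overlooking a resource just because no familiar tool reports on it.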

RED Method

Typically used in cloud services in a microservice architecture.

For every service, check the request rate, errors, and duration

Request rate: The number of service requests per second
Errors: The number of requests that failed
Duration: The time for requests to complete

Draw a diagram of your microservice architecture and ensure that these three
metrics are monitored for each service.
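
A minimal sketch of computing the three RED metrics per service from a list of request records. The record layout `(service, duration_ms, ok)`, the service names, and the 2-second window are hypothetical.

```python
from collections import defaultdict

def red_metrics(requests, window_s):
    """Per-service request rate (req/s), error count, and mean duration (ms)."""
    by_service = defaultdict(list)
    for service, duration_ms, ok in requests:
        by_service[service].append((duration_ms, ok))
    result = {}
    for service, rows in by_service.items():
        durations = [d for d, _ in rows]
        result[service] = {
            "rate": len(rows) / window_s,
            "errors": sum(1 for _, ok in rows if not ok),
            "duration_ms": sum(durations) / len(durations),
        }
    return result

# Hypothetical requests observed during a 2-second window.
requests = [
    ("auth", 20, True),
    ("auth", 40, False),
    ("cart", 100, True),
    ("auth", 30, True),
]
metrics = red_metrics(requests, window_s=2)
for service, m in metrics.items():
    print(service, m)
```

A mean is used for duration only to keep the sketch short; percentiles are often more informative for latency.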

Latency Analysis

Latency analysis examines the time taken to complete an operation, then breaks it into smaller components, continuing to subdivide the components with the highest latency so that the root cause can be identified and quantified.

Example: analysis of MySQL query latency

Is there a query latency issue?
Is the query time largely spent on-CPU or waiting off-CPU?
What is the off-CPU time spent waiting for? (file system I/O)
Is the file system I/O time due to disk I/O or lock contention? (disk I/O)
Is the disk I/O time mostly spent queueing or servicing the I/O? (servicing)
Is the disk service time mostly I/O initialization or data transfer? (data transfer)
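
The question sequence above can be sketched as a walk down a latency breakdown tree, always following the component with the highest latency. The tree mirrors the MySQL example, but every number in it is invented for illustration.

```python
def drill_down(node):
    """Follow the highest-latency child at each level; return the path taken."""
    path = [(node["name"], node["ms"])]
    while node.get("children"):
        node = max(node["children"], key=lambda child: child["ms"])
        path.append((node["name"], node["ms"]))
    return path

# Hypothetical breakdown of a 100 ms query (all timings are made up).
query = {"name": "query", "ms": 100, "children": [
    {"name": "on-CPU", "ms": 10},
    {"name": "off-CPU", "ms": 90, "children": [
        {"name": "other waits", "ms": 10},
        {"name": "file system I/O", "ms": 80, "children": [
            {"name": "lock contention", "ms": 10},
            {"name": "disk I/O", "ms": 70, "children": [
                {"name": "queueing", "ms": 10},
                {"name": "servicing", "ms": 60, "children": [
                    {"name": "I/O initialization", "ms": 5},
                    {"name": "data transfer", "ms": 55},
                ]},
            ]},
        ]},
    ]},
]}

path = drill_down(query)
for name, ms in path:
    print(f"{name}: {ms} ms")
```

Each step both identifies the dominant component and quantifies it, so the final answer comes with a number attached (here, 55 of 100 ms in data transfer).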

Other Methods

Streetlight Anti-Method
Random Change Anti-Method
Blame-Someone-Else Anti-Method
Ad Hoc Checklist Method
Scientific Method
Tools Method
Method R

Models

System Under Test

be aware that perturbations (interference) can affect results

Queueing System

Visual identification

Applications

Application Basics

Function
Operation
Performance requirements
CPU mode
Configuration
Host
Metrics
Logs
Version
Bugs
Source code
Community

Objectives

Latency
Throughput
Resource utilization
Price

Applications Performance

Selecting an I/O size
Caching
Buffering
Polling
Concurrency & Parallelism
Non-Blocking I/O
Processor Binding
Performance Mantras
1. Don’t do it.
2. Do it, but don’t do it again.
3. Do it less.
4. Do it later.
5. Do it when they’re not looking.
6. Do it concurrently.
7. Do it cheaper.
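
As an illustration of mantra 2 ("Do it, but don't do it again"), here is a minimal caching sketch using `functools.lru_cache` from the Python standard library. The `lookup` function is a hypothetical stand-in for an expensive operation such as a database query.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def lookup(key):
    """Stand-in for an expensive operation, e.g. a database query."""
    return key.upper()

results = [lookup(k) for k in ["a", "b", "a", "a", "b"]]
info = lookup.cache_info()
print(results)
print(f"computed {info.misses} times, served from cache {info.hits} times")
```

Five calls result in only two real computations. The trade-off is memory for the cached results and the risk of serving stale data if the underlying values change.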

Refer to the book if needed

OS
CPU
Memory
File System
Disks
Network
Benchmarking
Ftrace
BPF

Reference

Brendan Gregg, Systems Performance: Enterprise and the Cloud, 2nd Edition, 2020.