YOW! 2021: Computing Performance 2021: What's on the Horizon

Keynote by Brendan Gregg for YOW! 2021.

Description: "The pursuit of faster performance in computing is the driving reason for many new technologies and updates. This talk discusses performance improvements now underway that you will likely be adopting soon, for processors (including 3D stacking and cloud vendor CPUs), memory (including DDR5 and high-bandwidth memory [HBM]), disks (including 3D XPoint as a 3D NAND accelerator), networking (including QUIC and eXpress Data Path [XDP]), runtimes, hypervisors, and more. The future of performance is increasingly cloud-based, with hardware hypervisors and custom processors, meaningful observability of everything down to cycle stalls (even as cloud guests), and high-speed syscall-avoiding applications that use eBPF, FPGAs, and io_uring. The talk also discusses where future performance improvements might be expected, with predictions for new technologies."

PDF: YOW2021_ComputingPerformance.pdf

Keywords (from pdftotext):

slide 1:
    Computing Performance 2021
    What’s On the Horizon
    Brendan Gregg
    YOW!
    
slide 2:
    Disclaimers: About this talk
    This is
    a performance engineer's views about industry-wide server performance
    This isn't
    necessarily about my employer, or my employer’s views
    an endorsement of any company/product or sponsored by anyone
    professional market predictions (various companies sell such reports)
    based on confidential materials
    necessarily correct or fit for any purpose
    My predictions may be wrong! They will be thought-provoking.
    YOW! Computing Performance 2021: What’s On the Horizon (Brendan Gregg)
    
slide 3:
    Agenda
    1. Processors
    2. Memory
    3. Disks
    4. Networking
    5. Runtimes
    6. Kernels
    7. Hypervisors
    8. Observability
    Not covering: Databases, file systems, front-end, mobile, desktop.
    Slides are online and include extra details as fine print
    Slides: http://www.brendangregg.com/Slides/YOW2021_ComputingPerformance.pdf
    
slide 4:
    1. Processors
    
slide 5:
    Clock rate
    Early Intel Processors
    [Chart: max GHz by year for the Intel 8086, Intel 386 DX, Intel Pentium, Pentium Pro, Pentium III, and Intel Xeon]
    
slide 6:
    Clock rate
    Server Processor Examples (AWS EC2)
    Processor: Cores/Threads
    Xeon X5550: 4/8
    Xeon E5-2665 0: 8/16
    Xeon E5-2680 v2: 10/20
    Platinum 8175M: 24/48
    Platinum 8259CL: 24/48
    [Chart: max GHz and hardware threads by year, 2009 to 2019]
    Increase has leveled off due to power/efficiency
    Workstation processors higher; e.g., 2020 Xeon W-1270P @ 5.1 GHz
    Horizontal scaling instead
    More CPU cores, hardware threads, and server instances
    
slide 7:
    Interconnects
    CPU interconnects by year: 2007 Intel FSB; 2008 Intel QPI; 2017 Intel UPI
    [Chart: interconnect bandwidth (Gbytes/s) by year]
    Over 10 years: 6x core count, but only 3.25x bus rate
    Source: Systems Performance 2nd Edition, Figure 6.10 [Gregg 20]
    Memory bus (covered later) also lagging
    CPU utilization is wrong
    Often mostly memory/interconnect stalls
    90% CPU ...may mean mostly stall cycles
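The "6x core count, 3.25x bus rate" figures imply that each core's share of interconnect bandwidth shrank. A minimal sketch of that arithmetic, assuming (as a simplification not stated on the slide) that bus bandwidth is shared evenly by all cores:

```python
def per_core_bandwidth_ratio(core_growth: float, bus_growth: float) -> float:
    """Relative interconnect bandwidth per core after both quantities have grown.

    Assumes the bus bandwidth is shared evenly by all cores (a simplification).
    """
    if core_growth <= 0 or bus_growth <= 0:
        raise ValueError("growth factors must be positive")
    return bus_growth / core_growth


# Growth factors over 10 years, taken from the slide: 6x cores, 3.25x bus rate
ratio = per_core_bandwidth_ratio(core_growth=6.0, bus_growth=3.25)
print(f"Interconnect bandwidth per core: {ratio:.2f}x the starting level")
```

Under that assumption each core ends up with roughly half the interconnect bandwidth it had a decade earlier, which is one reason high CPU utilization can hide stall cycles.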
    
slide 8:
    Lithography
    [Chart: semiconductor nanometer process by year, from 32nm down to 3nm and 2nm]
    Source: Semiconductor device fabrication [Wikipedia 21a]
    TSMC expects volume production of 3nm in 2022 [Quach 21a]
    2nm: IBM has already built one [Quach 21b]
    Lithography limits expected to be reached by 2029, switching to stacked CPUs. [Moore 20]
    BTW: Silicon atom diameter ~0.2 nm [Wikipedia 21b]
    
slide 9:
    Lithography
    “Nanometer process” since 2010 should be considered a marketing term
    New terms proposed include:
    GMT (gate pitch, metal pitch, tiers)
    LMC (logic, memory, interconnects)
    [Moore 20]
    E.g., IBM’s 2nm chip [Quach 21b] has a 12nm gate length
    
slide 10:
    Other processor scaling
    Special instructions
    E.g., AVX-512 Vector Neural Network Instructions (VNNI)
    Connected chiplets
    Using embedded multi-die interconnect bridge (EMIB) [Alcorn 17]
    3D stacking
    E.g., Intel HBM, AMD V-Cache [Cutress 21]
    Hybrid core architecture
    ARM big.LITTLE; Intel Alder Lake pcores/ecores [Alcorn 21]
    
slide 11:
    Recent server processor examples
    Intel Xeon Platinum 8380 (Ice Lake): “10nm”, 2.3 - 3.4 GHz, 40/80 cores/threads, Apr 2021
    AMD EPYC 7713P: “7nm”, 2.0 - 3.675 GHz, 64/128 cores/threads, Mar 2021
    ARM-based Ampere Altra Q80-33: “7nm”, 80/80 cores/threads, Dec 2020
    Intel Alder Lake for server (Sapphire Rapids) coming soon?
    Other server processors: IBM Z, RISC-V
    Coming soon to a datacenter near you
    Although there is a TSMC chip shortage that may last through 2022/2023 [Quatch 21][Ridley 21]
    
slide 12:
    Cloud chip race
    Amazon ARM/Graviton2
    ARM Neoverse N1, 64 cores, 2.5 GHz
    Promising microbenchmark results in a test environment
    Microsoft ARM
    ARM-based something coming soon [Warren 20]
    Google SoC
    Systems-on-Chip (SoC) coming soon [Vahdat 21]
    [Diagram: generic processors (x86, AMD, ARM) vs. cloud processors (Grav2, MSFT, GOOG)]
    Cloud processors
    
slide 13:
    Accelerators
    GPUs
    Parallel workloads, thousands of GPU cores. Widespread adoption in machine learning.
    FPGAs
    Reprogrammable semiconductors
    Great potential, but needs specialists to program
    Good for algorithms: compression, cryptocurrency,
    video encoding, genomics, search, etc.
    Microsoft FPGA-based configurable cloud [Russinovich 17]
    Also IPUs, TPUs
    Infrastructure processing units [Kummrow 21]
    Tensor processing units [Google 21]
    [Chart: CPU, GPU, and FPGA compared by ease of use vs. performance potential; GPUs and FPGAs are the “other” CPUs you may not be monitoring]
    
slide 14:
    Latest GPU examples
    NVIDIA GeForce RTX 3090: 10,496 CUDA cores, 2020
    [Burnes 20]
    Cerebras Gen2 WSE: 850,000 AI-optimized cores, 2021
    Uses most of the silicon wafer for one chip.
    2.6 trillion transistors, 23 kW. [Trader 21]
    Previous version was already the “Largest chip ever built,”
    and cost US$2M. [insideHPC 20]
    [Diagram: GPU internals: streaming multiprocessors (SMs), each with control logic and cores (streaming processors, SPs), sharing an L2 cache]
    
slide 15:
    Latest FPGA examples
    Xilinx Virtex UltraScale+ VU19P, 8,938,000 logic cells, 2019
    Using 35B transistors. Also has 4.5 Tbit/s transceiver bandwidth (bidir), and 1.5 Tbit/s DDR4 bandwidth
    [Cutress 19]
    Xilinx Virtex UltraScale+ VU9P, 2,586,000 logic cells, 2016
    Deploy right now: AWS EC2 F1 instance type (up to 8 of these FPGAs per instance)
    AMD is acquiring Xilinx
    BPF (covered later) already in FPGAs
    E.g., 400 Gbit/s packet filter FFShark [Vega 20]
    
slide 16:
    My Predictions
    
slide 17:
    My Prediction: Multi-socket is doomed
    Single socket is getting big enough (cores)
    One socket: >100 cores
    Already scaling horizontally (cloud)
    And in datacenters, via “blades” or “microservers”
    Why pay NUMA costs?
    Two single-socket instances should out-perform one two-socket instance
    [Diagram: two-socket system; each CPU reaches its local memory in 1 hop and the other socket’s memory in 2 hops]
    Multi-socket future is mixed: one socket for cores, one GPU socket, one FPGA socket, etc. EMIB connected.
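One way to see the NUMA cost is as a weighted average of 1-hop (local) and 2-hop (remote) memory accesses. This is a minimal model, not a measurement: the latency numbers below are hypothetical placeholders, and real systems also see bandwidth and contention effects that it ignores.

```python
def avg_memory_latency_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Mean access latency when some accesses go to the other socket's memory."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return local_ns * (1.0 - remote_fraction) + remote_ns * remote_fraction


# Hypothetical latencies for illustration only (not measured values):
LOCAL_NS = 80.0    # 1 hop: CPU to its own memory
REMOTE_NS = 140.0  # 2 hops: CPU to the other socket's memory

for fraction in (0.0, 0.25, 0.5):
    latency = avg_memory_latency_ns(LOCAL_NS, REMOTE_NS, fraction)
    print(f"{fraction:.0%} remote accesses: {latency:.0f} ns average")
```

A single-socket instance is always the 0% case, which is the intuition behind two single-socket instances beating one two-socket instance.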
    
slide 18:
    My Prediction: SMT future unclear
    Simultaneous multithreading (SMT) == hardware threads
    Performance variation
    ARM cores competitive
    Post Meltdown/Spectre
    Some people turn them off
    Possibilities:
    SMT becomes “free”
    Processor feature, not a cost basis
    Turn “oh no! hardware threads” into
    “great! bonus hardware threads!”
    No more hardware threads
    Future investment elsewhere
    
slide 19:
    My Prediction: Core count limits
    Imagine an 850,000-core server processor in today’s systems...
    
slide 20:
    My Prediction: Core count limits
    Worsening problems:
    Memory-bound workloads
    Kernel/app lock contention
    False sharing
    Power consumption
    Core connectivity overheads
    etc.
    Source: Figure 2.16 [Gregg 20]
    General-purpose computing will hit a practical core limit
    For a given memory subsystem & kernel, and running multiple applications
    E.g., 1024 cores (except GPUs/ML/AI); Esperanto RISC-V is already reaching “kilocore” scale [Kostovic 21]
    Apps themselves will hit an even smaller practical limit (some already have by design, e.g., Node.js and 2 CPUs)
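Amdahl's law is one common way to model why adding cores stops paying off: any serial fraction of the work caps the speedup. The sketch below uses an assumed 99% parallel workload, an illustrative value not taken from the talk.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup over one core when only part of the work runs in parallel."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workload: 99% parallel (an illustrative value, not from the talk)
for cores in (64, 256, 1024, 4096):
    print(f"{cores:5d} cores: {amdahl_speedup(cores, 0.99):6.1f}x")
```

With a 1% serial fraction the speedup can never exceed 100x, however many cores are added. Amdahl's law also ignores the lock contention, false sharing, and coherence overheads listed above, so real scaling curves tend to be worse.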
    
slide 21:
    My Prediction: pcores & ecores
    Intel Alder Lake (desktop) has performance and efficiency cores
    This will come to server
    Efficiency core tasks:
    Garbage collection
    NUMA rebalancing
    FS writeback compression & flushing
    Backups
    Security scanning
    etc.
    [Diagram: server processor with four pcores and a group of ecores]
    
slide 22:
    My Prediction: 3 Eras of processor scaling
    Delivered processor characteristics:
    Era 1: Clock frequency
    Era 2: Core/thread count
    Era 3: Cache size & policy
    
slide 23:
    My Prediction: 3 Eras of processor scaling
    Practical server limits:
    Era 1: Clock frequency
    → already reached by ~2005 (3.5 GHz)
    Era 2: Core/thread count
    → limited by mid 2030s (e.g., 1024)
    Era 3: Cache size & policy → limited by end of 2030s
    Mid-century will need an entirely new computer hardware architecture, kernel memory architecture, or logic gate technology to progress further.
    E.g., use of graphene, carbon nanotubes [Hruska 12]
    This is after moving more to stacked processors
    
slide 24:
    My Prediction: More processor vendors
    ARM licensed or RISC-V
    Including Apple M1 for servers
    Era of CPU choice
    Beware: “optimizing for the benchmark”
    Don’t believe microbenchmarks without doing
    “active benchmarking”: Root-cause perf analysis
    while the benchmark is still running.
    Intel back to innovating & competing
    Pat Gelsinger now CEO
    [Joke diagram: DogeCPU “+AggressiveOpts” processor, with cores, LLC, MMU, and a (confidential) Benchmark Optimizer Unit]
    
slide 25:
    My Prediction: Cloud CPU advantage
    Large cloud vendors can analyze >100,000 workloads directly
    Via PMCs and other processor features.
    Vast real-world detail to aid processor design
    More detail than traditional processor vendors have, and detail available immediately whenever they want.
    Will processor vendors offer their own clouds just to get the same data?
    Machine-learning aided processor design
    Based on the vast detail. Please point it at real-world workloads and not microbenchmarks.
    Vast detail example: processor trace showing timestamped instructions:
    # perf script --insn-trace --xed
    date 31979 [003] 653971.670163672: ... (/lib/x86_64-linux-gnu/ld-2.27.so) mov %rsp, %rdi
    date 31979 [003] 653971.670163672: ... (/lib/x86_64-linux-gnu/ld-2.27.so) callq 0x7f3bfbf4dea0
    date 31979 [003] 653971.670163672: ... (/lib/x86_64-linux-gnu/ld-2.27.so) pushq %rbp
    [...]
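To turn such a trace into something summarizable, a script can extract each instruction's mnemonic. This minimal sketch assumes the line layout of the excerpt above; real output has more fields where "..." appears, so the regular expression may need adjusting.

```python
import re
from collections import Counter

# Assumed layout, based on the excerpt above (real output has more fields where "..." is):
#   comm  pid  [cpu]  timestamp:  ...  (binary)  mnemonic  operands
LINE_RE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):"
    r".*?\((?P<dso>[^)]*)\)\s+(?P<insn>\S+)"
)


def count_mnemonics(lines):
    """Count instruction mnemonics in `perf script --insn-trace --xed` text output."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("insn")] += 1
    return counts


sample = [
    "date 31979 [003] 653971.670163672: ... (/lib/x86_64-linux-gnu/ld-2.27.so) mov %rsp, %rdi",
    "date 31979 [003] 653971.670163672: ... (/lib/x86_64-linux-gnu/ld-2.27.so) callq 0x7f3bfbf4dea0",
    "date 31979 [003] 653971.670163672: ... (/lib/x86_64-linux-gnu/ld-2.27.so) pushq %rbp",
]
print(count_mnemonics(sample).most_common())
```

The non-greedy match stops at the first parenthesized binary path, so operands such as `0x8(%rsp)` do not confuse the mnemonic capture.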
    
slide 26:
    My Prediction: FPGA turning point
    Little adoption (outside cryptocurrency) until major app support
    Solves the ease of use issue: Developers just configure the app (which may fetch and deploy an FMI)
    BPF use cases are welcome, but still specialized/narrow
    Needs runtime support, e.g., the JVM. Already work in this area (e.g., [TornadoVM 21]).
    Hypothetical example (none of this is real, yet):
    apt install openjdk-21
    apt install openjdk-21-libfpga
    java -XX:+UseFPGA
    [Diagram: the JVM using an FPGA accelerator]
    
slide 27:
    2. Memory
    
slide 28:
    Many workloads memory I/O bound
    # ./pmcarch 1
    K_CYCLES   K_INSTR    IPC  BR_RETIRED   BMR%  LLCREF
    334937819  141680781  0.42 25744860335  2.08  1611987169
    329721327  140928522  0.43 25760806599  2.04  1504594986
    330388918  141393325  0.43 25821331202  1.88  1535130691
    329889409  142876183  0.43 26506966225  1.93  1501785676
    [...]
    # ./pmcarch 1
    [...]
    IPC  BR_RETIRED  BMR%  LLCREF
    0.66 4692322525  1.95  780435112
    0.65 5286747667  1.81  751335355
    0.70 4616980753  1.87  709841242
    0.69 5055959631  1.83  787333902
    # ./pmcarch
    K_CYCLES   K_INSTR   IPC  BR_RETIRED  BMR%  LLCREF
    122697727  13892225  0.11 2604221808  1.56  419652590
    144881903  17918325  0.12 3240599094  1.48  489936685
                         0.14 2722513072  1.56  401658252
                         0.15 2815805820  1.48  386979370
    [...]
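The IPC column is K_INSTR divided by K_CYCLES. Recomputing it from the rows above is a quick sanity check when reading pmcarch output; a widely used rule of thumb (an interpretation, not something pmcarch reports) is that IPC well below 1 suggests many stall cycles, often on memory.

```python
def ipc(k_cycles: int, k_instr: int) -> float:
    """Instructions per cycle. Both pmcarch columns are in thousands, so units cancel."""
    if k_cycles <= 0:
        raise ValueError("cycle count must be positive")
    return k_instr / k_cycles


# (K_CYCLES, K_INSTR) pairs copied from the first and third pmcarch outputs above
rows = [(334937819, 141680781), (122697727, 13892225)]
for k_cycles, k_instr in rows:
    print(f"K_CYCLES={k_cycles} K_INSTR={k_instr} IPC={ipc(k_cycles, k_instr):.2f}")
```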
    
slide 29:
    DDR5 has better bandwidth
    DDR5 has a faster bus
    But not width
    Needs processor support
    E.g., Intel Alder Lake / Sapphire Rapids
    512GB DDR5 DIMMs
    Already released by Samsung [Shilov 21]
    Desktop/gamers already know about it
    [Chart: peak bandwidth (Gbytes/s) by year, 2000 to 2020: DDR-333, DDR2-800, DDR3-1600, DDR4-3200, DDR5-6400]
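Because the bus got faster but not wider, peak bandwidth scales directly with the transfer rate in the module name. A minimal sketch, assuming a 64-bit (8-byte) channel with ECC bits excluded; DDR5 splits those 64 bits into two 32-bit subchannels per DIMM, but the total width is the same.

```python
BUS_WIDTH_BYTES = 8  # 64-bit channel (assumed; ECC bits excluded)


def peak_bandwidth_gbytes(mega_transfers_per_sec: int) -> float:
    """Theoretical peak for one channel: transfers per second x bytes per transfer."""
    return mega_transfers_per_sec * BUS_WIDTH_BYTES / 1000


for name, rate in [("DDR-333", 333), ("DDR2-800", 800), ("DDR3-1600", 1600),
                   ("DDR4-3200", 3200), ("DDR5-6400", 6400)]:
    print(f"{name}: {peak_bandwidth_gbytes(rate):.1f} Gbytes/s")
```

Each doubling of the transfer rate doubles the per-channel peak, from about 25.6 Gbytes/s for DDR4-3200 to about 51.2 Gbytes/s for DDR5-6400.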
    
slide 30:
    DDR latency
    [Chart: single-access latency (ns) by year, starting with DDR-333]
    
slide 31:
    DDR latency
    Hasn’t changed in 20 years
    This is single access latency
    Same memory clock (200 MHz) [Greenberg 11]
    Also see [Cutress 20][Goering 11]
    Low-latency DDR does exist
    Reduced Latency DRAM (RLDRAM) by Infineon and Micron: lower latency but lower density
    Not seeing widespread server use (I’ve seen it marketed towards HFT)
    [Chart: single-access latency (ns) by year for DDR-333, DDR2-800, DDR3-1600, DDR4-3200, and DDR5-6400; about the same from DDR-333 to DDR5-6400 :-(]
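One component of single-access latency, CAS latency, shows why the line stays flat: CL is counted in I/O clock cycles, and CL grows about as fast as the cycles shrink. The CL values below are illustrative assumptions (actual parts vary), and CAS is only part of a full access.

```python
def cas_latency_ns(cas_cycles: float, mega_transfers_per_sec: int) -> float:
    """CAS latency in ns. DDR moves 2 transfers per I/O clock, so clock MHz = MT/s / 2."""
    io_clock_mhz = mega_transfers_per_sec / 2
    return cas_cycles * 1000 / io_clock_mhz


# Illustrative CL values (assumed; actual parts vary)
for name, rate, cl in [("DDR-333", 333, 2.5), ("DDR4-3200", 3200, 22), ("DDR5-6400", 6400, 52)]:
    print(f"{name} CL{cl}: {cas_latency_ns(cl, rate):.1f} ns")
```

All three land in roughly the 13 to 16 ns range despite a ~19x increase in transfer rate.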
    
slide 32:
    HBM
    High bandwidth memory, 3D stacking
    Target use cases include high performance computing and virtual reality graphical processing [Macri 15]
    GPUs already use it
    Can be provided on-package
    Intel Sapphire Rapids rumored to include 64 Gbyte HBM2E [Shilov 21d]
    
slide 33:
    Server DRAM size
    SuperMicro SuperServer B12SPE-CPU-25G
    Single socket (see earlier slides)
    16 DIMM slots
    4 TB DDR-4
    Facebook Delta Lake (1S) OCP
    6 DIMM slots
    96 Gbytes DDR-4
    Price/optimal for a typical WSS?
    [Diagrams: B12SPE-CPU-25G and Delta Lake boards, showing the processor socket and DIMMs]
    Sources: [SuperMicro 21] [Haken 21]
    
slide 34:
    Additional memory tier
    3D XPoint (next section) memory mode
    Can also operate in application direct mode and storage mode [Intel 21]
    [Diagram: memory tiers: main memory (DRAM), then persistent memory, then storage devices]

slide 35:
    My Prediction: Extrapolation
    [Chart: peak bandwidth (Gbytes/s) by year, doubling each generation: DDR-333, DDR2-800, DDR3-1600, DDR4-3200, DDR5-6400, then extrapolated to DDR6-12800, DDR7-25600, DDR8-51200]
    Not a JEDEC announcement
    Assumes miraculous engineering work
    For various challenges see [Peterson 20]
    But will single-access latency drop in DDR6?
    I’d guess not: DDR internals are already at their cost sweet spot, leaving low latency for other memory technologies
    
slide 36:
    My Prediction: DDR5 “up to 2x” Wins
    E.g., IPC 0.1 → ~0.2 for bandwidth-bound workloads
    [Chart: IPC scale from “bad” to “good*”; IPC >2.0: instruction bound]
    
slide 37:
    My Prediction: HBM-only servers
    Clouds offering “high bandwidth memory” HBM-only instances
    HBM on-processor
    Finally helping memory catch up to core scaling
    RLDRAM on-package as another option?
    “Low latency memory” instance
    
slide 38:
    My Prediction: Extra tier too late
    Competition isn’t disks, it’s Tbytes of DRAM
    SuperMicro’s single socket should hit 8 Tbytes DDR-5
    AWS EC2 p4.24xl has 1.1 Tbytes of DRAM (deploy now!)
    How often does your working set size (WSS) not fit?
    Across several of these for redundancy?
    Next tier needs to get much bigger than DRAM (10+x)
    and much cheaper to find an extra-tier use case
    (e.g., cost based).
    Meanwhile, DRAM is still getting bigger and faster
    I developed the first cache tier between main memory
    and disks to see widespread use:
    the ZFS L2ARC [Gregg 08]
    [Diagram: the WSS fits in main memory; persistent memory and storage devices below it hold “cold” data]
    
slide 39:
    3. Disks
    
slide 40:
    Recent timeline for rotational disks
    2005: Perpendicular magnetic recording (PMR)
    Writes vertically using a shaped magnetic field for higher density
    2013: Shingled magnetic recording (SMR)
    (next slide)
    2019: Multi-actuator technology (MAT)
    Two sets of heads and actuators; like 2-drive RAID 0 [Alcorn 17].
    2020: Energy-assisted magnetic recording (EAMR)
    Western Digital 18TB & 20TB [Salter 20]
    2021: Heat-assisted magnetic recording (HAMR)
    Seagate 20TB HAMR drives [Shilov 21b]
    
slide 41:
    Recent timeline for rotational disks
    I don’t know their perf characteristics yet
    
slide 42:
    SMR
    11-25% more storage, worse performance
    Writes tracks in an overlapping way, like shingles on a roof. [Shimpi 13]
    Overwriting data means rewriting the overlapping tracks too. Suited for archival (write once) workloads.
    [Diagram: read head over overlapping written tracks]
    Look out for 18TB/20TB-with-SMR drive releases
    
slide 43:
    Flash memory-based disks
    Single-Level Cell (SLC)
    Multi-Level Cell (MLC)
    Enterprise MLC (eMLC)
    2009: Triple-Level Cell (TLC)
    2009: Quad-Level Cell (QLC)
    QLC is only rated for around 1,000 block-erase cycles [Liu 20].
    2013: 3D NAND / Vertical NAND (V-NAND)
    SK Hynix envisions 600-layer 3D NAND [Shilov 21c]. Should be multi-Tbyte.
    SSD performance pathologies: latency from aging, wear-leveling, fragmentation, internal compression, etc.
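A rough way to see what a ~1,000 erase-cycle rating means is to estimate total host writes before wear-out as capacity x erase cycles / write amplification. This is a simplified sketch: the 1 TB capacity, write amplification factor of 2.0, and 5-year service life are hypothetical, and it ignores over-provisioning and vendor warranty formulas.

```python
def lifetime_writes_tb(capacity_tb: float, erase_cycles: int, write_amplification: float) -> float:
    """Rough total host writes before wear-out: capacity x erase cycles / WAF."""
    if write_amplification < 1.0:
        raise ValueError("write amplification is at least 1.0")
    return capacity_tb * erase_cycles / write_amplification


def drive_writes_per_day(capacity_tb: float, total_writes_tb: float, years: float) -> float:
    """How many full-drive writes per day that budget allows over the service life."""
    return total_writes_tb / capacity_tb / (years * 365)


# Assumed: 1 TB QLC drive, ~1,000 erase cycles (from the slide), hypothetical WAF of 2.0
tbw = lifetime_writes_tb(1.0, 1000, 2.0)
print(f"~{tbw:.0f} TB of host writes")
print(f"~{drive_writes_per_day(1.0, tbw, 5):.2f} drive writes/day over 5 years")
```

Under these assumptions the drive tolerates only about a quarter of a full-drive write per day, and internal wear-leveling and logic must keep working harder as the drive ages.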
    
slide 44:
    Persistent memory-based disks
    2017: 3D XPoint (Intel/Micron) Optane
    Low and consistent latency (e.g., 14 us access latency) [Hady 18]
    App-direct mode, memory mode, and as storage
    [Diagram: 3D XPoint structure: wordlines, memory cells, cell selection, and bitlines]
    DRAM: Trapped electrons in a capacitor, requires refreshing
    3D XPoint: Resistance change; layers of wordlines+cells+bitlines keep stacking vertically
    
slide 45:
    Latest storage device example
    2021: Intel Optane memory H20
    QLC 3D NAND storage (512 Gbytes / 1 Tbyte) +
    3D XPoint as an accelerator (32 Gbytes)
    Currently M.2 2280 form factor (laptops)
    (Announced while I was developing these slides)
    
slide 46:
    Storage Interconnects
    SAS-4 cards in development
    (Serial Attached SCSI)
    PCIe 5.0 coming soon
    (Peripheral Component Interconnect Express)
    Intel already demoed on Sapphire Rapids [Hruska 20]
    NVMe 1.4 latest
    (Non-Volatile Memory Express)
    Storage over PCIe bus
    Supports zoned namespace SSDs (ZNS) [ZonedStorage 21]
    Bandwidth bounded by PCIe bus
    These have features other than speed
    Reliability, power management, virtualization support, etc.
    [Chart: SAS-1 to SAS-5 bandwidth (Gbit/s) by year specified; SAS-5 year: 202?]
    [Chart: PCIe 1 to PCIe 5 bandwidth (16-lane, Gbyte/s) by year specified]
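The 16-lane PCIe figures follow from each generation's per-lane transfer rate and line encoding (8b/10b for PCIe 1 and 2, 128b/130b from PCIe 3 on). This sketch computes raw one-direction bandwidth after line coding; packet and protocol overheads reduce real throughput further.

```python
# Per-lane transfer rate (GT/s) and line-code efficiency for each PCIe generation
PCIE_GENERATIONS = {
    "PCIe 1": (2.5, 8 / 10),      # 8b/10b encoding
    "PCIe 2": (5.0, 8 / 10),      # 8b/10b encoding
    "PCIe 3": (8.0, 128 / 130),   # 128b/130b encoding
    "PCIe 4": (16.0, 128 / 130),
    "PCIe 5": (32.0, 128 / 130),
}


def pcie_gbytes_per_sec(generation: str, lanes: int = 16) -> float:
    """Raw one-direction bandwidth after line coding, before packet/protocol overheads."""
    gigatransfers, efficiency = PCIE_GENERATIONS[generation]
    return gigatransfers * efficiency * lanes / 8  # 1 transfer = 1 bit per lane


for gen in PCIE_GENERATIONS:
    print(f"{gen} x16: {pcie_gbytes_per_sec(gen):.1f} Gbyte/s")

nic_gbytes = 400 / 8  # a 400 Gbit/s NIC, in Gbyte/s
print(f"400 Gbit/s NIC needs {nic_gbytes:.0f} Gbyte/s")
```

A 400 Gbit/s NIC needs 50 Gbyte/s, more than PCIe 4 x16 (about 31.5 Gbyte/s) but within PCIe 5 x16 (about 63 Gbyte/s), which matches the networking section's note that 400 Gbit/s on PCI needs PCIe 5.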
    
slide 47:
    Linux Kyber I/O scheduler
    Multi-queue, target read & write latency
    Up to 300x lower 99th percentile latencies [Gregg 18]
    Linux 4.12 [Corbet 17]
    [Diagram: Kyber (simplified): reads (sync) and writes (async) are queued and dispatched separately; completions feed back into queue size adjustment]
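Kyber's idea is a feedback loop: it watches completion latency against read and write targets (exposed as the `read_lat_nsec` and `write_lat_nsec` tunables under `/sys/block/<dev>/queue/iosched/` when Kyber is active) and resizes its dispatch queues. The toy rule below only illustrates that idea with an AIMD-style adjustment; it is not the kernel's algorithm, which is more involved. The 2,000 us target and the latency samples are hypothetical.

```python
def adjust_depth(depth: int, p99_latency_us: float, target_us: float,
                 min_depth: int = 1, max_depth: int = 256) -> int:
    """Toy latency-feedback rule: shrink the queue when over target, grow when well under."""
    if p99_latency_us > target_us:
        depth = depth // 2   # back off quickly to protect latency
    elif p99_latency_us < 0.5 * target_us:
        depth = depth + 1    # probe slowly for more throughput
    return max(min_depth, min(max_depth, depth))


# Hypothetical run: read target of 2,000 us, observed p99 latency per time window
depth = 64
for p99 in (5000, 3000, 1500, 800, 700):
    depth = adjust_depth(depth, p99, target_us=2000)
    print(f"p99={p99} us -> queue depth {depth}")
```

Shallower queues mean fewer requests waiting ahead of a new read, which is how limiting depth trades some throughput for lower tail latency.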
    
slide 48:
    My Prediction: Slower rotational
    Archive focus
    There’s ever-increasing demand for storage (incl. social video today; social VR tomorrow?)
    Needed for archives
    More “weird” pathologies. SMR is just the start.
    Even less tolerant to shouting
    Bigger, slower, and weirder
    
slide 49:
    My Prediction: 3D XPoint
    As a rotational disk accelerator
    As petabyte storage
    Layers keep stacking
    3D NAND could get to petabytes too, but consumes more power
    1 Pbyte = ~700M 3.5-inch floppies!
    And not really as a memory tier (DRAM too good) or widespread application direct (too much work when 3D XPoint storage accelerators exist, so apps can get benefits without changing anything)
    
slide 50:
    My Prediction: More flash pathologies
    Worse internal lifetime
    More wear-leveling & logic
    More latency outliers
    Bigger, faster, and weirder
    We need more observability of flash drive internals
    
slide 51:
    4. Networking
    
slide 52:
    Latest Hardware
    400 Gbit/s in use
    E.g., 400 Gbit/s switches/routers by Cisco and Juniper, transceivers by Arista and Intel
    AWS EC2 P4 instance type (deploy now!)
    On PCI, needs PCIe 5
    800 Gbit/s next
    [Charlene 20]
    Terabit Ethernet (1 Tbit/s) not far away
    More NIC features
    E.g., inline kTLS (TLS offload to the NIC), such as Mellanox ConnectX-6-Dx [Gallatin 19]
    
slide 53:
    Protocols
    QUIC / HTTP/3
    TCP-like sessions over (fast) UDP.
    0-RTT connection handshakes for clients that have previously communicated.
    MP-TCP
    Multipath TCP. Use multiple paths in parallel to improve throughput and reliability. RFC 8684 [Ford 20]
    Linux support starting in 5.6.
    
slide 54:
    Linux TCP Congestion Control Algorithms
    DCTCP
    Data Center TCP. Linux 3.18. [Borkmann 14]
    TCP NV
    New Vegas. Linux 4.8
    TCP BBR
    Bottleneck Bandwidth and RTT (BBR) improves performance on networks with packet loss [Cardwell 16]
    With 1% packet loss, Netflix sees 3x better throughput [Gregg 18]
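On Linux the default algorithm is a sysctl (`net.ipv4.tcp_congestion_control`), and an application can also choose one per socket with the `TCP_CONGESTION` option. A minimal sketch: `reno` is always built in, while `bbr` or `dctcp` need their kernel modules loaded and may need to be in the allowed list for unprivileged use, hence the error handling.

```python
import socket


def set_congestion_control(sock: socket.socket, algorithm: str):
    """Select a TCP congestion control algorithm for one socket (Linux only).

    Returns the algorithm the kernel reports afterwards, or None where the
    TCP_CONGESTION socket option is not available.
    """
    if not hasattr(socket, "TCP_CONGESTION"):
        return None
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algorithm.encode())
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\x00", 1)[0].decode()


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    for name in ("bbr", "reno"):
        try:
            print(name, "->", set_congestion_control(s, name))
        except OSError as err:  # e.g., the tcp_bbr module is not loaded or not allowed
            print(name, "-> not available:", err)
```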
    
slide 55:
    Linux Network Stack
    [Figure: Linux network stack, showing its queues and tuning points]
    Source: Systems Performance 2nd Edition, Figure 10.8 [Gregg 20]
    
slide 56:
    Linux TCP send path
    Keeps adding performance features
    Source: Systems Performance 2nd Edition, Figure 10.11 [Gregg 20]
    
slide 57:
    Software
    eXpress Data Path (XDP) (uses eBPF)
    Programmable fast lane for networking. In the Linux kernel.
    A role previously served by DPDK and kernel bypass.
    
slide 58:
    My Prediction: BPF in FPGAs/IPUs
    Massive I/O transceiver capabilities
    Netronome already did BPF in hardware
    
slide 59:
    My Prediction: Cheap BPF routers
    Linux + BPF + 400 GbE NIC
    Cheap == commodity hardware
    Use case from the beginning of eBPF (PLUMgrid)
    
slide 60:
    My Prediction: More demand for network perf
    Apps increasingly use the network
    Netflix 4K content
    Remote work & video conferencing
    World of sensors
    VR tourism
    Facebook VR multiverse
    
slide 61:
    5. Runtimes
    
slide 62:
    Latest Java
    Sep 2018: Java 11 (LTS)
    JEP 333: ZGC: A Scalable Low-Latency Garbage Collector
    JEP 331: Low-Overhead Heap Profiling
    GC adaptive thread scaling
    Sep 2021: Java 17 (LTS)
    JEP 338: Vector API (JDK16)
    Parallel GC improvements (JDK14)
    Various other perf improvements (JDK12-17)
    Java 11 includes JMH JDK microbenchmarks [Redestad 19]
    
slide 63:
    My Predictions: Runtime features
    FPGA as a compiler target
    E.g., JVM c2 or Graal adding it as a compiler target, and becoming a compiler “killer” feature.
    io_uring I/O libraries
    Massively accelerate some I/O-bound workloads by switching libraries.
    Adaptive runtime internals
    I don’t want to pick between c2 and Graal. Let the runtime do both and pick the fastest methods; ditto for testing GC algorithms.
    Not unlike the ZFS ARC shadow-testing different cache algorithms.
    1000-core scalability support
    Runtime/library/model support to help programmers write code to scale to hundreds of cores
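The "let the runtime do both and pick the fastest" idea can be illustrated outside a JVM: time each candidate implementation and keep the winner. This is a toy sketch, not how c2 or Graal work; the two candidate functions are hypothetical stand-ins for differently compiled versions of one method.

```python
import time
from typing import Callable, Sequence


def pick_fastest(candidates: Sequence[Callable[[], object]], trials: int = 5) -> Callable[[], object]:
    """Run every candidate several times and return the one with the best (lowest) time."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best_fn, best_time = None, float("inf")
    for fn in candidates:
        for _ in range(trials):
            start = time.perf_counter()
            fn()
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best_fn, best_time = fn, elapsed
    return best_fn


# Two hypothetical implementations of the same method, standing in for
# "compiled by c2" and "compiled by Graal" versions
def sum_with_loop():
    total = 0
    for i in range(100_000):
        total += i
    return total


def sum_with_builtin():
    return sum(range(100_000))


chosen = pick_fastest([sum_with_loop, sum_with_builtin])
print("selected:", chosen.__name__)
```

Best-of-N timing reduces noise from one-off interruptions. A real runtime would also need to verify that candidates produce identical results and keep re-evaluating as the workload changes.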
    
slide 64:
    6. Kernels
    
slide 65:
    Latest Kernels/OSes
    Apr 2021: FreeBSD 13.0
    Oct 2021: Linux 5.15 (LTS)
    Nov 2021: Windows 10.0.22000.318
    
slide 66:
    Recent Linux perf features
    2021: Syscall user dispatch (5.11)
    2020: Static calls to reduce Spectre-fix overhead (5.10)
    2020: BPF on socket lookups (5.9)
    2020: Thermal pressure (5.7)
    2020: MultiPath TCP (5.6)
    2019: MADV_COLD, MADV_PAGEOUT (5.4)
    2019: io_uring (5.1)
    2019: UDP GRO (5.0)
    2019: Multi-queue I/O default (5.0)
    2018: TCP EDT (4.20)
    2018: PSI (4.20)
    For 2016-2018, see my summary: [Gregg 18].
    It includes CPU schedulers (thermal, topology);
    block I/O qdiscs; the Kyber scheduler (earlier slide);
    TCP congestion control algorithms (earlier slide); etc.
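PSI (pressure stall information, 4.20) exposes files such as /proc/pressure/cpu, /proc/pressure/memory, and /proc/pressure/io. Each line starts with "some" or "full", followed by avg10, avg60, and avg300 (percentage of time stalled over 10, 60, and 300 second windows) and total (cumulative stall time in microseconds). A minimal Python sketch of a parser, assuming that line format; the function name parse_psi is hypothetical:

```python
def parse_psi(text):
    """Parse PSI text (e.g., the contents of /proc/pressure/io) into nested dicts.

    Assumes lines like "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456",
    where avg* are percentages and total is cumulative stall time in microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        stats = {}
        for field in fields:
            key, value = field.split("=", 1)
            stats[key] = int(value) if key == "total" else float(value)
        result[kind] = stats
    return result


# Illustrative input, not a real measurement.
sample = """some avg10=1.50 avg60=0.75 avg300=0.20 total=4321000
full avg10=0.00 avg60=0.00 avg300=0.00 total=1200"""
psi = parse_psi(sample)
print(psi["some"]["avg10"], psi["full"]["total"])  # 1.5 1200
```

On a live system, pass the file contents instead, e.g. parse_psi(open("/proc/pressure/io").read()). The "full" line for CPU pressure may be absent on older kernels, so callers should not assume both keys exist.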
    
slide 67:
    
slide 68:
    io_uring
    Faster syscalls using shared ring buffers
    Submission and completion ring buffers
    Allows I/O to be batched and async
    Primary use cases: network and disk I/O
    (Diagram: Apps reach the Kernel either via syscalls or via io_uring)
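A conceptual Python model of the two rings, not the real io_uring interface: the application queues several submission entries, a single stand-in for io_uring_enter(2) drains them all, and completions are then reaped from the completion ring. Real rings are memory shared with the kernel and need memory barriers; the names Ring, Sqe, Cqe, and kernel_enter are hypothetical. The CQ is sized at twice the SQ, mirroring io_uring's default.

```python
from collections import namedtuple

# Hypothetical entry types; real io_uring SQEs/CQEs are C structs with more fields.
Sqe = namedtuple("Sqe", "user_data op arg")
Cqe = namedtuple("Cqe", "user_data res")


class Ring:
    """Fixed-size ring indexed by free-running head/tail counters (one producer, one consumer).

    Real io_uring rings live in shared memory and need memory barriers; this model does not.
    """

    def __init__(self, entries):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.slots = [None] * entries
        self.head = 0  # advanced by the consumer
        self.tail = 0  # advanced by the producer

    def push(self, item):
        if self.tail - self.head > self.mask:  # all slots in use
            return False
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:  # nothing queued
            return None
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def kernel_enter(sq, cq, handlers):
    """Stand-in for one io_uring_enter(2) call: consume every queued SQE and post one CQE each.

    Returns the number of SQEs consumed. CQ overflow is not handled in this sketch.
    """
    done = 0
    while (sqe := sq.pop()) is not None:
        cq.push(Cqe(sqe.user_data, handlers[sqe.op](sqe.arg)))
        done += 1
    return done


sq, cq = Ring(8), Ring(16)  # CQ twice the SQ size, like io_uring's default
for i, word in enumerate(["alpha", "beta", "gamma"]):
    sq.push(Sqe(i, "len", word))  # queue three requests without entering the "kernel"
consumed = kernel_enter(sq, cq, {"len": len})  # one "syscall" for the whole batch
results = []
while (cqe := cq.pop()) is not None:
    results.append((cqe.user_data, cqe.res))
print(consumed, results)  # 3 [(0, 5), (1, 4), (2, 5)]
```

The point of the model is the ratio: three operations cost one kernel entry, and user_data lets the application match each completion back to its request even if completions arrive out of order.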
    
slide 69:
    eBPF Everywhere
    [Thaler 21]
    Plus, eBPF-for-BSD projects have already started.
    
slide 70:
    eBPF == BPF
    2015:
    BPF: Berkeley Packet Filter
    eBPF: extended BPF
    2021:
    “Classic BPF”: Berkeley Packet Filter
    BPF: A technology name (aka eBPF)
    Kernel engineers like to use “BPF”; companies “eBPF”.
    This is what happens when you don’t have marketing professionals help name your product
    
slide 71:
    BPF Future: Event-based Applications
    (Diagram: User-mode Applications; Kernel-mode Applications (BPF); U.E. Scheduler; Kernel; Kernel Events; Hardware Events (incl. clock))
    
slide 72:
    https://twitter.com/srostedt/status/1177147373283418112
    
slide 73:
    Emerging BPF uses
    Observability agents
    Security agents
    TCP congestion control algorithms
    Kernel drivers
    
slide 74:
    My Prediction: Future BPF Uses
    File system buffering/readahead policies
    CPU scheduler policies
    Lightweight I/O-bound applications (e.g., proxies)
    Or such apps can go to io_uring or FPGAs. “Three buses arrived at once.”
    When I did engineering at university, it was “people ride buses and electrons ride busses.” Unfortunately, that usage has gone out of fashion; otherwise it would have been clear which bus I was referring to!
    
slide 75:
    My Prediction: Kernels become JITed
    PGO/AutoFDO show ~10% wins, but are hard to manage
    Profile-guided optimization (PGO) / automatic feedback-directed optimization (AutoFDO)
    Some companies already do kernel PGO (Google [Tolvanen 20], Microsoft [Bearman 20])
    We can't leave 10% on the table forever
    Kernels get PGO/JIT support by default, so it “just works.”
    
slide 76:
    My Prediction: Kernel emulation often slow
    I can run <OS A> apps under <OS B>
    by emulating <OS A> syscalls!
    Cool project, but:
    Missing the latest kernel and perf features (e.g., Linux’s BPF, io_uring, WireGuard, etc.). Plus, certain syscall flags return ENOTSUP, so it’s like a weird old fork of Linux.
    Some exceptions: e.g., another kernel may have better hardware support, which may benefit apps more than the loss of kernel capabilities costs them.
    Debugging and security challenges. Better ROI with lightweight VMs.
    In other words, WSL2 >> WSL1
    
slide 77:
    My Prediction: OS performance
    Linux: increasing complexity & worse perf defaults
    Becomes so complex that it takes an OS team to make it perform well. This assumes that the defaults rot, because no perf teams are running the defaults anymore to notice (e.g., high-speed network engineers configure XDP and QUIC, and aren’t looking at defaults with TCP). That leaves a bit more room for a lightweight kernel (e.g., BSD) with better perf defaults to compete. Similarities: Oracle DB vs MySQL; MULTICS vs UNIX.
    BSD: high perf for narrow uses
    Still serving some companies (including Netflix) very well thanks to tuned performance (see the footnote on p124 of [Gregg 20]). The path to growth is better EC2/Azure performance support, but it may take years before a big customer (with a perf team) migrates and gets everything fixed. There are over a dozen perf engineers working on Linux on EC2; BSD needs at least one full-time senior EC2 (not metal) perf engineer.
    Windows: community perf improvements
    BPF tracing support allows outsiders to root-cause kernel problems like never before (beyond ETW/Xperf). Expect an initial wave of “low-hanging fruit” fixes, improving perf and reliability.
    
slide 78:
    My Prediction: Unikernels
    Finally gets one compelling published use case
    “2x perf for X”
    But few people run X
    Needs to be really kernel heavy, and not many workloads are. And there’s already a lot of competition for reducing kernel overhead (BPF, io_uring, FPGAs, DPDK, etc.)
    Once one use case is found, it may form a valuable community around X and Unikernels. But it needs the published use case to start, preferably from a FAANG.
    Does need to be 2x or more, not 20%, to overcome the cost of retooling everything, redoing all observability metrics, profilers, etc. It’s not impossible, but not easy [Gregg 16].
    More OS-research-style wins found from hybrid- and micro-kernels.
    
slide 79:
    7. Hypervisors
    
slide 80:
    Containers
    Cgroup v2 rollout
    Container scheduler adoption
    Kubernetes, OpenStack, and more
    Netflix develops its own called “Titus” [Joshi 18]
    Price/performance gains: “Tetris packing” workloads without too much interference (clever scheduler)
    Many perf tools still not “container aware”
    Usage in a container is not restricted to the container, or is not permitted by default (needs CAP_PERFMON, CAP_SYS_PTRACE, CAP_SYS_ADMIN)
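One way to check whether a containerized process holds these capabilities is to decode the CapEff (effective capabilities) hex mask from /proc/<pid>/status. A minimal Python sketch; the bit numbers come from linux/capability.h (CAP_SYS_PTRACE = 19, CAP_SYS_ADMIN = 21, CAP_PERFMON = 38, the last added in Linux 5.8), and the helper names parse_cap_eff and perf_caps are hypothetical:

```python
# Capability bit numbers from linux/capability.h.
PERF_CAPS = {"CAP_SYS_PTRACE": 19, "CAP_SYS_ADMIN": 21, "CAP_PERFMON": 38}


def parse_cap_eff(status_text):
    """Return the effective-capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def perf_caps(mask):
    """Report, for each perf-related capability, whether its bit is set in mask."""
    return {name: bool((mask >> bit) & 1) for name, bit in PERF_CAPS.items()}


status = "Name:\tbash\nCapEff:\t0000003fffffffff\n"  # illustrative status text
print(perf_caps(parse_cap_eff(status)))
# {'CAP_SYS_PTRACE': True, 'CAP_SYS_ADMIN': True, 'CAP_PERFMON': False}
```

The sample mask only covers bits 0 to 37, so it lacks CAP_PERFMON (bit 38). On a live system, pass open("/proc/self/status").read() instead of the sample string.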
    
slide 81:
    Hardware Hypervisors
    Source: Systems Performance 2nd Edition, Figure 11.17 [Gregg 20]
    
slide 82:
    VM Improvements
    Source: [Gregg 17]
    
slide 83:
    Lightweight VMs
    E.g., AWS “Firecracker”
    Source: Systems Performance 2nd Edition, Figure 11.4 [Gregg 20]
    
slide 84:
    My Prediction: Containers
    Perf tools take several years to be fully “container aware”
    Includes non-root BPF work.
    It’s a lot of work, and not enough engineers are working on it. We’ll use workarounds in the meantime (e.g., Kyle Anderson and Sargun Dhillon have made perf tools work in containers at Netflix).
    It was the same with Solaris Zones (a long, slow process).
    
slide 85:
    My Prediction: Landscape
    Short term:
    Containers everywhere
    Long term:
    More containers than VMs
    More lightweight VM cores than container cores
    Hottest workloads switch to dedicated kernels (no kernel resource sharing, no seccomp overhead, no overlay overhead, full perf tool access, PGO kernels, etc.)
    
slide 86:
    My Prediction: Evolution
    1. FaaS (light workload)
    2. Container
    3. Lightweight VM
    4. Metal (heavy workload)
    Many apps aren’t heavy
    Metal can also mean a single container on metal
    
slide 87:
    My Prediction: Cloud Computing
    Microservice IPC cost drives need for:
    Container schedulers co-locating chatty services
    With BPF-based accelerated networking between them (e.g., Cilium)
    Cloud-wide runtime schedulers co-locating apps
    Multiple apps under one JVM roof and process address space
    
slide 88:
    8. Observability
    
slide 89:
    2021: Age of Seeing
    Flame graphs everywhere
    Latency heat maps
    eBPF & bpftrace
    PMCs in the cloud
    More info: flame graphs [Gregg 13], heat maps [Gregg 10], and eBPF [Gregg 16b]
    
slide 90:
    
slide 91:
    BPF Perf Tools
    (In red are the new open source tools I developed for the BPF book)
    
slide 92:
    Example BPF tool
    # execsnoop.py -T
    PCOMM    RET ARGS
    run        0 ./run
    bash       0 /bin/bash
    svstat     0 /command/svstat /service/httpd
    perl       0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;p...
    ps         0 /bin/ps --ppid 1 -o pid,cmd,args
    grep       0 /bin/grep org.apache.catalina
    sed        0 /bin/sed s/^ *//;
    xargs      0 /usr/bin/xargs
    cut        0 /usr/bin/cut -d -f 1
    echo       0 /bin/echo
    mkdir      0 /bin/mkdir -v -p /data/tomcat
    [...]
    run        0 ./run
    bash       0 /bin/bash
    svstat     0 /command/svstat /service/httpd
    perl       0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;p...
    [...]
    
slide 93:
    Example bpftrace one-liner
    # bpftrace -e 't:block:block_rq_issue { @[args->rwbs] = count(); }'
    Attaching 1 probe...
    @[R]: 1
    @[RM]: 1
    @[WFS]: 2
    @[FF]: 3
    @[WSM]: 9
    @[RA]: 10
    @[WM]: 12
    @[WS]: 29
    @[R]: 107
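The one-liner builds a map keyed by the rwbs string and increments the key's count on every block I/O issue; bpftrace prints the map on exit, sorted by value in ascending order as in the output above. A Python model of that aggregation, for illustration only (format_count_map is a hypothetical name, and the event list is made up):

```python
from collections import Counter


def format_count_map(keys, name="@"):
    """Model `@[key] = count()`: count each key, then format lines sorted by count, smallest first."""
    counts = Counter(keys)
    # sorted() is stable, so in this model ties keep first-seen order
    ordered = sorted(counts.items(), key=lambda kv: kv[1])
    return [f"{name}[{key}]: {n}" for key, n in ordered]


events = ["WS", "R", "R", "RA", "R", "WS"]  # illustrative rwbs values, not real trace data
for line in format_count_map(events):
    print(line)
```

Unlike this model, bpftrace does the counting in kernel context and only copies the final map to user space, which is why such one-liners stay cheap at high event rates.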
    
slide 94:
    libbpf-tools
    # ./opensnoop
    PID    COMM           FD ERR PATH
    27974  opensnoop         0 /etc/localtime
           redis-server      0 /proc/1482/stat
    […]
    # ldd opensnoop
    linux-vdso.so.1 (0x00007ffddf3f1000)
    libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f9fb7836000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f9fb7619000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9fb7228000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f9fb7c76000)
    # ls -lh opensnoop opensnoop.stripped
    -rwxr-xr-x 1 root root 645K Feb 28 23:18 opensnoop
    -rwxr-xr-x 1 root root 151K Feb 28 23:33 opensnoop.stripped
    151 Kbytes for a stand-alone BPF program!
    (Note: A static bpftrace/BTF + scripts will also have a small average tool size)
    
slide 95:
    Modern Observability Stack
    OpenTelemetry: Standard for monitoring and tracing
    Prometheus: Monitoring database
    Grafana: UI with dashboards
    Source: Figure 1.4 [Gregg 20]
    
slide 96:
    My Prediction: BPF tool front-ends
    bpftrace
    For one-liners and to hack up new tools
    When you want to spend an afternoon developing some custom BPF tracing
    libbpf-tools
    For packaged BPF binary tools and BPF products
    When you want to spend weeks developing BPF
    
slide 97:
    My Prediction: Too many BPF tools
    (I’m partly to blame)
    2014: I have no tools for this problem
    2024: I have too many tools for this problem
    Tool creators: Focus on solving something no other tool can. Necessity is the mother of good BPF tools.
    
slide 98:
    My Prediction: BPF perf tool future
    GUIs, not CLI tools
    Tool output, visualized
    This GUI is in development by Susie Xia, Netflix
    The end user may not even know it’s using BPF
    
slide 99:
    My Prediction: Flame scope adoption
    Analyze variance, perturbations:
    Flame graph
    Subsecond-offset heat map
    [Spier 20]
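In a subsecond-offset heat map, each column is one second of the profile and each row is an offset within that second, so each cell counts the samples in one small time slice; selecting a range of cells then selects the samples to render as a flame graph. A minimal Python sketch of the bucketing, assuming sample timestamps in seconds; the function name and the rows parameter (default 50) are assumptions, not FlameScope's documented settings:

```python
import math
from collections import Counter


def subsecond_heatmap(timestamps, rows=50):
    """Count samples per (column, row) cell.

    column: whole seconds since the start of the first sample's second.
    row: offset within that second, split into `rows` equal slices.
    """
    if not timestamps:
        return Counter()
    start = math.floor(min(timestamps))
    cells = Counter()
    for t in timestamps:
        elapsed = t - start
        col = int(elapsed)
        row = min(int((elapsed - col) * rows), rows - 1)  # clamp in case of float rounding
        cells[(col, row)] += 1
    return cells


samples = [100.25, 100.75, 101.25, 101.26]  # illustrative profile timestamps
print(sorted(subsecond_heatmap(samples, rows=4).items()))
# [((0, 1), 1), ((0, 3), 1), ((1, 1), 2)]
```

Because every column shares the same row layout, a periodic perturbation (say, a task that wakes every second at the same offset) shows up as a horizontal band, which a single flame graph of the whole profile would average away.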
    
slide 100:
    Recap
    1. Processors
    2. Memory
    3. Disks
    4. Networking
    5. Runtimes
    6. Kernels
    7. Hypervisors
    8. Observability
    
slide 101:
    Performance engineering is getting more complex
    1. Processors: CPUs, GPUs, FPGAs, TPUs
    2. Memory: DRAM, RLDRAM, HBM, 3D XPoint
    3. Disks: PMR, SMR, MAT, EAMR, HAMR, SLC, MLC, ...
    4. Networking: QUIC, MP-TCP, XDP, qdiscs, pacing, BQL, ...
    5. Runtimes: Choice of JVM, GC, c2/Graal
    6. Kernels: BPF, io_uring, PGO, Linux complexity
    7. Hypervisors: VMs, Containers, Lightweight VMs
    8. Observability: BPF, PMCs, heat maps, flame graphs, OpenTelemetry, Prometheus, Grafana
    
slide 102:
    References
    [Gregg 08] Brendan Gregg, “ZFS L2ARC,” http://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html, Jul 2008
    [Gregg 10] Brendan Gregg, “Visualizations for Performance Analysis (and More),”
    https://www.usenix.org/conference/lisa10/visualizations-performance-analysis-and-more, 2010
    [Greenberg 11] Marc Greenberg, “DDR4: Double the speed, double the latency? Make sure your system can handle next-generation DRAM,” https://www.chipestimate.com/DDR4-Double-the-speed-double-the-latency-Make-sure-your-system-can-handle-next-generation-DRAM/Cadence/Technical-Article/2011/11/22, Nov 2011
    [Hruska 12] Joel Hruska, “The future of CPU scaling: Exploring options on the cutting edge,”
    https://www.extremetech.com/computing/184946-14nm-7nm-5nm-how-low-can-cmos-go-it-depends-if-you-ask-the-engineers-or-the-economists, Feb 2012
    [Gregg 13] Brendan Gregg, “Blazing Performance with Flame Graphs,”
    https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg, 2013
    [Shimpi 13] Anand Lal Shimpi, “Seagate to Ship 5TB HDD in 2014 using Shingled Magnetic Recording,”
    https://www.anandtech.com/show/7290/seagate-to-ship-5tb-hdd-in-2014-using-shingled-magnetic-recording, Sep 2013
    [Borkmann 14] Daniel Borkmann, “net: tcp: add DCTCP congestion control algorithm,”
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3118e8359bb7c59555aca60c725106e6d78c5ce, 2014
    [Macri 15] Joe Macri, “Introducing HBM,” https://www.amd.com/en/technologies/hbm, Jul 2015
    [Cardwell 16] Neal Cardwell, et al., “BBR: Congestion-Based Congestion Control,” https://queue.acm.org/detail.cfm?id=3022184,
    [Gregg 16] Brendan Gregg, “Unikernel Profiling: Flame Graphs from dom0,” http://www.brendangregg.com/blog/2016-01-27/unikernel-profiling-from-dom0.html, Jan 2016
    
slide 103:
    References (2)
    [Gregg 16b] Brendan Gregg, “Linux BPF Superpowers,” https://www.brendangregg.com/blog/2016-03-05/linux-bpf-superpowers.html,
    [Alcorn 17] Paul Alcorn, “Seagate To Double HDD Speed With Multi-Actuator Technology,” https://www.tomshardware.com/news/hdd-multi-actuator-heads-seagate,36132.html, 2017
    [Alcorn 17b] Paul Alcorn, “Hot Chips 2017: Intel Deep Dives Into EMIB,” https://www.tomshardware.com/news/intel-emib-interconnect-fpga-chiplet,35316.html#xenforo-comments-3112212, 2017
    [Corbet 17] Jonathan Corbet, “Two new block I/O schedulers for 4.12,” https://lwn.net/Articles/720675, Apr 2017
    [Gregg 17] Brendan Gregg, “AWS EC2 Virtualization 2017: Introducing Nitro,” http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html, Nov 2017
    [Russinovich 17] Mark Russinovich, “Inside the Microsoft FPGA-based configurable cloud,”
    https://www.microsoft.com/en-us/research/video/inside-microsoft-fpga-based-configurable-cloud, 2017
    [Gregg 18] Brendan Gregg, “Linux Performance 2018,” http://www.brendangregg.com/Slides/Percona2018_Linux_Performance.pdf,
    [Hady 18] Frank Hady, “Achieve Consistent Low Latency for Your Storage-Intensive Workloads,”
    https://www.intel.com/content/www/us/en/architecture-and-technology/optane-technology/low-latency-for-storage-intensive-workloads-article-brief.html, 2018
    [Joshi 18] Amit Joshi, et al., “Titus, the Netflix container management platform, is now open source,” https://netflixtechblog.com/titus-the-netflix-container-management-platform-is-now-open-source-f868c9fb5436, Apr 2018
    [Cutress 19] Dr. Ian Cutress, “Xilinx Announces World Largest FPGA: Virtex Ultrascale+ VU19P with 9m Cells,”
    https://www.anandtech.com/show/14798/xilinx-announces-world-largest-fpga-virtex-ultrascale-vu19p-with-9m-cells, Aug
    
slide 104:
    References (3)
    [Gallatin 19] Drew Gallatin, “Kernel TLS and hardware TLS offload in FreeBSD 13,”
    https://people.freebsd.org/~gallatin/talks/euro2019-ktls.pdf, 2019
    [Redestad 19] Claes Redestad, Staffan Friberg, Aleksey Shipilev, “JEP 230: Microbenchmark Suite,” http://openjdk.java.net/jeps/230,
    updated 2019
    [Bearman 20] Ian Bearman, “Exploring Profile Guided Optimization of the Linux Kernel,”
    https://linuxplumbersconf.org/event/7/contributions/771, 2020
    [Burnes 20] Andrew Burnes, “GeForce RTX 30 Series Graphics Cards: The Ultimate Play,”
    https://www.nvidia.com/en-us/geforce/news/introducing-rtx-30-series-graphics-cards, Sep 2020
    [Charlene 20] Charlene, “800G Is Coming: Set Pace to More Higher Speed Applications,” https://community.fs.com/blog/800-gigabit-ethernet-and-optics.html, May 2020
    [Cutress 20] Dr. Ian Cutress, “Insights into DDR5 Sub-timings and Latencies,” https://www.anandtech.com/show/16143/insights-into-ddr5-subtimings-and-latencies, Oct 2020
    [Ford 20] A. Ford, et al., “TCP Extensions for Multipath Operation with Multiple Addresses,”
    https://datatracker.ietf.org/doc/html/rfc8684, Mar 2020
    [Gregg 20] Brendan Gregg, “Systems Performance: Enterprise and the Cloud, Second Edition,” Addison-Wesley, 2020
    [Hruska 20] Joel Hruska, “Intel Demos PCIe 5.0 on Upcoming Sapphire Rapids CPUs,”
    https://www.extremetech.com/computing/316257-intel-demos-pcie-5-0-on-upcoming-sapphire-rapids-cpus,
    Oct 2020
    [Liu 20] Linda Liu, “Samsung QVO vs EVO vs PRO: What’s the Difference? [Clone Disk],”
    https://www.partitionwizard.com/clone-disk/samsung-qvo-vs-evo.html, 2020
    
slide 105:
    References (4)
    [Moore 20] Samuel K. Moore, “A Better Way to Measure Progress in Semiconductors,”
    https://spectrum.ieee.org/semiconductors/devices/a-better-way-to-measure-progress-in-semiconductors, Jul 2020
    [Peterson 20] Zachariah Peterson, “DDR5 vs. DDR6: Here's What to Expect in RAM Modules,” https://resources.altium.com/p/ddr5-vs-ddr6-heres-what-expect-ram-modules, Nov 2020
    [Salter 20] Jim Salter, “Western Digital releases new 18TB, 20TB EAMR drives,” https://arstechnica.com/gadgets/2020/07/western-digital-releases-new-18tb-20tb-eamr-drives, Jul 2020
    [Spier 20] Martin Spier, Brendan Gregg, et al., “FlameScope,” https://github.com/Netflix/flamescope, 2020
    [Tolvanen 20] Sami Tolvanen, Bill Wendling, and Nick Desaulniers, “LTO, PGO, and AutoFDO in the Kernel,” Linux Plumbers
    Conference, https://linuxplumbersconf.org/event/7/contributions/798, 2020
    [Vega 20] Juan Camilo Vega, Marco Antonio Merlini, Paul Chow, “FFShark: A 100G FPGA Implementation of BPF Filtering for
    Wireshark,” IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) ,
    [Warren 20] Tom Warren, “Microsoft reportedly designing its own ARM-based chips for servers and Surface PCs,”
    https://www.theverge.com/2020/12/18/22189450/microsoft-arm-processors-chips-servers-surface-report, Dec 2020
    [Alcorn 21] Paul Alcorn, “Intel Shares Alder Lake Pricing, Specs and Gaming Performance: $589 for 16 Cores,”
    https://www.tomshardware.com/features/intel-shares-alder-lake-pricing-specs-and-gaming-performance, Oct 2021
    [Cutress 21] Ian Cutress, “AMD Demonstrates Stacked 3D V-Cache Technology: 192 MB at 2 TB/sec,”
    https://www.anandtech.com/show/16725/amd-demonstrates-stacked-vcache-technology-2-tbsec-for-15-gaming, May 2021
    [Google 21] Google, “Cloud TPU,” https://cloud.google.com/tpu, 2021
    
slide 106:
    References (5)
    [Haken 21] Michael Haken, et al., “Delta Lake 1S Server Design Specification 1v05,” https://www.opencompute.org/documents/delta-lake-1s-server-design-specification-1v05-pdf, 2021
    [Intel 21] Intel Corporation, “Intel® Optane™ Technology,” https://www.intel.com/content/www/us/en/products/docs/storage/optane-technology-brief.html, 2021
    [Kostovic 21] Aleksandar Kostovic, “Esperanto Delivers Kilocore Processor in its Supercomputer-on-a-Chip Design,”
    https://www.tomshardware.com/news/esperanto-kilocore-processor, Aug 2021
    [Kummrow 21] Patricia Kummrow, “The IPU: A New, Strategic Resource for Cloud Service Providers,”
    https://itpeernetwork.intel.com/ipu-cloud/#gs.g5pkub, Aug 2021
    [Quach 21a] Katyanna Quach, “Global chip shortage probably won't let up until 2023, warns TSMC: CEO 'still expects capacity to
    tighten more',” https://www.theregister.com/2021/04/16/tsmc_chip_forecast, Apr 2021
    [Quach 21b] Katyanna Quach, “IBM says it's built the world's first 2nm semiconductor chips,”
    https://www.theregister.com/2021/05/06/ibm_2nm_semiconductor_chips, May 2021
    [Ridley 21] Jacob Ridley, “IBM agrees with Intel and TSMC: this chip shortage isn't going to end anytime soon,”
    https://www.pcgamer.com/ibm-agrees-with-intel-and-tsmc-this-chip-shortage-isnt-going-to-end-anytime-soon, May 2021
    [Shilov 21] Anton Shilov, “Samsung Develops 512GB DDR5 Module with HKMG DDR5 Chips,”
    https://www.tomshardware.com/news/samsung-512gb-ddr5-memory-module, Mar 2021
    [Shilov 21b] Anton Shilov, “Seagate Ships 20TB HAMR HDDs Commercially, Increases Shipments of Mach.2 Drives,”
    https://www.tomshardware.com/news/seagate-ships-hamr-hdds-increases-dual-actuator-shipments, 2021
    [Shilov 21c] Anton Shilov, “SK Hynix Envisions 600-Layer 3D NAND & EUV-Based DRAM,” https://www.tomshardware.com/news/sk-hynix-600-layer-3d-nand-euv-dram, Mar 2021
    
slide 107:
    References (6)
    [Shilov 21d] Anton Shilov, “Sapphire Rapids Uncovered: 56 Cores, 64GB HBM2E, Multi-Chip Design,”
    https://www.tomshardware.com/news/intel-sapphire-rapids-xeon-scalable-specifications-and-features, Apr 2021
    [SuperMicro 21] SuperMicro, “B12SPE-CPU-25G (For SuperServer Only),”
    https://www.supermicro.com/en/products/motherboard/B12SPE-CPU-25G, 2021
    [Thaler 21] Dave Thaler, Poorna Gaddehosur, “Making eBPF work on Windows,”
    https://cloudblogs.microsoft.com/opensource/2021/05/10/making-ebpf-work-on-windows, May 2021
    [TornadoVM 21] TornadoVM, “TornadoVM Run your software faster and simpler!” https://www.tornadovm.org, 2021
    [Trader 21] Tiffany Trader, “Cerebras Second-Gen 7nm Wafer Scale Engine Doubles AI Performance Over First-Gen Chip,”
    https://www.enterpriseai.news/2021/04/21/latest-cerebras-second-gen-7nm-wafer-scale-engine-doubles-ai-performance-over-first-gen-chip, Apr 2021
    [Vahdat 21] Amin Vahdat, “The past, present and future of custom compute at Google,”
    https://cloud.google.com/blog/topics/systems/the-past-present-and-future-of-custom-compute-at-google, Mar 2021
    [Wikipedia 21] “Semiconductor device fabrication,” https://en.wikipedia.org/wiki/Semiconductor_device_fabrication, 2021
    [Wikipedia 21b] “Silicon,” https://en.wikipedia.org/wiki/Silicon, 2021
    [ZonedStorage 21] Zoned Storage, “Zoned Namespaces (ZNS) SSDs,” https://zonedstorage.io/introduction/zns, 2021
    
slide 108:
    Thanks
    Thanks for watching!
    Slides: http://www.brendangregg.com
    Thanks to colleagues Jason Koch, Sargun Dhillon, and Drew Gallatin for their
    performance engineering expertise.
    Thanks to YOW organizers!