My last talk for 2017 was at AWS re:Invent, on "How Netflix Tunes EC2 Instances for Performance," an updated version of my 2014 talk. There was so much demand for it this year that I had three overflow rooms streaming it, and people still couldn't get in. (I shouldn't let this go to my head, as there were 42,000 attendees at re:Invent looking for something to see!) Fortunately, it was videoed for those who missed it.
A video of the talk is on YouTube, and the slides are on SlideShare.
I love this talk as I get to share more about what the Performance and Operating Systems team at Netflix does, rather than just my work. Our team looks after the BaseAMI, kernel tuning, OS performance tools and profilers, and self-service tools like Vector. We're not the only people doing performance and performance tuning at Netflix either: all the development teams do performance work. We help where we can.
My talk included a section on Linux kernel tunables, as follows. WARNING: These tunables were developed in late 2017, for Ubuntu Xenial instances on EC2.
schedtool -B PID
vm.swappiness = 0 # from 60
# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
kernel.numa_balancing = 0
vm.dirty_ratio = 80 # from 40
vm.dirty_background_ratio = 5 # from 10
vm.dirty_expire_centisecs = 12000 # from 3000
mount -o defaults,noatime,discard,nobarrier …
/sys/block/*/queue/rq_affinity 2
/sys/block/*/queue/scheduler noop
/sys/block/*/queue/nr_requests 256
/sys/block/*/queue/read_ahead_kb 256
mdadm --chunk=64 ...
net.core.somaxconn = 1000
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_abort_on_overflow = 1 # maybe
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
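Before changing anything, it can help to see how a running instance compares with the sysctl values listed above. The following is a minimal, read-only sketch and was not part of the talk: the helper names `sysctl_path` and `check_tunable` and the `PROC_SYS_ROOT` override are hypothetical, introduced here only for illustration. It reads values from /proc/sys and writes nothing.

```shell
#!/bin/sh
# Read-only sketch: compare the running kernel's values with some of the
# sysctl tunables listed above. Nothing is written or changed.

# Map a sysctl key (e.g. vm.swappiness) to its path under /proc/sys.
# PROC_SYS_ROOT is a hypothetical override, handy for testing off-box.
sysctl_path() {
    printf '%s/%s\n' "${PROC_SYS_ROOT:-/proc/sys}" "$(printf '%s' "$1" | tr '.' '/')"
}

# Compare one key with the value from the listing and print a one-line verdict.
check_tunable() {
    key=$1
    want=$2
    path=$(sysctl_path "$key")
    if [ ! -r "$path" ]; then
        printf 'SKIP  %s (not readable on this kernel)\n' "$key"
        return 0
    fi
    # Multi-value keys such as tcp_rmem are tab-separated: normalize to spaces.
    have=$(tr -s ' \t' ' ' < "$path" | sed 's/^ *//; s/ *$//')
    if [ "$have" = "$want" ]; then
        printf 'OK    %s = %s\n' "$key" "$have"
    else
        printf 'DIFF  %s = %s (talk: %s)\n' "$key" "$have" "$want"
    fi
}

# Values copied from the listing above (late 2017, Ubuntu Xenial on EC2).
check_tunable vm.swappiness 0
check_tunable kernel.numa_balancing 0
check_tunable vm.dirty_ratio 80
check_tunable vm.dirty_background_ratio 5
check_tunable vm.dirty_expire_centisecs 12000
check_tunable net.core.somaxconn 1000
check_tunable net.ipv4.tcp_slow_start_after_idle 0
check_tunable net.ipv4.tcp_rmem '4096 12582912 16777216'
```

A DIFF line is not a recommendation to change the value: it only shows where an instance differs from the late-2017 listing, which was developed for a specific OS and workload.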
Not a lot has changed with these tunables since my 2014 talk.
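Some of the sysfs settings in the listing (the transparent huge pages mode and the block I/O scheduler) are files that list every available choice and mark the active one in square brackets, for example "always [madvise] never". As a small read-only sketch, with `active_choice` being a hypothetical helper name of my own, the current selections can be printed like this:

```shell
#!/bin/sh
# Read-only sketch: print the active choice for a few sysfs settings.

# Print the bracketed (active) word from a line like "always [madvise] never".
# If nothing is bracketed (e.g. current_clocksource), print the line unchanged.
active_choice() {
    case $1 in
        *\[*\]*)
            rest=${1#*\[}                  # drop everything up to the first "["
            printf '%s\n' "${rest%%\]*}"   # drop from the first "]" onward
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/block/*/queue/scheduler \
         /sys/devices/system/clocksource/clocksource0/current_clocksource
do
    # Skip files that do not exist on this system (or an unmatched glob).
    [ -r "$f" ] || continue
    IFS= read -r line < "$f" || true
    printf '%s: %s\n' "$f" "$(active_choice "$line")"
done
```

The scheduler names offered vary by kernel version and device type, so the output may not include "noop" at all on newer systems.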
What I was most excited about was the launch of a new EC2 hypervisor, which I referred to in the video as the "c5 hypervisor". Later that night the real name was announced, the "Nitro" hypervisor, along with the bare metal instance type. My last post on Introducing Nitro explained it and the hypervisor development journey.
Many other Netflix staff spoke at re:Invent (list here). Here are the talks from my immediate colleagues in building F, level 2 at Netflix:
- Vadim Filanovsky (perf team) co-presented Auto Scaling Made Easy: How Target Tracking Scaling Policies Hit the Bullseye
- Dave Hahn (CORE team) gave an updated A Day in the Life of a Netflix Engineer III
- Nora Jones (chaos team) gave a keynote on Why We Need More Chaos - Chaos Engineering, That Is, as well as a talk Performing Chaos at Netflix Scale
- Casey Rosenthal (traffic and chaos) Models of Availability
- John Bennett (networking) co-presented How Netflix Monitors Applications in Near Real-Time with Amazon Kinesis
- Donovan Fritz and Joel Kodama (networking) A Day in the Life of a Cloud Network Engineer at Netflix
- Alex Maestretti (security) co-presented SecOps 2021 Today: Using AWS Services to Deliver SecOps
- Will Bengtson (security) co-presented Best Practices for Managing Security Operations on AWS
- Patrick Kelley and Travis McPeak (security) Using Access Advisor to Strike the Balance Between Security and Usability
- Andrew Spyker (Titus) co-presented Elastic Load Balancing Deep Dive and Best Practices
- Andrew Park and Sebastien de Larquier Tooling Up for Efficiency: DIY Solutions @ Netflix
- Rajan Mittal and Andrew Park Why Regional Reserved Instances Are a Game Changer for Netflix
- Monal Daxini Netflix Keystone SPaaS: Real-time Stream Processing as a Service
- Our department director Coburn Watson Walking the tightrope: Balancing Innovation, Reliability, Security, and Efficiency
Check them out. It's awesome to see my coworkers on the big stage doing great!