Linux 6.11 Brings A Dedicated Bucket Allocator For Better Security
(Linux Kernel, Linux 6.11; 5 hours ago)
- Reference: 0001479765
- News link: https://www.phoronix.com/news/Linux-6.11-Ded-Bucket-Alloc
The SLAB pull request landed in Linux 6.11 Git on Thursday with kmem_buckets-based hardening of kernel memory allocations.
This hardening is the latest Linux security improvement from Google's Kees Cook. The help text for the new CONFIG_SLAB_BUCKETS build-time option describes the dedicated bucket allocator:
"Kernel heap attacks frequently depend on being able to create specifically-sized allocations with user-controlled contents that will be allocated into the same kmalloc bucket as a target object. To avoid sharing these allocation buckets, provide an explicitly separated set of buckets to be used for user-controlled allocations. This may very slightly increase memory fragmentation, though in practice it's only a handful of extra pages since the bulk of user-controlled allocations are relatively long-lived."
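To make the "same kmalloc bucket" idea concrete, here is a minimal userspace sketch in C; it is not kernel code. It assumes a simplified list of size classes (the real set depends on architecture and kernel configuration), and the names model_size_classes and model_bucket_index are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified size classes; not the kernel's authoritative list. */
static const size_t model_size_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

#define MODEL_NUM_CLASSES \
    (sizeof(model_size_classes) / sizeof(model_size_classes[0]))

/* Return the index of the smallest class that fits `size`, or -1 if
 * the request is larger than every class in this model. */
static int model_bucket_index(size_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < MODEL_NUM_CLASSES; i++) {
        if (size <= model_size_classes[i])
            return (int)i;
    }
    return -1;
}
```

In this model, a 100-byte buffer filled from userspace and a 128-byte target object both resolve to the same index, which is the kind of sharing the help text describes: the requested sizes differ, but the bucket does not.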
Kees Cook explained further in the earlier [1]patch series for this feature:
"Dedicated caches are available for fixed size allocations via kmem_cache_alloc(), but for dynamically sized allocations there is only the global kmalloc API's set of buckets available. This means it isn't possible to separate specific sets of dynamically sized allocations into a separate collection of caches.
This leads to a use-after-free exploitation weakness in the Linux kernel since many heap memory spraying/grooming attacks depend on using userspace-controllable dynamically sized allocations to collide with fixed size allocations that end up in same cache.
While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense against these kinds of "type confusion" attacks, including for fixed same-size heap objects, we can create a complementary deterministic defense for dynamically sized allocations that are directly user controlled. Addressing these cases is limited in scope, so isolating these kinds of interfaces will not become an unbounded game of whack-a-mole. For example, many pass through memdup_user(), making isolation there very effective.
In order to isolate user-controllable dynamically-sized allocations from the common system kmalloc allocations, introduce kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for where caller tracking is needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback is needed. Note that these caches are specifically flagged with SLAB_NO_MERGE, since merging would defeat the entire purpose of the mitigation.
This can also be used in the future to extend allocation profiling's use of code tagging to implement per-caller allocation cache isolation even for dynamic allocations."
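The isolation described in the patch series can be sketched with a small userspace model in C. This is an illustration under stated assumptions, not the kernel implementation: the size classes are simplified, nothing is actually allocated, and every model_* name is hypothetical. model_buckets_init() loosely plays the role of kmem_buckets_create(), and model_buckets_alloc() the role of kmem_buckets_alloc(), returning the cache a request would land in.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified size classes; the real kmalloc set is larger. */
#define MODEL_CLASSES 5
static const size_t model_class_sizes[MODEL_CLASSES] = {
    32, 64, 128, 256, 512
};

/* One cache in the model: objects of one size belonging to one owner. */
struct model_cache {
    const char *owner;
    size_t object_size;
    unsigned long live; /* objects handed out from this cache so far */
};

/* A complete set of per-size caches, standing in for a bucket set. */
struct model_buckets {
    struct model_cache caches[MODEL_CLASSES];
};

/* Build a private set of caches for `owner` (hypothetical helper). */
static void model_buckets_init(struct model_buckets *b, const char *owner)
{
    assert(b != NULL);
    for (int i = 0; i < MODEL_CLASSES; i++) {
        b->caches[i].owner = owner;
        b->caches[i].object_size = model_class_sizes[i];
        b->caches[i].live = 0;
    }
}

/* Pick the cache for `size` inside the given set only. Returns NULL
 * when the request exceeds the largest class in this model. */
static struct model_cache *model_buckets_alloc(struct model_buckets *b,
                                               size_t size)
{
    assert(b != NULL);
    for (int i = 0; i < MODEL_CLASSES; i++) {
        if (size <= b->caches[i].object_size) {
            b->caches[i].live++;
            return &b->caches[i];
        }
    }
    return NULL;
}
```

With one set standing in for the global kmalloc buckets and a second set created for user-controlled data (for example, one named after memdup_user()), a 120-byte target object and a 100-byte user buffer share a cache only when both come from the global set. Routing the user buffer through the dedicated set puts it in a different cache of the same object size, which is the deterministic separation the patch series aims for.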
This dedicated bucket allocator landed in the Linux 6.11 kernel yesterday via the [2]SLAB pull request.
[1] https://lore.kernel.org/netdev/202407021311.1EDB7AE3@keescook/T/#m3f69ec81c1f388b8061d5c49ee63728da4dbf63a
[2] https://lore.kernel.org/lkml/746087fd-993b-47b3-99e4-9bd4d3502e71@suse.cz/