News: 0001612539

Linux 7.0 Adds Support For BPF Filtering To IO_uring

([Linux Kernel] 4 Hours Ago IO_uring BPF Filtering)


The wonderful [1]IO_uring, the Linux kernel's interface for high-performance async I/O, has picked up a new capability with Linux 7.0: BPF filtering.

Linux I/O expert Jens Axboe implemented support for loading BPF programs with IO_uring to offer fine-grained filtering of SQE (submission queue entry) operations. Unlike the existing filtering facilities, this BPF filtering can inspect request attributes and make dynamic filtering decisions. Filters can allow or deny requests, multiple filters can be stacked per opcode, and the filters are classic BPF (cBPF) programs rather than eBPF programs so that they remain usable in containers.
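
To make the allow/deny decisions and per-opcode stacking a little more concrete, here is a toy model in Python. It is not kernel code and does not use the real io_uring interface: the `OpcodeFilters` class, the dict-based request, and the rule that every stacked filter must allow a request are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

# A request is modelled as a plain dict of attributes. This is a toy
# stand-in for an SQE, not the kernel's actual data layout.
Request = Dict[str, object]
Filter = Callable[[Request], bool]  # True = allow, False = deny


class OpcodeFilters:
    """Toy model (assumption): filters stack per opcode and all must allow."""

    def __init__(self) -> None:
        self._stacks: Dict[str, List[Filter]] = {}

    def add(self, opcode: str, flt: Filter) -> None:
        # Stacking: a later filter can only narrow what earlier ones allow.
        self._stacks.setdefault(opcode, []).append(flt)

    def allowed(self, req: Request) -> bool:
        # Opcodes with no filter attached are allowed in this model.
        stack = self._stacks.get(str(req["opcode"]), [])
        return all(flt(req) for flt in stack)


AF_UNIX, AF_INET = 1, 2  # Linux address family values

filters = OpcodeFilters()
# First filter: sockets may only be AF_UNIX or AF_INET.
filters.add("IORING_OP_SOCKET", lambda r: r["domain"] in (AF_UNIX, AF_INET))
# Second, stacked filter: additionally forbid AF_INET.
filters.add("IORING_OP_SOCKET", lambda r: r["domain"] != AF_INET)

print(filters.allowed({"opcode": "IORING_OP_SOCKET", "domain": AF_UNIX}))  # True
print(filters.allowed({"opcode": "IORING_OP_SOCKET", "domain": AF_INET}))  # False
print(filters.allowed({"opcode": "IORING_OP_READ"}))                       # True
```

The point of the sketch is the contrast with a plain opcode allowlist: the decision depends on an attribute of the request (here the socket domain), not only on which opcode was submitted.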

"This adds support for both cBPF filters for io_uring, as well as task inherited restrictions and filters.

seccomp and io_uring don't play along nicely, as most of the interesting data to filter on resides somewhat out-of-band, in the submission queue ring.

As a result, things like containers and systemd that apply seccomp filters, can't filter io_uring operations.

That leaves them with just one choice if filtering is critical - filter the actual io_uring_setup(2) system call to simply disallow io_uring. That's rather unfortunate, and has limited us because of it.

io_uring already has some filtering support. It requires the ring to be setup in a disabled state, and then a filter set can be applied. This filter set is completely bi-modal - an opcode is either enabled or it's not. Once a filter set is registered, the ring can be enabled. This is very restrictive, and it's not useful at all to systemd or containers which really want both broader and more specific control.

This first adds support for cBPF filters for opcodes, which enables tighter control over what exactly a specific opcode may do. As examples, specific support is added for IORING_OP_OPENAT/OPENAT2, allowing filtering on resolve flags. And another example is added for IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These are both common use cases. cBPF was chosen rather than eBPF, because the latter is often restricted in containers as well."

[2]This merge, pulled into Linux 7.0 yesterday, landed the IO_uring BPF filtering capabilities.
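
For context, the blunt workaround the merge message mentions, using seccomp to block io_uring_setup(2) outright, boils down to a four-instruction classic BPF program. The Python sketch below assembles that program and runs it through a tiny user-space interpreter to show the decision logic. It does not install anything in the kernel. The helper names (`assemble`, `run`, `seccomp_data`) are hypothetical, the syscall number 425 is the x86-64/arm64 value, and a real seccomp filter should also check the `arch` field first, which is omitted here.

```python
import struct

# Classic BPF opcode pieces, values as in <linux/bpf_common.h>.
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06
BPF_W, BPF_ABS = 0x00, 0x20
BPF_JEQ, BPF_K = 0x10, 0x00

LD_W_ABS = BPF_LD | BPF_W | BPF_ABS  # load 32-bit word at absolute offset
JEQ_K = BPF_JMP | BPF_JEQ | BPF_K    # jump if accumulator == constant
RET_K = BPF_RET | BPF_K              # return constant

# seccomp return actions, values as in <linux/seccomp.h>.
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000
EPERM = 1

NR_IO_URING_SETUP = 425  # io_uring_setup(2) on x86-64 and arm64

# Each instruction is (code, jt, jf, k), mirroring struct sock_filter.
PROGRAM = [
    (LD_W_ABS, 0, 0, 0),                      # A = seccomp_data.nr (offset 0)
    (JEQ_K, 0, 1, NR_IO_URING_SETUP),         # match: fall through, else skip 1
    (RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM), # deny with EPERM
    (RET_K, 0, 0, SECCOMP_RET_ALLOW),         # allow every other syscall
]


def assemble(program):
    """Pack instructions into the 8-byte struct sock_filter layout."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in program)


def run(program, data):
    """Interpret only the three instruction kinds used in PROGRAM."""
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = program[pc]
        pc += 1
        if code == LD_W_ABS:
            (acc,) = struct.unpack_from("=I", data, k)
        elif code == JEQ_K:
            pc += jt if acc == k else jf
        elif code == RET_K:
            return k
        else:
            raise ValueError(f"unsupported instruction {code:#x}")


def seccomp_data(nr):
    """Build a 64-byte struct seccomp_data with only the syscall number set."""
    return struct.pack("=iIQ6Q", nr, 0, 0, *([0] * 6))


print(hex(run(PROGRAM, seccomp_data(NR_IO_URING_SETUP))))  # 0x50001
print(hex(run(PROGRAM, seccomp_data(0))))                  # 0x7fff0000
print(len(assemble(PROGRAM)))                              # 32
```

Note that the program only ever sees the syscall number and arguments. The SQEs sit in shared memory, out of its reach, which is why seccomp can block io_uring as a whole but cannot filter individual operations.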



[1] https://www.phoronix.com/search/IO_uring

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=591beb0e3a03258ef9c01893a5209845799a7c33


