Bcachefs Lands More Improvements For Linux 6.16 After Data Loss Bug Hit v6.15
([Linux Storage] 20 Minutes Ago
Bcachefs)
- Reference: 0001551286
- News link: https://www.phoronix.com/news/Bcachefs-More-For-Linux-6.16
Last week many [1]Bcachefs performance optimizations, recovery work, and enhanced error messages were merged at the start of the Linux 6.16 merge window. Now, ahead of the Linux 6.16-rc1 release coming on Sunday to cap off the merge window, a second round of Bcachefs enhancements and fixes has been merged.
On Wednesday evening the Bcachefs lead developer Kent Overstreet sent out a second round of Bcachefs changes. That code, a mix of fixes and other feature changes, has since been merged.
This second batch of Bcachefs changes for Linux 6.16 includes more stack usage improvements, new introspection capabilities, various repair enhancements, and a fix for a serious bug that cropped up in Linux 6.15 and could result in data loss.
As for that Linux 6.15 bug, it has resulted in data loss. Overstreet commented in the pull request:
"This took longer to debug than it should have, and we lost several filesystems unnecessarily, because users have been ignoring the release notes and blindly running 'fsck -y'. Debugging required reconstructing what happened through analyzing the journal, when ideally someone would have noticed 'hey, fsck is asking me if I want to repair this: it usually doesn't, maybe I should run this in dry run mode and check what's going on?'.
As a reminder, fsck errors are being marked as autofix once we've verified, in real world usage, that they're working correctly; blindly running 'fsck -y' on an experimental filesystem is playing with fire.
Up to this incident we've had an excellent track record of not losing data, so let's try to learn from this one.
This is a community effort, I wouldn't be able to get this done without the help of all the people QAing and providing excellent bug reports and feedback based on real world usage. But please don't ignore advice and expect me to pick up the pieces.
If an error isn't marked as autofix, and it /is/ happening in the wild, that's also something I need to know about so we can check it out and add it to the autofix list if repair looks good. I haven't been getting those reports, and I should be; since we don't have any sort of telemetry yet I am absolutely dependent on user reports.
Now I'll be spending the weekend working on new repair code to see if I can get a filesystem back for a user who didn't have backups."
See [2]this pull request for the latest merged Bcachefs code.
[1] https://www.phoronix.com/news/Linux-6.16-Bcachefs
[2] https://lore.kernel.org/lkml/xtigikvqorbxtpy2rh52fobvunp7yrwkfpj4muwaogr4ijxl4j@s327kfvhpi3v/