News: 0000829737

Krisman: Using the Linux kernel's Case-insensitive feature in Ext4

([Kernel] Aug 27, 2020 22:45 UTC (Thu) (jake))

Reference: 0000829737
News link: https://lwn.net/Articles/829737

On the Collabora blog, Gabriel Krisman Bertazi [1] writes about a feature he developed: [2] case-insensitive ext4. He describes how to enable the feature in the kernel (>= 5.2), how to create an ext4 filesystem that will support case-insensitive lookups, as well as some gotchas; he starts with some justification for the idea:

"A file name is a text string used to uniquely identify a file (in this context, 'directory' is the same as a file) at a specific level of the directory hierarchy. While, from the operating system point of view, it doesn't matter what the file name is, as long as it is unique, meaningful file names are essential for the end user, since it is the main key to locate and retrieve data. In other words, a meaningful file name is what people rely upon to find their valuable documents, pictures and spreadsheets. Traditionally, Linux (and Unix) filesystems have always considered file names as an opaque byte sequence without any special meaning, requiring users to submit the exact match of the file to find it in the filesystem. But that is not how humans operate. When people write titles, 'important report.ods' and 'IMPORTANT REPORT.ods' usually mean the same piece of data, and you don't care how it was written when creating it. We care about the content and the semantics of the words IMPORTANT and REPORT."

[1] https://www.collabora.com/news-and-blog/blog/2020/08/27/using-the-linux-kernel-case-insensitive-feature-in-ext4/

[2] https://lwn.net/Articles/784041/

Krisman: Using the Linux kernel's Case-insensitive feature in Ext4

Why are people doomed to recreate mistakes of the past? Surely they are aware of them.

The place for this is not in a filesystem; it's in the higher-level interfaces to it. A filesystem needs to be rigorous in minimizing "gotchas", because it has many layers depending on it. Monkeying around with semantics is better done in those higher layers, which have a far smaller list of software that can be broken by their changes and that can evolve with them.

The long shadow of Windows and its choices continues to haunt.

color me sceptical

He gives the example of a file named "floß" being looked up using the name "FLOSS", successfully. But could a file originally named "Floss" be looked up using the name "Floß"? I'm not so sure.

It's just another question to be answered as the semantics are clarified.

color me sceptical

> He gives the example of a file named "floß" being looked up using the name "FLOSS", successfully. But could a file

> originally named "Floss" be looked up using the name "Floß"? I'm not so sure.

The article is more of a higher-level overview, and the floß example serves to illustrate what we mean by the complexity of non-English languages; I didn't mean to show the strict semantics with that one :)

If you check the documentation, it will show that we use Unicode's canonical decomposition (NFD) for normalization, with small modifications, documented in ./admin-guide/ext4.rst
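
To make the "Floss" versus "Floß" question concrete, here is a minimal userspace sketch of how a comparison based on NFD normalization plus case folding behaves. It uses Python's standard unicodedata module and str.casefold(), not the kernel's own tables, so it only approximates the idea; the helper names fold_key and same_name are hypothetical, and the kernel's "small modifications" are not modeled.

```python
import unicodedata


def fold_key(name: str) -> str:
    """Build an approximate comparison key for a file name.

    Assumption: NFD normalization followed by full Unicode case folding,
    then NFD again because folding can produce non-normalized text.
    This is NOT the kernel implementation, only an illustration.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


def same_name(a: str, b: str) -> bool:
    """Return True if two names are equal under the approximate key."""
    return fold_key(a) == fold_key(b)


if __name__ == "__main__":
    pairs = [
        ("floß", "FLOSS"),      # sharp s versus "SS"
        ("Floss", "Floß"),      # the reverse direction asked about above
        ("\u00e9", "e\u0301"),  # precomposed versus decomposed e-acute
    ]
    for a, b in pairs:
        print(repr(a), repr(b), same_name(a, b))
```

Under Python's folding rules all three pairs compare equal, because both names are reduced to the same key before comparison, so the direction of the lookup does not matter. Whether ext4 gives the same answer for every string depends on the kernel's own normalization tables.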

Krisman: Using the Linux kernel's Case-insensitive feature in Ext4

Honestly, I can only see the argument being valid if one is operating under the premise that average users are expected to use a CLI.

If it's not a CLI, I can't see how it matters. GUI software will display the correct things, and if there's a search, it can default to case-insensitive (if that's a sane default for the expected user base).