Thursday, March 13, 2008

Directories - a flawed concept...

Yes, I'm talking about the usual directory. Or folder, if you wish: what your OS provides you with, and what you explore with "Windows Explorer", some clone of it, or some *nix tools.

It's a concept invented by a programmer a long, long time ago, and I'm sure it seemed a very nice idea at the time - a way to structure files. And to us programmers, folders are second nature - it's so much in our blood that we never question it.

It's a flawed concept!

Why?

Because finding a file is just too hard:

  • the user needs to know too much about the file's location - basically, he needs to know the whole path leading up to it
  • or, he needs to search for it (which takes CPU time, and he could be searching in the wrong place, etc.)
  • when dealing with multiple files/locations, there's too much burden in remembering which file/folder is where

But now, the root problem: an item (i.e., a file) can, and usually does, belong to more than one place.

The examples are endless:

  • that Excel report you just made for HR belongs to at least 2 categories: HR and Reports
  • that cool header file you just added to your project belongs to at least 2 categories: Headers and [ModuleName] (the module it's for)
  • that movie you just saw belongs to lots of categories: Movies, SF, Thriller, Kevin Spacey, etc.
  • just think a bit about some of your files: for each of them, you'll find at least one more category than the one it's in

Directories just show too limited a view of the world...

Workaround #1: Shortcuts, Hard Links, Soft Links

... are just a workaround for a flawed way of storing information. Each file that is somewhat important to you would need a few shortcuts, so you’d end up with hundreds or thousands of shortcuts. But who has the time to create them? And when you move or rename a file, then what?

Workaround #2: Virtual Folders

... same thing (virtual folder = the results of a search, shown as a folder). This is not what you want. It’s not as if every movie name has “movie” as its prefix so that a search for “movie” will return all movies...

The solution: Categories and Aliases

As I already pointed out, categories are a much more meaningful way of storing information. Most information belongs to more than one category.


Not familiar with the term categories? Maybe you’re more familiar with other terms: labels or tags:

  • when you write a blog entry, you can specify multiple labels: it’s the same thing – an entry can have multiple labels; a user can select a label, and see all entries that have that label
  • when using Digg, you use tags with the same meaning: a certain site can have multiple tags
  • Google Reader uses the term “folder”; however, when you subscribe to a feed, the subscription can be placed into multiple folders.

All the above show how easily you can structure information into categories.

But how do I browse?

With this hierarchy-based thinking so built into us, you’ll definitely ask that. Browsing can still happen in an Explorer-like fashion:

  • assume you have a root. Let’s call it “/”
  • expanding the root will show some aliases; I’ll explain them in a moment (for now, just think of them as some very clever shortcuts)
  • expanding an alias will show categories, other possible aliases, and files
  • there will be an alias called “Everything” – expanding it will show all categories and all files
  • expanding a category will show
    • all the files that are in all categories expanded so far (including the now expanded category) and
    • all the other categories that share files with all expanded categories

The last one is a bit tricky, so I’ll give you a short example.
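As one possible reading of these rules, here is a minimal Python sketch. It assumes an in-memory map from each file to the set of categories it belongs to; the file names, the categories and the `expand` function are all hypothetical. Expanding a list of categories returns the files that belong to all of them, plus the other categories those files also belong to:

```python
# Hypothetical data: each file maps to the set of categories it belongs to.
FILES: dict[str, set[str]] = {
    "hr_report_q1.xls": {"HR", "Reports", "Document"},
    "payroll.xls": {"HR", "Document"},
    "resume_jdoe.doc": {"HR", "Resumes", "Incoming", "Document"},
    "sales_report.xls": {"Sales", "Reports", "Document"},
}


def expand(expanded: list[str]) -> tuple[list[str], list[str]]:
    """Return (files, categories) shown after expanding the given categories."""
    wanted = set(expanded)
    # Files that are in *all* categories expanded so far.
    files = sorted(f for f, cats in FILES.items() if wanted <= cats)
    # Other categories that still share at least one of those files.
    shared = set().union(*(FILES[f] for f in files))
    return files, sorted(shared - wanted)
```

For instance, `expand(["HR"])` lists the three HR files together with "Document", "Incoming", "Reports" and "Resumes"; expanding "Reports" next narrows the view to `hr_report_q1.xls` and leaves only "Document" to drill into. Calling `expand([])` behaves like the “Everything” alias: with no category expanded, every file and every category is visible.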

What’s an alias?

An alias is similar to a shortcut, but in the context of categories. An alias is:

  • a union of one or more categories
  • you can treat it exactly like a category

An alias is just a better name, suited to you, the user: one that names what you want in a simpler way. For instance, you could give the group of categories “Incoming”, “Document” and “Resumes” the name “Incoming CVs”.

There’s a fundamental difference between an alias and what you currently know as a shortcut. But it’s not very relevant here, and I’ll let the diligent readers find it.
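Taking the definition above literally (an alias is a union of one or more categories that can be treated like a category), a minimal Python sketch might look like this. The data, the alias and the `files_in` function are hypothetical:

```python
# Hypothetical data: each file maps to the set of categories it belongs to.
FILES: dict[str, set[str]] = {
    "alpha_spec.doc": {"Project Alpha", "Document"},
    "beta_plan.xls": {"Project Beta", "Reports"},
    "gamma_notes.txt": {"Project Gamma", "Document"},
    "old_memo.doc": {"Project Omega", "Document"},
}

# Hypothetical alias: a user-chosen name for a union of categories.
ALIASES: dict[str, set[str]] = {
    "My Last 3 Projects": {"Project Alpha", "Project Beta", "Project Gamma"},
}


def files_in(name: str) -> set[str]:
    """Files under a category or an alias (an alias is used like a category)."""
    # A plain category name behaves like an alias of just itself.
    members = ALIASES.get(name, {name})
    # Union semantics: a file qualifies if it is in any member category.
    return {f for f, cats in FILES.items() if cats & members}
```

Here `files_in("My Last 3 Projects")` returns the three project files, while `old_memo.doc` stays out. A grouping such as “Incoming CVs” might instead be meant as files that are in all three categories; switching the test from `cats & members` to `members <= cats` would give that intersection reading.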

Benefits

The benefits are too many to list; here are just a few:

  • straightforward storage: the storage is designed with you – the user – in mind; information is organized into categories, matching the user’s thinking; when a file logically belongs to a category, just add it there – simple, straightforward, and easy
  • easier search: finding information that belongs to several categories is as easy as expanding those categories, and seeing what’s there. Just think how easy it can be to find all movies that are SF and comedy. How about cartoons that are SF?
  • no unneeded duplication: if you want to add a file to a category, just do it – you don’t need to copy or move it; it will then belong to one more category
  • easy to personalize: you can easily select groups of categories which you use often, and create aliases for them; this beats shortcuts by far. Imagine an alias to “My Last 3 Projects”, or “Anything with Kevin Spacey”.

Other things you can do:

  • History: can be a category; some program can update it by keeping a history of, let’s say, the last 100 opened files. Each opened file is automatically added to the “History” category
  • Favorites: (have a category called “Favorites”) anything can be added to Favorites, not just URLs
  • Last Searches: the Explorer program can remember the last 20 searches. After a search is run, an automatic category is created, called “Searching ... on ”, and the files that match the search will automatically be added there
  • Visited Web Pages: pages that are viewed can be placed into aliases called “Today”, “Yesterday”... Assuming today is 12 March 2008, I’ll have a category called “12 March 2008”, and “Today” will be an alias to it. When the day becomes 13 March 2008, the “Today” alias will point to “13 March 2008”, and “Yesterday” will point to “12 March 2008”. This way, you can keep a more detailed history, with the pages visited each day, for the last, let’s say, 50 days.
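The History category and the “Today”/“Yesterday” aliases could be maintained roughly as below. This is a Python sketch under stated assumptions: the 100-file and 50-day limits come from the text, while the data structures and the `open_file`, `visit_page` and `resolve` functions are hypothetical.

```python
from collections import deque
from datetime import date, timedelta

HISTORY_SIZE = 100   # last 100 opened files, per the text
RETENTION_DAYS = 50  # keep roughly 50 days of visited pages, per the text

# The automatic "History" category: a bounded queue, oldest entry first.
history = deque(maxlen=HISTORY_SIZE)

# One category per day (e.g. "12 March 2008"), keyed by date.
visits: dict[date, set[str]] = {}


def open_file(path: str) -> None:
    """Record an opened file; once full, the oldest entry drops off."""
    if path in history:
        history.remove(path)  # keep each file once, most recent at the end
    history.append(path)


def visit_page(url: str, day: date) -> None:
    """File a visited page under that day's category, then prune old days."""
    visits.setdefault(day, set()).add(url)
    cutoff = day - timedelta(days=RETENTION_DAYS)
    for old_day in [d for d in visits if d <= cutoff]:
        del visits[old_day]


def resolve(alias: str, today: date) -> set[str]:
    """Resolve a relative alias ("Today", "Yesterday") to that day's pages."""
    offsets = {"Today": 0, "Yesterday": 1}
    day = today - timedelta(days=offsets[alias])
    return visits.get(day, set())
```

Because “Today” and “Yesterday” are computed from the current date instead of being stored, nothing has to be rewritten when the day changes; only the dated categories hold actual pages.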


What next?

Using categories instead of folders should have happened a long time ago. WinFS aimed to implement something similar.

All the above is doable – on Windows, simply have a virtual drive on which you implement “Categories”. I’m not saying it’s easy, but it’ll be:

  • cool to implement
  • challenging
  • a real benefit for the users

Sounds like an excellent job for me! I’ll talk to Microsoft, Google and other companies, to see who wants me to implement this for them.

And I’ll be the first user of Category Explorer!

12 comments:

Bourgeois said...

I agree, and it reminds me of http://www.boost.org/libs/multi_index/doc/index.html

bbmihai said...

This would be the ultimate explorer. Interesting idea.

John Torjo said...

@Bourgeois: yup, I see some similarity

@pupu: indeed so ;) On top of it, I'll also provide an API to explore files in a directory-like fashion (a virtual drive, that is).

Lucian Ciufudean said...

I must admit I spent 5 minutes going back and forth between the example and the paragraph explaining it...

This is an interesting idea, and like all ideas that revolutionize something, its benefits are hard to grasp until one sees it implemented. I am looking forward to it.

Yes, we have this hierarchy-based thinking built into us, but when it is light it doesn't have to be bad: look back at your browsing method, it still has hierarchical concepts in it.

Note that the functionality could be implemented without the hierarchical concept in mind (saying I’m in “/Everything/Document/HR/Report”) but instead have only a flat list of categories, and the user will check some and uncheck others.

John Torjo said...

"Note that the functionality could be implemented without the hierarchical concept in mind (saying I’m in “/Everything/Document/HR/Report”) but instead have only a flat list of categories, and the user will check some and uncheck others."

Very true! That's on the Explorer side. As I said, what I showed was one way to explore; there could be others, and what you just described could be one of them.

However, we should not forget that there are a lot of programs that need to query for files - for those, we need to emulate a hierarchical view.

Vinzenz Feenstra said...

I think this approach will be a useful thing.

I can also imagine some version-control-system-like behaviour when storing data (and backing it up that way).

Basically, this just needs some kind of database filesystem where you create views with SQL commands and generate tags for files :)

Regards
Vinzenz Feenstra :)
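The database-filesystem idea in this comment can be sketched with Python's built-in `sqlite3` module. The schema (a `files` table plus a `file_tags` table with one row per file/tag pair) and the `tag`/`with_all_tags` helpers are hypothetical; the query finds files carrying every requested tag by grouping on the file and counting the distinct tags that matched:

```python
import sqlite3

# Hypothetical schema: files, plus a many-to-many file/tag table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
    CREATE TABLE file_tags (
        file_id INTEGER NOT NULL REFERENCES files(id),
        tag     TEXT    NOT NULL,
        PRIMARY KEY (file_id, tag)
    );
""")


def tag(path: str, *tags: str) -> None:
    """Register a file (if new) and attach tags to it; repeats are ignored."""
    conn.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    (file_id,) = conn.execute(
        "SELECT id FROM files WHERE path = ?", (path,)).fetchone()
    conn.executemany("INSERT OR IGNORE INTO file_tags VALUES (?, ?)",
                     [(file_id, t) for t in tags])


def with_all_tags(*tags: str) -> list[str]:
    """Paths of files that carry every one of the given tags."""
    tags = tuple(dict.fromkeys(tags))  # drop duplicate tags, keep order
    if not tags:
        raise ValueError("need at least one tag")
    placeholders = ", ".join("?" for _ in tags)
    query = (
        "SELECT f.path FROM files AS f "
        "JOIN file_tags AS t ON t.file_id = f.id "
        f"WHERE t.tag IN ({placeholders}) "
        "GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = ? "
        "ORDER BY f.path"
    )
    rows = conn.execute(query, (*tags, len(tags))).fetchall()
    return [path for (path,) in rows]
```

After `tag("movie1.avi", "Movies", "SF", "Comedy")` and `tag("movie2.avi", "Movies", "SF", "Thriller")`, a call to `with_all_tags("SF", "Comedy")` returns only `movie1.avi`.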

Mahoney said...

I very much like this idea. I think it would be a wonderful addition to a newer version of Windows, the version after 7, maybe. But, as previously stated, there would have to be a hierarchical system to it, just as in current versions of Windows. It would have to be virtual categories. We all know how much whining Microsoft is already getting about compatibility.

MattiasWikstrom said...

I cannot quite agree.

Perhaps the most important benefit of directory trees is that they allow for separation/partitioning. Different programs can be assigned different compartments (directories) and happily live their lives in separation. Similarly, users can be separated from each other. No program and no user should have access to "everything".

The meaning of "everything" is relative in any case. It could mean everything on a disk (or disk partition), everything on a computer, everything on a network, or maybe just everything contained in a zip/tar archive. Why not allow an "everything" to be partitioned into smaller "everythings" the way a hierarchical directory system does?

I think the problem is just that this feature is being misused and should be combined with other features. One could imagine a directory system which allowed an individual directory to have a custom organisation (in particular, one where files are organised into categories) instead of the ordinary hierarchical one.

Now, complicated file systems have the disadvantage that it is more difficult to write software that deals with them correctly, and users may be confused by them as well, so perhaps some kind of compromise should be sought.

truetim said...

I think MattiasWikstrom's comments are highlighting a valid issue. Currently, the "folder" approach has been overloaded to handle both organizational and security functions. They would have to be unraveled. Perhaps the system automatically adds an owner-visibility category?

Sounds fun.

Unknown said...

It strikes me that this is a good way of organising data; software would need to exist in subcategories of two aliases, analogous to the Program Files (user-installed) and Windows (OS) folders in Windows. I don't see how this would hinder separation of software.

MattiasWikstrom said...

I think I can summarise my philosophy by saying that when doing a search it is necessary to know two things:
1. Where to search, the search space (for example, a hard drive or a network).
2. What sort of things to search for (for example, pictures of animals created after a certain date).

I think traditional directory systems are strong when it comes to 1 but weak when it comes to 2 (the possibility of specifying file attributes such as "system" or "archive" indicates a need for classification, but one can obviously do much better), and I think the opposite is true for a typical category-based system.

I take it that this is really a disguised version of the debate on whether databases should be hierarchical or relational, and given that XML combines the best features of these two types of databases and in addition enjoys extremely widespread support, why not use a file system which appears to programmers as if it were a gigantic XML file?