FileSystem (09 Nov 2007, Main.AndrewPantyukhin)

Musings on a perfect file system

Note: after I started writing this, I noticed that Hans Reiser had been smoking pretty much the same weed, but he actually did a lot of real work and implemented some of these ideas in Reiser4.
  • a universal data store, processor and transport agent
  • distributed
  • peer-to-peer
  • like IP, ubiquitous
  • DBMS-based
  • embedded full-featured versioning
  • copy-on-write with on-demand real-time garbage collection
    • after a COW, different versions of one piece of data may well end up in different parts of the globe
  • disk drives and remote storage are handled similarly
    • either request reordering is independent of the FS, or
    • reordering is embedded in the FS so that it can be done efficiently, since servers will have many clients
  • like DNS, recursive and non-recursive operation modes for nodes
    • enterprise-wide proxies are possible
  • compression
  • encryption
  • any kind of programmable transformations via hooks at many levels
  • should make many apps obsolete immediately
    • most p2p apps
    • most VCSs, at least their versioning functionality
    • many RDBMS
    • many other DBMS
    • most DB-based apps can be rewritten with simple unix-style tools
  • ubiquity means almost every piece of storage in every device is part of a global internet-based file system
  • keeping data local just means marking it as such (to be stored locally)
  • backups come with versioning and COW snapshots
  • making remote backups just means marking data (or parts of it, e.g. spans of version trees and/or datetime spans of snapshots) to be replicated remotely
  • p2p-enabled free, secure, ultra-reliable distributed backups
    • e.g. you dedicate 50% of your hard drive to others' data
    • your data is replicated on others' hard drives
    • privacy can be ensured via strong encryption, e.g. GnuPG
  • smart caching is vital
  • writing/locking/collaboration management is quite complicated
  • trust management
  • data transfer can be done via existing protocols, like ftp, http/webdav, ssh/scp/sftp
    • efficient connection caching is vital
      • btw, centralization of connection caching/multiplexing on the OS level sounds like a good idea
      • if implemented, most apps could benefit from things like SSH connection multiplexing and Postfix anvil-style SMTP connection management
  • a successful transition requires high-level FUSE-like implementations
    • FS-based
    • DB-based
  • transfer speeds cover a wide range
    • from dead-slow background backups
    • to real-time high-traffic interactive medical imaging
  • external search engines (if relevant) become find(1) backends
  • adaptation to different load patterns
  • built-in fully-featured scheduling capabilities
    • between all kinds of objects
      • users
      • tasks
      • hosts
      • etc
    • mirroring can be slow by default, but link-speed if prioritized
  • different profiles for different devices
  • diskless operation would not require the logic to deal with local hard drives
  • the simplest profiles should be as simple as FAT16
  • on-disk format should be compatible across all profiles
    • at the very least it should be upward compatible, from the simplest profile to the fullest
  • meta-data is a first-class citizen
    • as flexible as data itself
    • can be aggregated/mirrored separately from data
      • index aggregators =~ search engines
    • can include derivatives of several degrees
      • indexes of indexes to optimize and accelerate search
    • very verbose logs can be kept in metadata as space permits
      • logging everyone who accessed a piece of data
        • can be used later to redirect clients who want the piece
  • flexible, pluggable auth methods
    • built-in support for things like payment
      • imagine iTunes
  • variable block (chunk) size, may be fixed in simpler profiles
  • optional checksums, per file or per block, tunable per file
    • optionally delayed
    • fast non-checksummed writes under load
    • fast non-verified reads under load
    • background checksumming when load is low
    • foreground checksumming on request
    • background verification (scrubbing) under low load
    • on-demand tunable mandatory read verification
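The copy-on-write, versioning and on-demand garbage collection bullets above can be illustrated with a toy in-memory store. This is a minimal sketch, not a design: content addressing by SHA-256, a fixed 4-byte block size, and mark-and-sweep collection are assumptions chosen for brevity, and the names `CowStore`, `prune` and `collect_garbage` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CowStore:
    """Toy copy-on-write, content-addressed store with version history.

    Blocks are immutable and keyed by their SHA-256 digest. A write never
    modifies a block in place; it appends a new version that points at a
    (possibly partly shared) list of block digests.
    """
    block_size: int = 4  # assumption: tiny fixed blocks, just for the demo
    blocks: dict[str, bytes] = field(default_factory=dict)              # digest -> bytes
    versions: dict[str, list[list[str]]] = field(default_factory=dict)  # name -> history

    def _put_block(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        self.blocks.setdefault(digest, chunk)  # identical blocks are stored once
        return digest

    def write(self, name: str, data: bytes) -> int:
        """Store data as a new version of `name`; return its version number."""
        refs = [self._put_block(data[i:i + self.block_size])
                for i in range(0, len(data), self.block_size)]
        history = self.versions.setdefault(name, [])
        history.append(refs)
        return len(history) - 1

    def read(self, name: str, version: int = -1) -> bytes:
        """Reassemble a version (default: the newest) from its blocks."""
        return b"".join(self.blocks[d] for d in self.versions[name][version])

    def prune(self, name: str, keep_last: int) -> None:
        """Forget all but the newest `keep_last` versions of `name`."""
        history = self.versions[name]
        self.versions[name] = history[-keep_last:] if keep_last > 0 else []

    def collect_garbage(self) -> int:
        """Mark-and-sweep: drop blocks no surviving version references."""
        live = {d for history in self.versions.values()
                for refs in history for d in refs}
        dead = [d for d in self.blocks if d not in live]
        for d in dead:
            del self.blocks[d]
        return len(dead)


if __name__ == "__main__":
    store = CowStore()
    store.write("doc", b"aaaabbbbcccc")  # version 0: three blocks
    store.write("doc", b"aaaabbbbdddd")  # version 1: reuses two of them
    print(len(store.blocks))             # 4
    store.prune("doc", keep_last=1)
    print(store.collect_garbage())       # 1 (only "cccc" became unreachable)
    print(store.read("doc"))             # b'aaaabbbbdddd'
```

Because a write never touches existing blocks, versions that share data also share storage, and collection only has to find blocks that no surviving version points at.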
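The delayed-checksum bullets (fast unchecksummed writes under load, background checksumming and scrubbing when load is low, tunable read verification) could work roughly as below. This is a hypothetical sketch: CRC-32 from Python's `zlib` stands in for whatever checksum a real FS would use, and the `under_load` flag and `budget` parameter are assumptions standing in for a real load monitor.

```python
import zlib
from collections import deque


class ChecksummedVolume:
    """Toy block volume with optionally deferred per-block checksums."""

    def __init__(self, verify_reads: bool = False) -> None:
        self.data: dict[int, bytes] = {}
        self.crc: dict[int, int] = {}
        self.pending: deque[int] = deque()  # blocks written without a checksum
        self.verify_reads = verify_reads    # tunable "mandatory read verification"

    def write(self, blk: int, payload: bytes, under_load: bool) -> None:
        self.data[blk] = payload
        if under_load:
            self.crc.pop(blk, None)   # any old checksum is stale now
            self.pending.append(blk)  # checksum it later, when load is low
        else:
            self.crc[blk] = zlib.crc32(payload)

    def read(self, blk: int) -> bytes:
        payload = self.data[blk]
        # Only blocks that already have a checksum can be verified.
        if self.verify_reads and blk in self.crc and zlib.crc32(payload) != self.crc[blk]:
            raise OSError(f"checksum mismatch in block {blk}")
        return payload

    def background_checksum(self, budget: int) -> int:
        """Checksum at most `budget` pending blocks; return how many were handled."""
        done = 0
        while self.pending and done < budget:
            blk = self.pending.popleft()
            if blk in self.data:
                self.crc[blk] = zlib.crc32(self.data[blk])
            done += 1
        return done

    def scrub(self) -> list[int]:
        """Re-verify every checksummed block; return the ids of corrupt ones."""
        return [b for b, c in self.crc.items() if zlib.crc32(self.data[b]) != c]
```

Setting `verify_reads=True` corresponds to the mandatory read verification bullet; leaving it off gives the fast unverified reads, with corruption caught later by `scrub`.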

Introducing structure

  • thanks to DB features, legacy setups with lots of small files can be converted into structured DB entities
  • data multiplexing (e.g. multimedia) can be done at file system level
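As a rough illustration of converting many small files into structured DB entities, the sketch below loads a directory tree into a single SQLite table using Python's standard `sqlite3` module. The table layout and the function name `import_small_files` are assumptions, not part of the original notes.

```python
import sqlite3
from pathlib import Path


def import_small_files(root: Path, db: sqlite3.Connection) -> int:
    """Load every regular file under `root` into one `entries` table."""
    db.execute("""CREATE TABLE IF NOT EXISTS entries (
                      path   TEXT PRIMARY KEY,
                      parent TEXT NOT NULL,
                      size   INTEGER NOT NULL,
                      body   BLOB NOT NULL)""")
    count = 0
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root)
            body = p.read_bytes()
            # Store the path relative to root, its directory, size and contents.
            db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                       (rel.as_posix(), rel.parent.as_posix(), len(body), body))
            count += 1
    db.commit()
    return count
```

Once the files are rows, a question like "how many entries and bytes per directory" becomes a single query instead of a tree walk, e.g. `SELECT parent, COUNT(*), SUM(size) FROM entries GROUP BY parent`.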

Imagine a perfect workflow

  • you don't work with files and folders
  • you don't download stuff
  • just play /Music/SomeBand/ASong - and the global file system gets the data to you
    • caching it locally
    • it may be cached by your ISP
    • when your neighbors request the same song, they will probably get most of the data from you
  • you don't send stuff
  • just work on /MyCompany/MyDept/CurrentProject
  • edit /Wikipedia/Article
  • cache (=mirror=clone) any open resources for offline read-write access
  • branch, edit, merge any data
  • manage processing resources
    • another topic
  • indexing everywhere
  • instead of going to a website to rate a movie, you create a new attribute with your identity, score, and optional comment/essay
    • one or more IMDb-like central entities can then collect these attributes from users, calculate averages, and mirror the comments
  • *everything* is one file system
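The "play a song and the file system gets the data to you" workflow, combined with the DNS-like recursive and non-recursive node modes mentioned earlier, might resolve a path as below. This is a hypothetical sketch: the `Node` class, the single `upstream` link per node, and caching every answer on the way back are assumptions; a real design would also need peer discovery, cache eviction and trust checks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical storage node: a laptop, an ISP proxy, a peer or an origin."""
    name: str
    recursive: bool = True                          # DNS-style recursion mode
    store: dict[str, bytes] = field(default_factory=dict)
    upstream: Optional["Node"] = None
    served: int = 0                                 # requests answered from local data

    def fetch(self, path: str) -> tuple[Optional[bytes], Optional["Node"]]:
        """Return (data, None) on success, or (None, referral) in non-recursive mode."""
        if path in self.store:
            self.served += 1
            return self.store[path], None
        if self.upstream is None:
            raise FileNotFoundError(path)
        if not self.recursive:
            return None, self.upstream  # like a non-recursive DNS server: refer
        data = resolve(self.upstream, path)
        self.store[path] = data         # cache on the way back
        return data, None


def resolve(start: Node, path: str) -> bytes:
    """Follow referrals until some node returns the data."""
    node = start
    while True:
        data, referral = node.fetch(path)
        if data is not None:
            return data
        node = referral
```

A recursive node fetches on the client's behalf and caches the result, which is what would let an ISP-level proxy answer the second neighbor without contacting the origin; a non-recursive node only hands back a referral, as a non-recursive DNS server does.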
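The movie-rating idea treats a rating as just another metadata attribute, so an IMDb-like aggregator reduces to a scan over collected attributes. The sketch below assumes a hypothetical record shape `(object path, attribute name, value)` with `user`, `score` and optional `comment` keys; nothing about that layout comes from the original notes.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Iterable

# Hypothetical metadata record: (object path, attribute name, attribute value).
Attr = tuple[str, str, Any]


def aggregate_ratings(attrs: Iterable[Attr]) -> dict[str, dict[str, Any]]:
    """Collect users' "rating" attributes and summarise them per object."""
    scores: dict[str, list[float]] = defaultdict(list)
    comments: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for path, name, value in attrs:
        if name != "rating":
            continue  # ignore unrelated metadata such as tags
        scores[path].append(value["score"])
        if value.get("comment"):
            comments[path].append((value["user"], value["comment"]))
    return {
        path: {"average": mean(s), "votes": len(s), "comments": comments[path]}
        for path, s in scores.items()
    }
```

Since the ratings stay with their authors, several independent aggregators could compute their own averages from the same attributes.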

The problem is

  • making everything dead-simple

Is the file system limited to data?

  • of course not
  • processes
  • RAM
  • devices
  • any possible objects
  • all globally shareable
  • Plan 9-style
  • mv /MyLaptop/Processes/Rendering /MyCompany/ServerFarm/Processes/
  • cp /Music/One/Album /MyDesktop/Devices/CDWriter
    • in fact the real syntax would imply mirroring the Album to the CDWriter
  • cat /MyKitchen/Devices/CoffeePot/Status
  • vim /People/JohnDoe/PoliceRecord
  • cat /Orbit/Hubble/InfraCam/Raw | /SomeSuperComputer/Filters/InterpolateInfra | /AnotherCluster/GenericDataMiner | /BlackMesa/Visualize | /Astro/Data/GreenMen51
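The Plan 9-style examples above treat devices, processes and filters as named objects in one namespace. A toy version of that idea is sketched below, assuming a single in-process namespace where readable objects and filters are plain Python callables; `Namespace`, `mount_object`, `mount_filter` and the `/Filters/...` paths are hypothetical, and `mv` here only rebinds a name rather than migrating a real process.

```python
from typing import Callable, Iterable

Filter = Callable[[bytes], bytes]


class Namespace:
    """Toy global namespace: paths map to readable objects or to filters."""

    def __init__(self) -> None:
        self.objects: dict[str, Callable[[], bytes]] = {}  # things you can `cat`
        self.filters: dict[str, Filter] = {}                # things you can pipe into

    def mount_object(self, path: str, reader: Callable[[], bytes]) -> None:
        self.objects[path] = reader

    def mount_filter(self, path: str, fn: Filter) -> None:
        self.filters[path] = fn

    def cat(self, path: str) -> bytes:
        return self.objects[path]()

    def pipeline(self, source: str, stages: Iterable[str]) -> bytes:
        """Rough equivalent of `cat source | stage1 | stage2 ...`."""
        data = self.cat(source)
        for stage in stages:
            data = self.filters[stage](data)
        return data

    def mv(self, src: str, dst_dir: str) -> str:
        """Rebind an object under another directory; returns the new path."""
        name = src.rstrip("/").rsplit("/", 1)[-1]
        dst = dst_dir.rstrip("/") + "/" + name
        self.objects[dst] = self.objects.pop(src)
        return dst


if __name__ == "__main__":
    ns = Namespace()
    ns.mount_object("/MyKitchen/Devices/CoffeePot/Status", lambda: b"brewing")
    ns.mount_filter("/Filters/Upper", bytes.upper)
    print(ns.pipeline("/MyKitchen/Devices/CoffeePot/Status", ["/Filters/Upper"]))  # b'BREWING'
```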


Topic revision: r8 - 09 Nov 2007 - 13:29:07 - Main.AndrewPantyukhin
 

Cenkes - IT Pro Bono