Online storage | Thanks for the memory | Economist.com

Description
IF YOU have lots of unused storage space on your hard disk, then why not share it with others on the internet? The benefit could be distributed storage for your own files, making them available any time via the web, even if you are nowhere near your computer—indeed, even if your computer is switched off. That desideratum is what a Zurich-based firm called Caleido is aiming to provide, with a free online storage service known as Wuala that was recently introduced to the public.
Comments

  • Public Comments

    • 9 months ago


      Cute idea. And brilliant if implemented correctly.
      Think Artificial
    • 9 months ago


      Amazing that the Kazaa-Skype-Joost guys didn't launch this idea yet. Article is interesting, with good explanation of the approach.
      Think Artificial
    • 9 months ago


      I have some ideas I would like to *add* to their concept. With communities / groups / online families, there is likely to be a significant amount of overlap between content in multiple computers. Shared / copied images and documents (ignoring the already online documents). Providing a 'group' [encryption] key and applying deduplication processing should allow for even less overall storage requirements. If you want to do full system backups, then the overlap will be even higher, since *lots* of people have the same software. To avoid issues of inappropriate file copying / sharing, the software could require that the file signature / fingerprint already exist in *your* database before you can access it. That could even be used to detect programs that have been *modified* by some types of malware. The signature changed. Either that signature can be flagged as malware (once detected by other means), or it could be a new unique signature (for 'polymorphic' malware). Unique signatures in the 'software' (versus document) classification are *extra* suspicious. Build the structure using an overlapping / hierarchical structure with rules / keys that define who can see / share [backup space for] what. Personal / unique; family, friend [pictures / videos]; group [documents]; other group [software source]; system [global software backup].

      Does that sound like it would fit?
      Think Artificial
      • 9 months ago


        That's an interesting point! Imagine if we could deduplicate the world's data. I'd drop them an email, Phil.
        Think Artificial
        • 9 months ago


          Email sent, with an expansion of the previous comment. The email content is included here.

          I encountered a reference to an article about http://wua.la/en/ at
          http://www.twine.com/item/11gc8l2mp-49/online-storage-thanks-for-the-memory-economist-com
          which pointed to:
          Thanks for the memory (Sep 10th 2008)
          http://www.economist.com/science/tm/displayStory.cfm?source=hptextfeature&story_id=12081445

          On the twine entry, I added a comment with an idea to extend the capabilities of wua. Another reader suggested I contact you directly with it:

          --- start quote ---
          I have some ideas I would like to *add* to their concept. With communities / groups / online families, there is likely to be a significant amount of overlap between content in multiple computers. Shared / copied images and documents (ignoring the already online documents). Providing a 'group' [encryption] key and applying deduplication processing should allow for even less overall storage requirements. If you want to do full system backups, then the overlap will be even higher, since *lots* of people have the same software. To avoid issues of inappropriate file copying / sharing, the software could require that the file signature / fingerprint already exist in *your* database before you can access it. That could even be used to detect programs that have been *modified* by some types of malware. The signature changed. Either that signature can be flagged as malware (once detected by other means), or it could be a new unique signature (for 'polymorphic' malware). Unique signatures in the 'software' (versus document) classification are *extra* suspicious. Build the structure using an overlapping / hierarchical structure with rules / keys that define who can see / share [backup space for] what. Personal / unique; family, friend [pictures / videos]; group [documents]; other group [software source]; system [global software backup].

          Does that sound like it would fit?
          --- end quote ---

          The basic idea is that in 'social' space, there are many interlinked and overlapping groups that are likely to include a significant amount of duplication in the computer file content. Providing a way to use / create a 'group' key would allow reduction in the amount of storage required for content backup. The initial concept is at the complete file level. I have seen some 'advertising' for commercial backup software that takes it further, down to the block level. In theory, the finer grained (up to a point) the backup 'buckets', the more duplication should be detectable, and the less storage should be needed to provide complete backups.
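
A minimal sketch of the file-level versus block-level comparison described above. The fixed 4-byte block size, the function names, and the sample data are assumptions chosen for illustration; the commercial products mentioned are not specified in the email and may chunk data differently.

```python
import hashlib


def stored_bytes_file_level(files):
    """Bytes kept when each distinct whole file is stored once."""
    seen = {}
    for data in files:
        fingerprint = hashlib.sha256(data).hexdigest()
        seen.setdefault(fingerprint, len(data))
    return sum(seen.values())


def stored_bytes_block_level(files, block_size=4):
    """Bytes kept when each distinct fixed-size block is stored once.

    block_size is an illustrative assumption; real systems use far
    larger blocks and have to store a block index as well.
    """
    seen = {}
    for data in files:
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            seen.setdefault(fingerprint, len(block))
    return sum(seen.values())


# Two identical copies plus one version that differs only in its last block.
original = b"AAAABBBBCCCCDDDD"
edited = b"AAAABBBBCCCCEEEE"
files = [original, original, edited]

raw_total = sum(len(f) for f in files)
file_level = stored_bytes_file_level(files)
block_level = stored_bytes_block_level(files)
print(raw_total, file_level, block_level)
```

File-level matching only collapses the exact copy, while block-level matching also reuses the three unchanged blocks of the edited version. The "up to a point" caveat applies here too: smaller blocks find more duplicates, but every block needs its own fingerprint entry.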

          Some examples of 'groups' where file content would be expected to overlap to a noticeable degree:
          • Household (multiple computers for a family)
          • immediate family (parents, children sharing family pictures and other documents)
          • extended family (grandparents, cousins, ...)
          • friends
          • social network 'groups' (physical and internet social groups)
          • co-workers

          One of the biggest duplications would be operating system and application software. **Assuming** issues of piracy, [inappropriate] copying, privacy, and multiple overlapping key management can be solved with the wua application software, this would be a large backup opportunity. Add to that a method of creating a bootable device (cd, usb key, other) that could do a full restore, and there is the potential for near continuous automatic backup of complete computer systems.

          When working with groups, it might also be possible to prioritize which P2P members receive the backup content, 'preferring' other members of the same group, especially those that actually already have a matching original. Might want to deliberately *avoid* that for the 'single household' case, so that a localized incident (house fire) does not wipe out the backup as well as the originals.
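
One way the peer preference above could be sketched. The `Peer` fields, the scoring order, and the choice to exclude same-household peers outright (rather than merely rank them last) are all assumptions for illustration, not anything Wuala is known to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    groups: frozenset    # group labels this peer belongs to (hypothetical)
    household: str       # physical location label (hypothetical)
    has_original: bool   # True if the peer already holds a matching copy


def rank_peers(owner_groups, owner_household, peers):
    """Return backup candidates, best first.

    Peers in the owner's own household are dropped so that one local
    incident cannot destroy both the originals and the backup.
    """
    candidates = [p for p in peers if p.household != owner_household]

    def score(peer):
        shared_groups = len(owner_groups & peer.groups)
        # Prefer peers that already hold the file, then closer group overlap.
        return (peer.has_original, shared_groups)

    return sorted(candidates, key=score, reverse=True)


peers = [
    Peer("upstairs-laptop", frozenset({"family"}), "home", True),
    Peer("grandparents", frozenset({"family"}), "grandparents-house", True),
    Peer("coworker", frozenset({"work"}), "office", False),
    Peer("stranger", frozenset(), "elsewhere", False),
]
ranked = rank_peers(frozenset({"family", "work"}), "home", peers)
print([p.name for p in ranked])
```

The upstairs laptop would score highest on overlap, which is exactly why the household check runs first.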

          Many of the advantages of de-duplication could be achieved without the multiple layers of keys. However, that would require that the data *not* be encrypted before it is matched to the existing distributed data store. The [unique] hash of the live data would need to be visible for comparison. Using the multiple group keys, the raw data could be encrypted with each key, and the encrypted hash matched to the backed up hash values. Add to that some mix of automatic and manual selection of the group that files belong in, to determine which key to use with a new file (or file version). The general 'application software' key could be used for the detection of some types of malware. Perhaps it would be possible to flag as 'pending backup' unique values that are selected for this key, and only actually backup with that key when there are enough other backup requests for the matching hash value. Could also initially backup these new / unique cases using the basic 'personal' key, then 'move' them to the 'application software' key if / when there are enough duplicates. Convincing virus software vendors to generate keys for the [non-polymorphic] malware could then enhance detection of infected machines.
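
A rough sketch of the 'pending backup' idea above, under stated assumptions: an HMAC over the file contents stands in for the "encrypted hash" matched under a group key, and the threshold of three distinct requesters, the class name, and the method names are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

PROMOTE_AFTER = 3  # hypothetical: distinct users needed before sharing


def keyed_fingerprint(group_key, data):
    """Fingerprint that only holders of group_key can compute or match."""
    return hmac.new(group_key, data, hashlib.sha256).hexdigest()


class SoftwareBackupIndex:
    """Tracks backup requests made under the 'application software' key."""

    def __init__(self, software_key, threshold=PROMOTE_AFTER):
        self.software_key = software_key
        self.threshold = threshold
        self.requesters = defaultdict(set)  # fingerprint -> requesting users
        self.shared = set()                 # fingerprints promoted to shared

    def request_backup(self, user, data):
        """Return 'shared' once enough distinct users hold the same file."""
        fingerprint = keyed_fingerprint(self.software_key, data)
        if fingerprint in self.shared:
            return "shared"
        self.requesters[fingerprint].add(user)
        if len(self.requesters[fingerprint]) >= self.threshold:
            self.shared.add(fingerprint)
            return "shared"
        # Until promoted, the file would stay under the user's personal key.
        return "pending"

    def unique_pending(self):
        """Fingerprints seen from exactly one user: the 'extra suspicious' set."""
        return {
            fingerprint
            for fingerprint, users in self.requesters.items()
            if len(users) == 1 and fingerprint not in self.shared
        }


index = SoftwareBackupIndex(b"software-group-key")
for user in ("alice", "bob", "carol"):
    status = index.request_backup(user, b"editor v1.0")
print(status)
```

A fingerprint that never leaves `unique_pending()` is not proof of malware, only a candidate for closer inspection, since freshly released or locally compiled software would look the same.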

          Your thoughts on the concept, and the requirements needed to implement it would be appreciated.
          Think Artificial
          • 9 months ago


            Reply from wuala:

            "Thanks for your suggestions. We are already detecting duplicates on file level. In Wuala, the key used to encrypt the content of a file is derived from the hash of the file. So if two users store the same file, it gets encrypted with [the same] key, resulting in the same encrypted data. This allows to detect duplicates without being able to decrypt them. The price for this is slightly reduced privacy as it makes it possible to find out if two users have stored files with the same content. That's the necessary tradeoff when one wants to prevent duplicates. However, there is no duplicate recognition on a block level. If you have further questions or comments, please let me know."

            That covers the deduplication plus encryption scenario I mention below.
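
The scheme in the reply (a key derived from the file's own hash) can be sketched roughly as follows. The reply does not say which cipher or hash Wuala uses, so the SHA-256 key derivation and the toy SHA-256 counter keystream here are assumptions chosen to keep the example within the standard library; the toy cipher is not suitable for real use.

```python
import hashlib


def derive_key(plaintext):
    """Key derived from the content itself, so equal files give equal keys."""
    return hashlib.sha256(plaintext).digest()


def _keystream(key, length):
    """Toy keystream: SHA-256 over key plus a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key, data):
    """Encrypts or decrypts (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def convergent_encrypt(plaintext):
    key = derive_key(plaintext)
    return key, xor_cipher(key, plaintext)


class BlobStore:
    """Storage side: sees only ciphertext, never keys or plaintext."""

    def __init__(self):
        self.blobs = {}

    def put(self, ciphertext):
        blob_id = hashlib.sha256(ciphertext).hexdigest()
        is_new = blob_id not in self.blobs
        self.blobs.setdefault(blob_id, ciphertext)
        return blob_id, is_new


store = BlobStore()
photo = b"the same holiday photo on two computers"

key_a, ciphertext_a = convergent_encrypt(photo)   # first user
key_b, ciphertext_b = convergent_encrypt(photo)   # second user, independently

id_a, new_a = store.put(ciphertext_a)
id_b, new_b = store.put(ciphertext_b)
print(new_a, new_b, len(store.blobs))
```

The second upload is recognized as a duplicate without the store ever holding a key. It also shows the privacy cost named in the reply: anyone who can compare blob ids learns that both users stored the same content.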
            Think Artificial
        • 9 months ago


          Since sending that email, I found another wuala reference on twine: http://www.twine.com/item/11hkcjb2r-n/wuala-a-distributed-file-system which links to a video explanation / demonstration. It looks like they are covering [or are planning to cover] much of what I was suggesting. They seem to be handling [file level] [de]duplication, although I am not clear on the mechanism, since they encrypt before upload. Maybe the duplicate handling is only for the public / published files.
          Think Artificial