Last modified on 11/02/09 20:17:57

InstantMirror Project Home Page

Various existing repository replication methods can be used to mirror repositories, each with its own pros and cons. InstantMirror aims to replicate data repositories more efficiently by combining several established techniques: a Squid-like reverse proxy cache with a cache expiry algorithm, rsync-like local directory trees, and torrent-style many-to-many swarming replication. If implemented, this project could substantially improve the efficiency and timeliness of repository replication, which would be useful for mirror networks such as kernel.org, Fedora, Debian, and CPAN.

Major Implications

  1. Instant changes on mirrors. From the perspective of users, changed files on the master can appear for download "instantly" on mirrors.
  2. Read-only network filesystem that can replicate data in a torrent-like swarming manner, with snapshots and tags to preserve access to previous contents of the filesystem. This has the potential to be far more robust and efficient than rsync mirrors, while being more flexible than reverse caching proxy mirrors.

Project Status: Proof of Concept Code

Google Summer of Code 2009 student Atul Aggarwal wrote a proof of concept implementation of this InstantMirror design based upon libtorrent and lighttpd.

Please direct any questions to Atul Aggarwal and instantmirror-list.

InstantMirror Architecture Design

  • InstantMirror Daemon
    • Networking and storage are handled by the InstantMirror backend daemon, hereafter referred to as imdaemon. Frontend interfaces read from imdaemon and provide a navigable tree of the repository. HTTP and fuse are examples of frontend interfaces.
  • HTTP interface
    • The HTTP interface (hereafter referred to as imhttpd) is the primary frontend to the InstantMirror filesystem. We will implement an httpd frontend first because it is the most common way that InstantMirror would be used. A native httpd can also use performance features like zero-copy sendfile() that are valuable on extremely busy servers.
  • Fuse Interface
    • The second type of frontend to implement will be a fuse filesystem, hereafter referred to as imfs. Files on a fuse mount can be served by arbitrary methods like NFS, HTTP, FTP or rsync to create a traditional-style repository mirror.
  • Object Caching with Expiration
    • All files that have been read previously from the filesystem interface are stored as cached objects on disk. The InstantMirror configuration on a particular host sets the maximum storage that the object cache may use. This means a mirror server can run with any amount of storage, even if it is less than the aggregate size of the repositories to be mirrored.
    • Objects are stored on disk as files or partial files. The objects are not accessed directly through the filesystem, but through imhttpd or imfs, which map them to a navigable filesystem. Reading a file from imhttpd/imfs reads data from already cached objects, or begins downloading the object if it is not already cached.
    • Old or unused data can be expired from the cache to free up space for fresh content.
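
The size cap and expiry described above could be sketched as follows. This is only an illustration, assuming a least-recently-used policy; the design does not fix an expiry algorithm, and the ObjectCache class and its method names are hypothetical.

```python
from collections import OrderedDict


class ObjectCache:
    """Size-bounded cache that expires the least recently used objects first.

    Hypothetical sketch: only object names and sizes are tracked here,
    not the on-disk files themselves.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._objects = OrderedDict()  # object name -> size, oldest first

    def touch(self, name):
        """Record a read of an object; return True on a cache hit."""
        if name in self._objects:
            self._objects.move_to_end(name)
            return True
        return False

    def store(self, name, size):
        """Add a downloaded object and return the names expired to make room."""
        if size > self.max_bytes:
            raise ValueError("object is larger than the whole cache")
        if name in self._objects:
            self.used_bytes -= self._objects.pop(name)
        expired = []
        while self.used_bytes + size > self.max_bytes:
            old_name, old_size = self._objects.popitem(last=False)
            self.used_bytes -= old_size
            expired.append(old_name)
        self._objects[name] = size
        self.used_bytes += size
        return expired
```

A real implementation would also delete the expired files from the cache directory and would need to handle partially downloaded objects.
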
  • Object Pre-fetching
    • Objects can be grabbed by clients on-demand, or pre-downloaded when they are available based upon configuration or intelligent algorithms. Example: New metadata and RPMS appearing in updates/ or rawhide of particular archs are likely to be grabbed soon, so begin syncing those objects locally even if a client did not ask for them yet.
    • The server could automatically recommend to clients which directories/files to pre-seed first.
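
One way to express "pre-seed these first, never pre-seed those" is a pair of ordered glob lists. The patterns and the preseed_rank helper below are hypothetical examples, not part of the design.

```python
from fnmatch import fnmatchcase

# Hypothetical rules; earlier entries in PRESEED_FIRST are fetched earlier.
PRESEED_FIRST = ["updates/*/x86_64/*", "rawhide/x86_64/repodata/*"]
NEVER_PRESEED = ["*/debug/*"]


def preseed_rank(path):
    """Return the index of the first matching pre-seed rule, or None.

    None means the object is only downloaded on demand.
    Note that fnmatch-style "*" also matches "/" characters.
    """
    if any(fnmatchcase(path, pattern) for pattern in NEVER_PRESEED):
        return None
    for rank, pattern in enumerate(PRESEED_FIRST):
        if fnmatchcase(path, pattern):
            return rank
    return None
```
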
  • Repository Snapshots, Views, and Tags
    • A tree of directories and files (hereby referred to as repository) to be published is snapshotted at a particular point in time. That snapshot is given a unique UUID, which refers to the contents of that repository at that particular point of time. In these examples, 9c857228-cd5d-49f1-90fc-16ac79883853 is the first snapshot, while 050ef434-0ca7-44da-adac-752fdfba196a is the second snapshot taken a day later with a new nightly rawhide tree.
    • The server-side can store an arbitrary number of snapshots, and expire old snapshots depending on storage needs or configured policy.
    • The client-side node can access the repository through views on a given snapshot UUID. The examples below are of the imfs filesystem view.
      • For example, say the fedora-mirror repository is mounted at /srv/fedora-mirror.
      • /srv/fedora-mirror/snap/9c857228-cd5d-49f1-90fc-16ac79883853 allows you to navigate the first snapshot and read files from it, until the server has expired that snapshot.
      • /srv/fedora-mirror/snap/050ef434-0ca7-44da-adac-752fdfba196a allows you to navigate the second snapshot.
      • /srv/fedora-mirror/snap/current is a symlink pointing at the second snapshot, since it is the latest snapshot.
      • Note: The client-side imfs mount can be on the same machine as the server-side master. This can be a form of filesystem snapshotting for backup, with easy retrieval of old versions of files. Such local mounts can recognize that they are local and thus avoid using a client-side cache.
      • Note: Server-side or client-side snapshot preservation tools can be implemented. For example, say testing proves that 9c857228-cd5d-49f1-90fc-16ac79883853 contains a non-broken version of rawhide that users can install from. A tool could "tag" that snapshot for preservation, protecting it from the automatic server-side snapshot cleanup. Such server-side tags could be accessed through a pathname like /srv/fedora-mirror/master-snap/rawhide-good-20090317/. Client-side tags could theoretically be preserved even beyond server-side expiry of a snapshot, accessible through a pathname like /srv/fedora-mirror/local-snap/rawhide-good-20090317/. The drawback of client-side tags, however, is that all objects of that repository would need to be downloaded and stored locally, while server-side tags are downloaded on demand.
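
The relationship between snapshots, the current view, and tags can be illustrated with a small in-memory model. This is a sketch under assumptions: the SnapshotStore class, its keep_latest policy, and its method names are hypothetical, and only the server-side master-snap tags are modelled.

```python
import uuid


class SnapshotStore:
    """Hypothetical model of server-side snapshots, tags, and expiry."""

    def __init__(self, keep_latest):
        if keep_latest < 1:
            raise ValueError("keep_latest must be at least 1")
        self.keep_latest = keep_latest
        self.snapshots = []  # snapshot UUID strings, oldest first
        self.tags = {}       # tag name -> snapshot UUID

    def take_snapshot(self):
        snap_id = str(uuid.uuid4())
        self.snapshots.append(snap_id)
        return snap_id

    def tag(self, name, snap_id):
        if snap_id not in self.snapshots:
            raise KeyError(snap_id)
        self.tags[name] = snap_id

    def resolve(self, view):
        """Map a view path relative to the mount point to a snapshot UUID."""
        kind, _, name = view.partition("/")
        if kind == "snap":
            if name == "current" and self.snapshots:
                return self.snapshots[-1]
            if name in self.snapshots:
                return name
        elif kind == "master-snap" and name in self.tags:
            return self.tags[name]
        raise FileNotFoundError(view)

    def expire(self):
        """Drop old snapshots, keeping the newest ones and anything tagged."""
        protected = set(self.tags.values())
        protected.update(self.snapshots[-self.keep_latest:])
        expired = [s for s in self.snapshots if s not in protected]
        self.snapshots = [s for s in self.snapshots if s in protected]
        return expired
```

With keep_latest=1, a tagged snapshot survives expire() while untagged older snapshots are dropped, which mirrors the rawhide-good-20090317 example above.
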
  • Download Priority Queue
    • This queue contains a prioritized task list of which files to download, in what order, to supply the object cache. Existing algorithms in Squid might be a good template for this component.
    • When a file is requested but not yet in the object cache, a method is run to add that file as a high priority to the download priority queue.
    • If there are no high priority on-demand objects waiting, the download priority queue could proceed with lower priority pre-seeding downloads. The objects to pre-seed could be chosen by a configuration file, by suggestions from the server, or later by intelligent algorithms that recognize usage patterns (e.g. which directory seems most popular).
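
A minimal sketch of such a queue, using Python's heapq module. The two priority levels and the DownloadQueue class are assumptions for illustration; a real queue would likely have finer-grained priorities.

```python
import heapq
import itertools

ON_DEMAND = 0  # a client is waiting for this file right now
PRESEED = 1    # speculative download, only when nothing is on demand


class DownloadQueue:
    """Hypothetical priority queue of files to fetch into the object cache."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a priority
        self._priority = {}                # path -> best priority queued so far

    def add(self, path, priority):
        """Queue a path, or promote it if it is already queued at lower priority."""
        if path in self._priority and self._priority[path] <= priority:
            return
        self._priority[path] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), path))

    def pop(self):
        """Return the next path to download, or None if the queue is empty."""
        while self._heap:
            priority, _, path = heapq.heappop(self._heap)
            # Skip stale entries left behind when a path was promoted.
            if self._priority.get(path) == priority:
                del self._priority[path]
                return path
        return None
```

A file that was queued for pre-seeding and is then requested by a client is promoted rather than downloaded twice.
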
  • Delayed Snapshot Appearance
    • While InstantMirror allows new snapshot contents from the master to appear "instantly" to users on mirrors, this might not be desirable: thousands of end-user clients would trigger simultaneous on-demand accesses to the master for new content, causing a burst of traffic. For this reason, it may be desirable to implement Delayed Snapshot Appearance, where the master makes a new snapshot available for nodes to begin pre-seeding, and only after a defined time-out is "current" pointed at the new snapshot.
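
The time-out rule can be stated in a few lines. The current_snapshot function and its parameters are hypothetical; timestamps are plain seconds.

```python
def current_snapshot(snapshots, now, delay_seconds):
    """Return the snapshot ID that "current" should point at, or None.

    snapshots is a list of (published_at, snapshot_id) pairs, oldest first.
    A snapshot only becomes current once delay_seconds have passed since it
    was published, which gives nodes time to pre-seed it.
    """
    visible = [
        snap_id
        for published_at, snap_id in snapshots
        if published_at + delay_seconds <= now
    ]
    return visible[-1] if visible else None
```
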
  • BitTorrent Protocol Reuse
    • The BitTorrent protocol, existing tools, and libraries already seem to implement most of what we need.
    • Each snapshot can be an individual .torrent, which contains SHA1 checksums of chunks of files.
    • Torrent clients can already be told to Download this First or Download only these Particular Files during runtime of the torrent.
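
For reference, the per-chunk checksums work roughly as follows: in the original BitTorrent metainfo format, the payload is split into fixed-size pieces and each piece is hashed with SHA-1. The sketch below uses only Python's hashlib and operates on a single in-memory byte string; the function names are illustrative and this is not the libtorrent API.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Split data into fixed-size pieces and return one SHA-1 hex digest per piece."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).hexdigest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(piece, index, hashes):
    """Check a downloaded piece against the expected checksum list."""
    return hashlib.sha1(piece).hexdigest() == hashes[index]
```

Because each piece is verified independently, a node can accept pieces from any peer in the swarm without trusting that peer.
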
  • Configuration File Parser
    • Various options for both the server-side and client-side need to be read from a configuration file. We may be able to re-use existing config file language parsers.
      • Client-side Node
        • Caching
          • Cachedir, maximum size, snapshot expiration policy, etc.
          • Which objects to pre-seed first?
          • Which objects never to pre-seed?
          • Which objects never to cache?
        • Peers
          • Preferred peers (geographically closest, cheapest bandwidth, etc.)
          • Peer whitelist or blacklist
      • Server-side Master
        • Paths and Storage
          • Repository source paths
          • Snapshot object and metadata storage path
          • Maximum sizes, snapshot expiration policy, etc.
        • Peers
          • Preferred peers (geographically closest, cheapest bandwidth, etc.)
          • Peer whitelist or blacklist
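
As one example of reusing an existing parser, the client-side options could be expressed in INI syntax and read with Python's configparser. Every section name, key, and value below is a hypothetical placeholder; the design does not define a file format.

```python
import configparser

# Hypothetical client-side configuration covering the options listed above.
EXAMPLE_CONFIG = """
[cache]
cachedir = /var/cache/instantmirror
max_size_gb = 200
preseed_first = updates/*, rawhide/*/repodata/*
never_cache = */debug/*

[peers]
preferred = mirror1.example.org, mirror2.example.org
blacklist =
"""


def split_list(value):
    """Turn a comma-separated option into a list, ignoring empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_client_config(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cache = parser["cache"]
    peers = parser["peers"]
    return {
        "cachedir": cache["cachedir"],
        "max_cache_bytes": cache.getint("max_size_gb") * 1024 ** 3,
        "preseed_first": split_list(cache.get("preseed_first", "")),
        "never_cache": split_list(cache.get("never_cache", "")),
        "preferred_peers": split_list(peers.get("preferred", "")),
        "peer_blacklist": split_list(peers.get("blacklist", "")),
    }
```
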
  • OPTIONAL: Statistics Gathering Opportunities
    • Since the master tracker knows the sync state of client nodes, this information can be used for several novel purposes.
      • Know which mirrors are usable in real-time for dynamic yum mirrorlists.
      • Know progress of a big new sync, like "80% of Fedora 11 is synced to mirrors, good enough, lift embargo now!"
    • Client nodes could possibly count the number of filesystem accesses to individual files. This could be reported back to the master tracker. This could be used for other novel purposes.
      • For the first time, we have a way to measure the relative popularity of packages.
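
The two tracker-side figures mentioned above (which mirrors are usable, and how far a big sync has progressed) could be derived from per-node piece counts. The sync_report function and its inputs are hypothetical.

```python
def sync_report(total_pieces, pieces_by_node):
    """Summarize sync state as seen by the master tracker.

    pieces_by_node maps a node name to the number of pieces of the snapshot
    it holds. Returns (fully synced nodes, overall fraction synced).
    """
    if total_pieces <= 0 or not pieces_by_node:
        return [], 0.0
    usable = sorted(
        node for node, have in pieces_by_node.items() if have >= total_pieces
    )
    held = sum(min(have, total_pieces) for have in pieces_by_node.values())
    return usable, held / (total_pieces * len(pieces_by_node))
```
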
  • ADVANCED: Intelligent Node Peering
    • GeoIP could detect that several nodes are within the same geographic region. The intelligent tracker could suggest these specific nodes to peer with each other for more efficient swarming.
    • Internet2 nodes could be peered preferentially, to sync themselves as quickly as possible, before they help to seed the rest of the world.
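
A very rough sketch of region-based peer suggestions follows. It assumes the tracker already has a region label for each node (for example from a GeoIP lookup performed elsewhere); the suggest_peers function is hypothetical.

```python
from collections import defaultdict


def suggest_peers(node_regions):
    """Suggest, for each node, the other nodes in the same region.

    node_regions maps a node name to a region label.
    """
    by_region = defaultdict(list)
    for node, region in node_regions.items():
        by_region[region].append(node)
    return {
        node: sorted(peer for peer in by_region[region] if peer != node)
        for node, region in node_regions.items()
    }
```
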
  • Performance Issues
    • According to warthog9, sendfile() is a very good idea for high traffic mirrors because it avoids memcpy() and avoids needlessly evicting data from the memory cache. The fuse frontend cannot take advantage of sendfile(), although it has other benefits, so the HTTP frontend will be implemented first.
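
To illustrate the sendfile() point, Python exposes the same mechanism through socket.sendfile(), which uses os.sendfile() where the platform supports it and otherwise falls back to ordinary send() calls. The send_cached_object helper is a hypothetical name, and a real imhttpd would also have to write HTTP headers first.

```python
import socket


def send_cached_object(conn, path):
    """Send a cached object file over a connected stream socket.

    Where os.sendfile() is available, the kernel copies the file data
    straight to the socket without passing it through user space.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as cached_file:
        return conn.sendfile(cached_file)
```
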

Client-Side Node Diagram (needs some revision)

http://wtogami.fedorapeople.org/archive/2009/client-side2.jpg

Server-Side Node Diagram

http://wtogami.fedorapeople.org/archive/2009/server-side.jpg

Theoretical Roadmap

  • Phase 1: Design
    • Figure out exactly what we want as design goals, and map those goals to the roadmap in phases.
  • Phase 2: Prototyping
    • Prototype these concepts first using existing libraries (librsync, libfuse, torrent as a library?, etc.). Some parts, such as using torrent as a library, will likely need multiple implementation attempts as we figure out the best method to use in the real project. Several early prototypes will be thrown out.
    • Some parts like cache object storage and expiration algorithms could be copied from projects like Squid.
    • Figure out if these goals can be reached without writing these component parts or algorithms from scratch.
    • Figure out which programming language to implement the server-side in. It seems the client-side must be implemented in C due to libfuse.
  • Phase 3: Backend storage management with caching
    • Initially implement storage management without caching.
    • Add Squid-style caching and expiration algorithms to support caching.
  • Phase 4: Implement the Configuration File Parser
  • Phase 5: HTTP frontend interface (imhttpd)
    • Create a high performance http server that serves only mirror objects.
  • Phase 6: Implement the Download Priority Queue
  • Phase 7: Integrate bandwidth optimization of Torrent-like object replication.
    • Augment the direct server-to-client download method with optional torrent-like data replication between nodes.
  • Phase 8: Fuse frontend interface (imfs)
  • Phase 9: Implement Statistics Gathering Service

Questions to Discuss

  • Perhaps the metadata and on-disk format of snapshots should be identical between the client and server?
  • Should snapshots be made with rsync and hardlinks, followed by a metadata creation pass?

Communicate

Mentors

  • Warren Togami: Most of the design is from years of thinking about problems involved in running and maintaining a large FOSS repository mirror, like a kernel.org or Fedora mirror. See the existing repository replication methods page for notes.
  • John 'Warthog9' Hawley: Kernel.org mirror network maintainer

Similar Projects