I've been explaining rproxy to anyone who doesn't run away, for three reasons: I want feedback and suggestions on the design, I want to encourage people to be ready to beta-test it, and I want to build support for getting it standardized. Some people were familiar with the templating and Mogul delta-HTTP drafts, but nobody knew of anything as good as rsync.

Explaining rproxy to people has helped me understand the shortcomings of the current documentation. It's a fairly novel idea, so it takes a while for people to get it. (It's been fun watching them gradually go through the process; everybody comprehends something like this in a different way.) Explaining rproxy in terms of rsync is not very helpful, probably because even most rsync users don't completely understand how it works. Here's an explanation that seems to work well:

More and more web content is dynamically generated: database queries, portals, user-customized output. Current HTTP caching is all-or-nothing: if the cached copy isn't good enough to use as is, it's thrown away and completely replaced. That's inefficient, though, because the new instance often has a lot of content in common with the old one. We'd like to send just a diff or delta from the server down to the client, updating the old version into the new one.

If the server kept all of its resources in CVS or DAV, and knew what version the client had, then it could simply generate a diff. But in general the server doesn't have a copy of every old page it's ever sent out: if it's generating content from a database, for example, there could be any number of variations, and it can't keep them all. This was Jeff Mogul's proposal, but it's not generally feasible.

Another alternative is for the server to ship an unchanging template of the page down to the client, and then to send only the variables that change. I don't think this is very likely to happen: it complicates the client and limits the creativity of server-side developers.

Here's what rproxy does: the client keeps a cache on disk as usual, even for dynamically generated responses. When it makes a request, it checks whether it has an old copy of the URL. If it does, it splits that old copy into chunks of (say) 1 kB and generates a checksum for each chunk. It sends those checksums up to the server in an additional request header. After the server generates the response body, it searches through it for sections whose checksums match the blocks from the client. Each match is data the two ends already have in common, so the server can send a short instruction telling the client to reuse that block. The result is a new HTTP encoding that mixes literal data with instructions to copy regions from the old instance. (More details here...) A toy sketch of this in code appears below.

An IBMer on the Apache team is working on other mechanisms for caching dynamic content, but I forget his name for the moment.

Roy and the W3 boys advised me to take this to the IETF as a standards proposal, since the W3C doesn't really deal with protocols (anymore). Roy warned me against repeating the situation of Delta HTTP, where Jeff Mogul proposed the standard and expected other people to implement it: for rproxy to succeed we have to provide a good reference implementation and energetically promote it. The licence will have to be relaxed from the GNU GPL for it to be a useful reference implementation: too many interesting projects can't accept code licensed that way. The LGPL won't work either; probably the right thing is something like the Apache License or the zlib license.
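To make that concrete, here's a toy sketch of the round trip in Python. It isn't rproxy's real API, header format, or wire encoding: the names (signature, delta, patch), the block size, and the use of plain MD5 block hashes are all just for illustration. Real rsync (and rproxy) also uses a cheap rolling checksum to find candidate matches at every byte offset and only confirms them with a strong checksum; this version simply re-hashes at each offset, which is slow but shows the shape of the exchange.

    import hashlib

    BLOCK = 1024  # illustrative block size, not rproxy's real parameter


    def signature(old_body: bytes) -> dict:
        """Client side: checksum each fixed-size block of the cached copy.

        The resulting map (checksum -> block index) is what would travel to
        the server in an extra request header.
        """
        sig = {}
        for i in range(0, len(old_body), BLOCK):
            digest = hashlib.md5(old_body[i:i + BLOCK]).digest()
            sig.setdefault(digest, i // BLOCK)
        return sig


    def delta(new_body: bytes, sig: dict) -> list:
        """Server side: scan the freshly generated response for blocks the
        client already holds, emitting ('copy', block_index) for matches and
        ('literal', bytes) for everything else.
        """
        ops, literal, pos = [], bytearray(), 0
        while pos < len(new_body):
            chunk = new_body[pos:pos + BLOCK]
            idx = sig.get(hashlib.md5(chunk).digest()) if len(chunk) == BLOCK else None
            if idx is not None:
                if literal:
                    ops.append(('literal', bytes(literal)))
                    literal = bytearray()
                ops.append(('copy', idx))
                pos += BLOCK
            else:
                literal.append(new_body[pos])
                pos += 1
        if literal:
            ops.append(('literal', bytes(literal)))
        return ops


    def patch(old_body: bytes, ops: list) -> bytes:
        """Client side: rebuild the new instance from the cached copy plus the delta."""
        out = bytearray()
        for kind, arg in ops:
            out += old_body[arg * BLOCK:(arg + 1) * BLOCK] if kind == 'copy' else arg
        return bytes(out)


    # Mostly-unchanged dynamic page: the delta is a handful of copy
    # instructions plus a short literal tail.
    old = b"<html>" + b"boilerplate " * 500 + b"stock price: 10</html>"
    new = b"<html>" + b"boilerplate " * 500 + b"stock price: 42</html>"
    assert patch(old, delta(new, signature(old))) == new

The thing to notice is that the signature is tiny compared to the body, and when little has changed the delta collapses to mostly copy instructions.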
(As an aside: I get the impression many of the Apache people think open protocols are a higher good than open source.)

Most people really like rsync. For some people, though, rsync has a reputation for being slow, so they worry that rproxy will be slow too. Perhaps I should see if there's anything I can do about that: it might be good to run rproxy under a profiler, and to see if there's a design fix for the Solaris pipe lockup bug.