
The libhsync delta-encoding library

Author(s):
Martin Pool <mbp@samba.org>

Id:
main.dox,v 1.7 2001/02/26 13:27:12 mbp Exp

Introduction

libhsync is a library for calculating and applying network deltas, with an interface designed to ease integration into diverse network applications. libhsync is being developed as part of the rproxy <http://rproxy.samba.org/> and rsync <http://rsync.samba.org/> projects.

libhsync encapsulates the core algorithms of the rsync protocol, which efficiently calculate the differences between two files. The rsync algorithm differs from most differencing algorithms in that it does not require both files to be present when the delta is calculated. Instead, it requires a set of checksums, one for each block of one file, which together form a signature for that file. Blocks at any position in the other file which have the same checksum are likely to be identical, and whatever remains is the difference.
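
To make "blocks at any position" concrete, here is a minimal sketch of a rolling weak checksum in the style described in the rsync technical report. It is not taken from libhsync: the names weak_sum_t, weak_sum_init, weak_sum_roll and weak_sum_digest are hypothetical, and the exact checksum libhsync uses may differ. The point it illustrates is that the checksum of a window can be updated in constant time as the window slides by one byte, which is what makes testing every offset affordable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: these names are hypothetical, not part of hsync.h. */
typedef struct {
    uint32_t a;   /* sum of the bytes in the window, mod 2^16 */
    uint32_t b;   /* position-weighted sum of the bytes, mod 2^16 */
    size_t len;   /* window (block) length */
} weak_sum_t;

/* Compute the checksum of buf[0..len) from scratch. */
static void weak_sum_init(weak_sum_t *w, const unsigned char *buf, size_t len)
{
    size_t i;

    w->a = 0;
    w->b = 0;
    w->len = len;
    for (i = 0; i < len; i++) {
        w->a = (w->a + buf[i]) & 0xffff;
        w->b = (w->b + (uint32_t)(len - i) * buf[i]) & 0xffff;
    }
}

/* Slide the window one byte: drop `out` from the front, append `in`. */
static void weak_sum_roll(weak_sum_t *w, unsigned char out, unsigned char in)
{
    assert(w->len > 0);
    w->a = (w->a - out + in) & 0xffff;
    w->b = (w->b - (uint32_t)w->len * out + w->a) & 0xffff;
}

/* Pack the two 16-bit halves into one 32-bit value. */
static uint32_t weak_sum_digest(const weak_sum_t *w)
{
    return w->a | (w->b << 16);
}
```

Because a weak checksum of this kind can collide, a match found this way is only a candidate; a real tool would confirm it with a stronger checksum before treating the blocks as identical.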

The library does not deal with file metadata or structure, such as filenames, permissions, or directories. To this library, a file is just a stream of bytes. Higher-level tools, such as rsync <http://rsync.samba.org>, can deal with such issues in a way appropriate to their users.

The library supports three basic operations:

  1. Generating the signature S of a file A.
  2. Calculating a delta D from S and a new file B.
  3. Applying D to A to reconstruct B.

The library also provides rdiff, a command-line network delta tool, which makes this functionality available to users and scripting languages.
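
The three operations can be illustrated with a deliberately simplified, in-memory toy. Everything in it is an assumption made for illustration: the toy_* names, the fixed 4-byte block size, the single FNV-1a checksum (standing in for the weak and strong checksums a real tool would combine), and the array-based delta format. None of it reflects the libhsync API or wire format. Note that toy_delta() sees only S and B, never A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types and limits; not part of hsync.h. */
enum { BLOCK = 4, MAX_BLOCKS = 64, MAX_OPS = 256 };

typedef struct { uint32_t sums[MAX_BLOCKS]; size_t count; } toy_sig_t;

typedef struct {
    int is_copy;     /* 1: copy block number `value` of A; 0: literal byte `value` */
    size_t value;
} toy_op_t;

typedef struct { toy_op_t ops[MAX_OPS]; size_t count; } toy_delta_t;

/* FNV-1a hash of n bytes, used here as the only block checksum. */
static uint32_t block_sum(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* 1. Signature S of A: one checksum per full block. */
static void toy_signature(const unsigned char *a, size_t alen, toy_sig_t *s)
{
    size_t i;

    s->count = alen / BLOCK;
    if (s->count > MAX_BLOCKS)
        s->count = MAX_BLOCKS;
    for (i = 0; i < s->count; i++)
        s->sums[i] = block_sum(a + i * BLOCK, BLOCK);
}

/* 2. Delta D from S and B: try every offset of B against the signature. */
static void toy_delta(const toy_sig_t *s, const unsigned char *b, size_t blen,
                      toy_delta_t *d)
{
    size_t pos = 0, i;

    d->count = 0;
    while (pos < blen) {
        toy_op_t op;

        op.is_copy = 0;
        op.value = b[pos];
        if (pos + BLOCK <= blen) {
            uint32_t h = block_sum(b + pos, BLOCK);
            for (i = 0; i < s->count; i++) {
                if (s->sums[i] == h) {
                    op.is_copy = 1;
                    op.value = i;
                    break;
                }
            }
        }
        assert(d->count < MAX_OPS);
        d->ops[d->count++] = op;
        pos += op.is_copy ? BLOCK : 1;
    }
}

/* 3. Apply D to A, writing B into out. Returns the number of bytes written. */
static size_t toy_patch(const unsigned char *a, const toy_delta_t *d,
                        unsigned char *out)
{
    size_t n = 0, i;

    for (i = 0; i < d->count; i++) {
        if (d->ops[i].is_copy) {
            memcpy(out + n, a + d->ops[i].value * BLOCK, BLOCK);
            n += BLOCK;
        } else {
            out[n++] = (unsigned char)d->ops[i].value;
        }
    }
    return n;
}
```

With A = "abcdefghijklmnop" and B = "abcdXYefghmnop!", the delta holds three copy commands and three literal bytes rather than all fifteen bytes of B. The toy recomputes the checksum at each offset for brevity and trusts a single checksum match, which a production implementation would not do.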

Programming interface

The public interface to libhsync (hsync.h) has functions in several main areas, each described in a section below.

All external symbols have the prefix ``hs_'', or ``HS_'' in the case of preprocessor symbols.

Data streaming

A key design requirement for libhsync is that it should handle data as and when the hosting application requires it. libhsync can be used inside applications that do non-blocking IO or filtering of network streams, because it never does IO directly and never needs to block waiting for data.

The programming interface to libhsync is similar to that of zlib and bzlib. Arbitrary-length input and output buffers are passed to the library by the application, through an instance of hs_stream_t. The library proceeds as far as it can, and returns an hs_result value indicating whether it needs more data or space.

All the state needed by the library to resume processing when more data is available is kept in a small opaque hs_job_t structure. After creation of a job, repeated calls to hs_job_iter() in between filling and emptying the buffers keep data flowing through the stream. Each call returns an hs_result_t value indicating its outcome.

These can be converted to a human-readable string by hs_strerror().
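
The calling pattern described in this section can be sketched without libhsync itself. The following is a self-contained toy, assuming a zlib-like buffer structure; demo_buffers_t, demo_job_t, demo_result_t, demo_job_iter and demo_run are hypothetical names, and the actual fields of hs_stream_t and the signature of hs_job_iter() should be taken from hsync.h. The toy job hex-encodes its input, which forces it to remember a pending output character whenever the output buffer fills between the two digits of a byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the result, buffer and job types. */
typedef enum { DEMO_DONE, DEMO_BLOCKED } demo_result_t;

/* Caller-owned buffers, in the style of zlib's z_stream. */
typedef struct {
    const unsigned char *next_in;
    size_t avail_in;
    int eof_in;          /* set by the caller once no more input will arrive */
    char *next_out;
    size_t avail_out;
} demo_buffers_t;

/* All state needed to resume: at most one output character not yet written. */
typedef struct { int have_pending; char pending; } demo_job_t;

/* Proceed as far as the buffers allow, then report why we stopped. */
static demo_result_t demo_job_iter(demo_job_t *job, demo_buffers_t *buf)
{
    static const char hex[] = "0123456789abcdef";

    assert(job != NULL && buf != NULL);
    for (;;) {
        unsigned char c;

        if (job->have_pending) {
            if (buf->avail_out == 0)
                return DEMO_BLOCKED;                 /* needs output space */
            *buf->next_out++ = job->pending;
            buf->avail_out--;
            job->have_pending = 0;
        }
        if (buf->avail_in == 0)
            return buf->eof_in ? DEMO_DONE : DEMO_BLOCKED;  /* needs input */
        if (buf->avail_out == 0)
            return DEMO_BLOCKED;
        c = *buf->next_in++;
        buf->avail_in--;
        *buf->next_out++ = hex[c >> 4];
        buf->avail_out--;
        job->pending = hex[c & 15];
        job->have_pending = 1;
    }
}

/* Usage: feed one input byte at a time and drain through a 1-byte output
 * buffer, so the job has to suspend and resume repeatedly. */
static size_t demo_run(const unsigned char *in, size_t n, char *out)
{
    demo_job_t job = { 0, 0 };
    demo_buffers_t buf;
    char chunk[1];
    size_t fed = 0, total = 0;
    demo_result_t r;

    memset(&buf, 0, sizeof buf);
    do {
        if (buf.avail_in == 0 && fed < n) {
            buf.next_in = in + fed++;
            buf.avail_in = 1;
        }
        buf.eof_in = (fed == n);
        buf.next_out = chunk;
        buf.avail_out = sizeof chunk;
        r = demo_job_iter(&job, &buf);
        if (buf.avail_out == 0)
            out[total++] = chunk[0];
    } while (r != DEMO_DONE);
    return total;
}
```

The job never reads or writes anything itself; the caller decides when and how the buffers are refilled and drained, so the same job could sit behind a non-blocking socket or a file.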

Generating and applying deltas

All encoding operations are performed by using a *_begin function to create a hs_job_t object, passing in any necessary initialization parameters. The various jobs available are:

Processing whole files

Some applications do not require fine-grained control over IO, but rather just want to process a whole file with a single call. libhsync provides `whole-file' functionality to do exactly that.

Processing of a whole file begins with creation of a hs_job_t object for the appropriate operation, just as if the application were going to do buffering itself. After creation, the job may be passed to hs_whole_run(), which will feed data to and from it using two FILEs as necessary until end of file is reached or the operation completes.
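
As a rough picture of what such a whole-file driver has to do, here is a minimal sketch built on a toy job. The wf_* names, the buffer sizes and the upper-casing job are all hypothetical; this is not the implementation or signature of hs_whole_run(), only an illustration of looping a resumable job between two stdio streams.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; not part of hsync.h. */
typedef enum { WF_DONE, WF_BLOCKED } wf_result_t;

typedef struct {
    const char *next_in;
    size_t avail_in;
    int eof_in;
    char *next_out;
    size_t avail_out;
} wf_buffers_t;

typedef struct { unsigned long bytes; } wf_job_t;   /* toy state: a byte counter */

/* Toy job: upper-case bytes from input to output as far as the buffers allow. */
static wf_result_t wf_job_iter(wf_job_t *job, wf_buffers_t *b)
{
    while (b->avail_in > 0 && b->avail_out > 0) {
        *b->next_out++ = (char)toupper((unsigned char)*b->next_in++);
        b->avail_in--;
        b->avail_out--;
        job->bytes++;
    }
    return (b->avail_in == 0 && b->eof_in) ? WF_DONE : WF_BLOCKED;
}

/* Drive a job between two stdio streams until it completes.
 * Returns 0 on success, -1 on a read or write error. */
static int wf_whole_run(wf_job_t *job, FILE *in, FILE *out)
{
    char inbuf[8], outbuf[8];   /* deliberately tiny to force many iterations */
    wf_buffers_t b;
    wf_result_t r;

    assert(job != NULL && in != NULL && out != NULL);
    memset(&b, 0, sizeof b);
    do {
        size_t n;

        if (b.avail_in == 0 && !b.eof_in) {
            b.avail_in = fread(inbuf, 1, sizeof inbuf, in);
            b.next_in = inbuf;
            if (b.avail_in < sizeof inbuf) {     /* short read: EOF or error */
                if (ferror(in))
                    return -1;
                b.eof_in = 1;
            }
        }
        b.next_out = outbuf;
        b.avail_out = sizeof outbuf;
        r = wf_job_iter(job, &b);
        n = sizeof outbuf - b.avail_out;
        if (fwrite(outbuf, 1, n, out) != n)
            return -1;
    } while (r != WF_DONE);
    return 0;
}
```

An application that only needs this behaviour can hide the loop entirely behind one call, while one with its own event loop can keep the job and drive it itself.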

Debugging trace and error logging

libhsync can output trace or log messages as it proceeds. These follow a fairly standard priority-based filtering system (hs_trace_set_level()), using the same severity levels as UNIX syslog. Messages by default are sent to stderr, but may be passed to an application-provided callback (hs_trace_to(), hs_trace_fn_t).
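
The filtering scheme described here can be sketched as follows. This is a stand-alone illustration, not libhsync code: the demo_* names, the callback signature and the default level are assumptions, and the real prototypes of hs_trace_set_level(), hs_trace_to() and hs_trace_fn_t should be checked in hsync.h. The numeric levels follow the conventional syslog priorities, where a smaller number means a more severe message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Values match the conventional syslog priorities (hypothetical names). */
enum { DEMO_LOG_ERR = 3, DEMO_LOG_WARNING = 4, DEMO_LOG_INFO = 6, DEMO_LOG_DEBUG = 7 };

/* Assumed callback shape: receives the level and the formatted message. */
typedef void demo_trace_fn_t(int level, const char *msg);

/* Default sink: write to stderr. */
static void demo_trace_stderr(int level, const char *msg)
{
    fprintf(stderr, "demo[%d]: %s\n", level, msg);
}

static int demo_trace_level = DEMO_LOG_INFO;     /* assumed default threshold */
static demo_trace_fn_t *demo_trace_sink = demo_trace_stderr;

static void demo_trace_set_level(int level)
{
    demo_trace_level = level;
}

/* Redirect messages to an application callback; NULL restores stderr. */
static void demo_trace_to(demo_trace_fn_t *fn)
{
    demo_trace_sink = fn ? fn : demo_trace_stderr;
}

/* Format and emit a message unless it is less severe than the threshold. */
static void demo_log(int level, const char *fmt, ...)
{
    char msg[256];
    va_list ap;

    assert(fmt != NULL);
    if (level > demo_trace_level)
        return;                          /* lower number = more severe */
    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);
    demo_trace_sink(level, msg);
}

/* Usage: an application callback that keeps the most recent message. */
static char demo_last[256];
static int demo_count;

static void demo_capture(int level, const char *msg)
{
    (void)level;
    demo_count++;
    strncpy(demo_last, msg, sizeof demo_last - 1);
}
```

After demo_trace_to(demo_capture) and demo_trace_set_level(DEMO_LOG_WARNING), a DEMO_LOG_DEBUG message is dropped while a DEMO_LOG_ERR message reaches the callback.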

Encoding statistics

Encoding and decoding routines accumulate compression performance statistics in a hs_stats_t structure as they run. These may be converted to human-readable form or written to the log file using hs_format_stats() or hs_log_stats() respectively.
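
For illustration only, a statistics record and formatter of this general kind might look like the sketch below. The structure fields, the demo_* names and the output layout are hypothetical; the actual members of hs_stats_t and the signature of hs_format_stats() are defined by hsync.h and may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical statistics record; not the real hs_stats_t layout. */
typedef struct {
    const char *op;                       /* name of the operation */
    unsigned long lit_cmds, lit_bytes;    /* literal commands and bytes */
    unsigned long copy_cmds, copy_bytes;  /* copy commands and bytes */
} demo_stats_t;

/* Render the statistics into a caller-supplied buffer and return it. */
static char *demo_format_stats(const demo_stats_t *st, char *buf, size_t size)
{
    assert(st != NULL && buf != NULL && size > 0);
    snprintf(buf, size,
             "%s statistics: literal[%lu cmds, %lu bytes] "
             "copy[%lu cmds, %lu bytes]",
             st->op, st->lit_cmds, st->lit_bytes,
             st->copy_cmds, st->copy_bytes);
    return buf;
}
```

Writing into a buffer supplied by the caller keeps the formatter free of IO, consistent with the library's rule of never doing IO directly; a logging variant would simply pass the resulting string to the trace facility.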

Utility functions

Some additional functions are used internally and also exposed in the API:


Generated at Wed Feb 28 16:01:46 2001 for libhsync by doxygen 1.2.5 written by Dimitri van Heesch, © 1997-2001