Harry

A Tool for Measuring String Similarity

Packages

Homebrew Formula

Harry can be installed easily with the Homebrew package manager for OS X. Once Homebrew has been set up, run the following two commands:

  $ brew tap homebrew/science
  $ brew install harry

Debian/Ubuntu packages

Pre-compiled packages of Harry are available for Debian and Ubuntu Linux. To install them, run the following commands:

   $ sudo add-apt-repository ppa:mlsec/harry
   $ sudo apt-get update
   $ sudo apt-get install harry

Building Harry

Dependencies

The following libraries are required for building Harry from source code. They are available as packages for many operating systems and package managers, e.g. Debian Linux and Homebrew (see the detailed list of dependencies).

   >= OpenMP 2.5          (must be supported by the C compiler)
   >= zlib-1.2.1          http://www.zlib.net
   >= libconfig-1.3.2     http://www.hyperrealm.com/libconfig/
   >= libarchive-3.1.2    http://libarchive.github.com/
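
Whether the C compiler supports OpenMP 2.5 can be checked from C itself. The following is a minimal sketch and not part of Harry. It relies on the standard `_OPENMP` macro, which a compiler defines only when OpenMP is enabled (for GCC and Clang typically via `-fopenmp`). The macro holds the release date of the supported specification in yyyymm form, and OpenMP 2.5 corresponds to 200505. The function names are hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Value of _OPENMP (yyyymm of the supported OpenMP specification),
 * or 0 if the compiler was invoked without OpenMP enabled. */
long openmp_version(void)
{
#ifdef _OPENMP
    return (long) _OPENMP;
#else
    return 0;
#endif
}

/* Nonzero if the compiler supports at least OpenMP 2.5 (200505). */
int openmp_is_sufficient(void)
{
    return openmp_version() >= 200505L;
}

/* Default number of threads for a parallel region (1 without OpenMP). */
int openmp_max_threads(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
    assert(n >= 1);  /* the OpenMP runtime always reports at least one */
    return n;
#else
    return 1;
#endif
}
```

Compile the sketch twice, once with `-fopenmp` and once without. If `openmp_version()` returns 0 when the flag was given, the compiler did not actually enable OpenMP.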

Compilation

Harry follows the standard compilation procedure of GNU software. It has been successfully compiled on Linux and Mac OS X.

  $ ./configure [options]
  $ make
  $ make check
  $ make install

Configuration options

  --prefix=PATH           Set directory prefix for installation

By default Harry is installed into /usr/local. If you prefer a different location, use this option to select an installation directory.

  --enable-prwlock        Enable support for POSIX read-write locks

This option enables read-write locks (rwlocks) from the POSIX thread library. An rwlock allows several threads to read shared data concurrently while still serializing writes, so it can improve run-time performance on multi-core systems. However, POSIX locks are not guaranteed to interoperate with OpenMP and thus may not work on all platforms.

  --enable-md5hash        Enable MD5 as alternative hash

Harry uses a hash function to map words to symbols. By default, the very efficient Murmur hash is used for this task. In certain critical cases, it may be useful to use a cryptographic hash such as MD5 instead.
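
The following C sketch shows how words might be mapped to symbols. It is an illustration under assumptions, not Harry's code. It uses the 32-bit x86 variant of MurmurHash3 (Austin Appleby's public-domain algorithm); the exact Murmur variant, seed and symbol width that Harry uses are not specified here. `words_to_symbols()` is a hypothetical helper that splits a string at delimiter characters and hashes each word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* MurmurHash3, x86 32-bit variant. `key` must not be NULL. */
uint32_t murmur3_32(const void *key, size_t len, uint32_t seed)
{
    const uint8_t *data = (const uint8_t *) key;
    const size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    uint32_t h1 = seed, k1;

    /* Body: mix 4-byte blocks, read little-endian for portability. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;
        k1 = (uint32_t) p[0] | (uint32_t) p[1] << 8 |
             (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
        k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2;
        h1 ^= k1; h1 = rotl32(h1, 13); h1 = h1 * 5 + 0xe6546b64;
    }

    /* Tail: mix the remaining 1 to 3 bytes. */
    const uint8_t *tail = data + 4 * nblocks;
    k1 = 0;
    switch (len & 3) {
    case 3: k1 ^= (uint32_t) tail[2] << 16; /* fall through */
    case 2: k1 ^= (uint32_t) tail[1] << 8;  /* fall through */
    case 1: k1 ^= tail[0];
            k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
    }

    /* Finalization: mix in the length, then avalanche the bits. */
    h1 ^= (uint32_t) len;
    h1 ^= h1 >> 16; h1 *= 0x85ebca6b;
    h1 ^= h1 >> 13; h1 *= 0xc2b2ae35;
    h1 ^= h1 >> 16;
    return h1;
}

/* Hypothetical helper: split `str` at any character in `delim` and store
 * one 32-bit symbol per non-empty word in `syms` (at most `max`).
 * Returns the number of symbols written. */
size_t words_to_symbols(const char *str, const char *delim,
                        uint32_t *syms, size_t max)
{
    size_t n = 0, i = 0, len = strlen(str);

    while (i < len && n < max) {
        size_t start = i;
        /* Advance to the end of the current word. */
        while (i < len && strchr(delim, str[i]) == NULL)
            i++;
        if (i > start)  /* skip empty words between adjacent delimiters */
            syms[n++] = murmur3_32(str + start, i - start, 0);
        i++;            /* step over the delimiter */
    }
    return n;
}
```

For example, `words_to_symbols("to be or not to be", " ", syms, 8)` yields six symbols, and the two occurrences of "to" (and of "be") map to the same value. Replacing `murmur3_32()` with a cryptographic hash such as MD5 would leave the rest of the mapping unchanged. Only the per-word cost and the collision resistance would differ.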