REGEX — Service discovery using regular expressions

Using the REGEX subsystem, you can discover peers that offer a particular service using regular expressions. Peers that offer a service describe it with a regular expression. Peers that want to patronize a service search using a string. The REGEX subsystem then uses the DHT to return the set of matching offerers to the patrons.
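
The matching model can be illustrated locally with a small sketch. It assumes a hypothetical in-memory table of offerers (the peer names and expressions below are made up) and uses Python's `re` module, whose syntax is not necessarily the one REGEX accepts. The real subsystem resolves the search through the DHT rather than scanning a list.

```python
import re

# Hypothetical offerers: peer name -> regular expression it announces.
OFFERS = {
    "peer-A": r"web-(http|https)",
    "peer-B": r"mail-(smtp|imap)",
    "peer-C": r"web-.*",
}


def find_offerers(search_string, offers):
    """Return the sorted names of peers whose regex matches the whole string."""
    return sorted(
        peer
        for peer, pattern in offers.items()
        if re.fullmatch(pattern, search_string)
    )


if __name__ == "__main__":
    # A patron searches with a plain string, never with a regex.
    print(find_offerers("web-https", OFFERS))
```

The offerer side holds the expressions and the patron side holds only a string, which is the asymmetry described above.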

For the technical details, see Max’s defense talk and Max’s Master’s thesis.

Note

An additional publication is under preparation and available to team members (in Git).

Todo

Missing links to Max’s talk and Master’s thesis

How to run the regex profiler

The gnunet-regex-profiler can be used to profile the usage of mesh/regex for a given set of regular expressions and strings. Mesh/regex allows you to announce your peer ID under a certain regex and search for peers matching a particular regex using a string. See szengel2012ms for a full introduction.

First of all, the regex profiler uses the GNUnet testbed, so all testbed requirements also apply to the regex profiler (for example, you need password-less SSH login to the machines listed in your hosts file).

Configuration

The profiler also needs an appropriate configuration file. The important settings are described below.

The regular expressions are announced by the gnunet-daemon-regexprofiler, so you have to make sure it is started by adding it to the START_ON_DEMAND set of ARM:

[regexprofiler]
START_ON_DEMAND = YES

Furthermore you have to specify the location of the binary:

[regexprofiler]
# Location of the gnunet-daemon-regexprofiler binary.
BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
# Regex prefix that will be applied to all regular expressions and
# search strings.
REGEX_PREFIX = "GNVPN-0001-PAD"

When running the profiler in a large-scale deployment, you probably want to reduce the workload of each peer. Use the following options to do this:

[dht]
# Force network size estimation
FORCE_NSE = 1

[dhtcache]
DATABASE = heap
# Disable RC-file for Bloom filter? (for benchmarking with limited IO
# availability)
DISABLE_BF_RC = YES
# Disable Bloom filter entirely
DISABLE_BF = YES

[nse]
# Minimize proof-of-work CPU consumption by NSE
WORKBITS = 1
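
The settings above can also be collected into one file programmatically. The following is a minimal sketch using Python's standard `configparser`. The function names and the binary path in the example are placeholders, and merging the two `[regexprofiler]` fragments into a single section is an assumption about how the file is laid out, not something the profiler requires.

```python
import configparser
import io


def build_profiler_config(binary_path, regex_prefix="GNVPN-0001-PAD"):
    """Collect the options from this section in one ConfigParser object."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names in upper case
    config["regexprofiler"] = {
        "START_ON_DEMAND": "YES",
        "BINARY": binary_path,
        # The prefix is written with surrounding quotes, as in the example.
        "REGEX_PREFIX": '"' + regex_prefix + '"',
    }
    # Options that reduce per-peer workload in large deployments.
    config["dht"] = {"FORCE_NSE": "1"}
    config["dhtcache"] = {
        "DATABASE": "heap",
        "DISABLE_BF_RC": "YES",
        "DISABLE_BF": "YES",
    }
    config["nse"] = {"WORKBITS": "1"}
    return config


def render(config):
    """Return the configuration as INI-style text."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder path; point this at your own build of the daemon.
    print(render(build_profiler_config("/path/to/gnunet-daemon-regexprofiler")))
```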

Options

To run the profiler, specify the options and the input data on the command line:

gnunet-regex-profiler -c config-file -d log-file -n num-links \
-p path-compression-length -s search-delay -t matching-timeout \
-a num-search-strings hosts-file policy-dir search-strings-file

Where...

... config-file is the configuration file created earlier.
... log-file is the file to which statistics output is written.
... num-links is the number of random links between started peers.
... path-compression-length is the maximum path compression length in the DFA.
... search-delay is the time to wait between the peers finishing linking and the start of string matching.
... matching-timeout is the timeout after which the search is cancelled.
... num-search-strings is the number of strings in the search-strings-file.
... the hosts-file should contain a list of hosts for the testbed, one per line, in the following format:
- user@host_ip:port
... the policy-dir is a folder of text files, each containing one or more regular expressions. A peer is started for each file in that folder and announces the regular expressions in that file.
... the search-strings-file is a text file containing search strings, one per line.
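
Because num-search-strings has to agree with the contents of the search-strings-file, it can be convenient to derive it instead of typing it. The sketch below assembles the argument list in the order shown above. The helper names are hypothetical, skipping blank lines when counting is an assumption, and the hosts-file check only tests the rough `user@host_ip:port` shape.

```python
import re

# Rough shape of one hosts-file line: user@host_ip:port
HOST_LINE = re.compile(r"^[^@\s]+@[^:\s]+:\d+$")


def is_valid_host_line(line):
    """Check that a hosts-file line looks like user@host_ip:port."""
    return bool(HOST_LINE.match(line.strip()))


def count_search_strings(path):
    """Count non-empty lines; the file holds one search string per line."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def build_command(config_file, log_file, num_links, compression_length,
                  search_delay, matching_timeout, hosts_file, policy_dir,
                  strings_file):
    """Return the profiler invocation as an argument list."""
    num_strings = count_search_strings(strings_file)
    return [
        "gnunet-regex-profiler",
        "-c", config_file,
        "-d", log_file,
        "-n", str(num_links),
        "-p", str(compression_length),
        "-s", str(search_delay),
        "-t", str(matching_timeout),
        "-a", str(num_strings),
        hosts_file, policy_dir, strings_file,
    ]
```

The returned list can be passed to `subprocess.run` directly. The values for the delay and the timeout are passed through unchanged, since their units are not stated here.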

You can create regular expressions and search strings for every AS in the Internet using the attached scripts. You need one of the CAIDA routeviews prefix2as data files for this. Run

create_regex.py <filename> <output path>

to create the regular expressions and

create_strings.py <input path> <outfile>

to create a search strings file from the previously created regular expressions.