REGEX — Service discovery using regular expressions
Using the REGEX subsystem, you can discover peers that offer a particular service using regular expressions. Peers that offer a service describe it with a regular expression. Peers that want to patronize a service search using a string. The REGEX subsystem then uses the DHT to return a set of matching offerers to the patrons.
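The matching semantics can be illustrated locally with a minimal sketch. It uses Python's re module in place of the REGEX subsystem and a plain dictionary in place of the DHT, so it shows only which offerers a search string would match, not how GNUnet distributes the lookup. The peer names and expressions are hypothetical, and whole-string matching is an assumption of this sketch.

```python
import re

# Hypothetical offerers: peer name -> regular expression it announces.
ANNOUNCEMENTS = {
    "peer-a": r"web-(http|https)",
    "peer-b": r"web-https",
    "peer-c": r"mail-(smtp|imap)",
}


def find_offerers(search_string, announcements):
    """Return the peers whose announced regex matches the whole search string."""
    return sorted(
        peer
        for peer, regex in announcements.items()
        if re.fullmatch(regex, search_string)
    )


if __name__ == "__main__":
    # A patron searches with a plain string, not with a regex.
    print(find_offerers("web-https", ANNOUNCEMENTS))
```

Note the asymmetry: offerers publish patterns, while patrons supply only concrete strings.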
For the technical details, we have Max’s defense talk and Max’s Master’s thesis.
Note
An additional publication is under preparation and available to team members (in Git).
Todo
Missing links to Max’s talk and Master’s thesis
How to run the regex profiler
The gnunet-regex-profiler can be used to profile the usage of mesh/regex for a given set of regular expressions and strings. Mesh/regex allows you to announce your peer ID under a certain regex and search for peers matching a particular regex using a string. See szengel2012ms for a full introduction.
The regex profiler uses GNUnet testbed, so all requirements of testbed also apply to the regex profiler (for example, you need password-less SSH login to the machines listed in your hosts file).
Configuration
The profiler also needs an appropriate configuration file. The important settings are highlighted in the following paragraphs.
The regular expressions are announced by gnunet-daemon-regexprofiler, so you have to make sure it is started by adding it to the START_ON_DEMAND set of ARM:
[regexprofiler]
START_ON_DEMAND = YES
Furthermore you have to specify the location of the binary:
[regexprofiler]
# Location of the gnunet-daemon-regexprofiler binary.
BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
# Regex prefix that will be applied to all regular expressions and
# search strings.
REGEX_PREFIX = "GNVPN-0001-PAD"
When running the profiler in a large-scale deployment, you probably want to reduce the workload of each peer. Use the following options to do this.
[dht]
# Force network size estimation
FORCE_NSE = 1
[dhtcache]
DATABASE = heap
# Disable RC-file for Bloom filter? (for benchmarking with limited IO
# availability)
DISABLE_BF_RC = YES
# Disable Bloom filter entirely
DISABLE_BF = YES
[nse]
# Minimize proof-of-work CPU consumption by NSE
WORKBITS = 1
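Because the fragments above end up in one configuration file, a quick sanity check before a long run can help. The following is a minimal sketch, assuming the file is plain INI that Python's configparser can read; GNUnet has its own configuration parser, so this is only an approximation. The helper name missing_options and the choice of required options are assumptions of this sketch, taken from the settings shown above.

```python
import configparser

# Options this section asks for; the selection is an assumption of this sketch.
REQUIRED = {
    "regexprofiler": ["START_ON_DEMAND", "BINARY", "REGEX_PREFIX"],
}


def missing_options(config_text, required=REQUIRED):
    """Return a sorted list of 'section/option' names absent from config_text."""
    # strict=False tolerates a section header that appears more than once,
    # as [regexprofiler] does in the fragments above.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    missing = []
    for section, options in required.items():
        for option in options:
            # has_option is case-insensitive for option names by default.
            if not parser.has_option(section, option):
                missing.append(f"{section}/{option}")
    return sorted(missing)


# Hypothetical file content; the BINARY path is a placeholder.
SAMPLE = """
[regexprofiler]
START_ON_DEMAND = YES
[regexprofiler]
BINARY = /path/to/gnunet-daemon-regexprofiler
"""

if __name__ == "__main__":
    print(missing_options(SAMPLE))
```

For the sample, the check reports that REGEX_PREFIX is not set, while the repeated section header is merged rather than rejected.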
Options
Finally, to run the profiler, specify the options and the input data on the command line:
gnunet-regex-profiler -c config-file -d log-file -n num-links \
-p path-compression-length -s search-delay -t matching-timeout \
-a num-search-strings hosts-file policy-dir search-strings-file
Where...
- ... config-file means the configuration file created earlier.
- ... log-file is the file where to write statistics output.
- ... num-links indicates the number of random links between started peers.
- ... path-compression-length is the maximum path compression length in the DFA.
- ... search-delay is the time to wait between the peers finishing linking and starting to match strings.
- ... matching-timeout is the timeout after which the search is cancelled.
- ... num-search-strings is the number of strings in the search-strings-file.
- ... the hosts-file should contain a list of hosts for the testbed, one per line in the following format:
  - user@host_ip:port
- ... the policy-dir is a folder containing text files, each containing one or more regular expressions. A peer is started for each file in that folder and the regular expressions in the corresponding file are announced by this peer.
- ... the search-strings-file is a text file containing search strings, one per line.
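The inputs described in the list above can be checked and turned into an argument list with a small helper. This is a minimal sketch, not part of GNUnet: the function names are hypothetical, the host pattern assumes an IPv4 address or host name without colons, and counting only non-empty lines as search strings is an assumption. The option letters and argument order are taken from the synopsis above; delay and timeout values are passed through unchanged.

```python
import re

# user@host_ip:port, as described for the hosts-file.
HOST_LINE = re.compile(r"^(?P<user>[^@\s]+)@(?P<host>[^:\s]+):(?P<port>\d+)$")


def parse_host_line(line):
    """Split one hosts-file line into (user, host, port) or raise ValueError."""
    match = HOST_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not in user@host_ip:port form: {line!r}")
    return match["user"], match["host"], int(match["port"])


def count_search_strings(path):
    """Count non-empty lines, the value to pass as num-search-strings."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def build_command(config_file, log_file, hosts_file, policy_dir,
                  search_strings_file, num_links, path_compression_length,
                  search_delay, matching_timeout):
    """Assemble the argument list in the order given in the synopsis."""
    return [
        "gnunet-regex-profiler",
        "-c", str(config_file),
        "-d", str(log_file),
        "-n", str(num_links),
        "-p", str(path_compression_length),
        "-s", str(search_delay),
        "-t", str(matching_timeout),
        "-a", str(count_search_strings(search_strings_file)),
        str(hosts_file),
        str(policy_dir),
        str(search_strings_file),
    ]
```

The returned list can be handed to subprocess.run once every hosts-file line has passed parse_host_line.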
You can create regular expressions and search strings for every AS in the Internet using the attached scripts. You need one of the CAIDA routeviews prefix2as data files for this. Run
create_regex.py <filename> <output path>
to create the regular expressions and
create_strings.py <input path> <outfile>
to create a search strings file from the previously created regular expressions.
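To get a feel for the input data before running the scripts, the prefix2as file can be inspected with a few lines of Python. This is a minimal sketch and does not reproduce what create_regex.py does. It assumes the layout CAIDA documents for these files, with three whitespace-separated fields per line: prefix, prefix length and origin AS. The function name and the sample values (documentation-reserved prefixes and AS numbers) are hypothetical.

```python
from collections import defaultdict


def prefixes_by_as(lines):
    """Group 'prefix/length' strings by the origin AS field of each line."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        prefix, length, origin = fields
        groups[origin].append(f"{prefix}/{length}")
    return dict(groups)


# Hypothetical sample using documentation-reserved prefixes and AS numbers.
SAMPLE_LINES = [
    "192.0.2.0\t24\t64496",
    "198.51.100.0\t24\t64497",
    "203.0.113.0\t24\t64496",
    "",
]

if __name__ == "__main__":
    for origin, prefixes in sorted(prefixes_by_as(SAMPLE_LINES).items()):
        print(origin, prefixes)
```

The origin field is kept as a string because it is not always a single number in these files.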