.. index::
   double: subsystem; REGEX

.. _REGEX-Subsystem:

REGEX — Service discovery using regular expressions
===================================================

The REGEX subsystem lets you discover peers that offer a particular
service by means of regular expressions. Peers that offer a service
announce it with a regular expression. Peers that want to patronize a
service search for it with a string. The REGEX subsystem then uses the
DHT to return the set of offerers whose regular expressions match that
string to the patrons.
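
The announce-and-search model described above can be illustrated with a
purely local sketch. This is not how the subsystem is implemented (the
real lookup is distributed over the DHT), and it assumes that a search
string has to match an announced expression in full. The peer names and
expressions are hypothetical, and Python's ``re`` syntax is only a
stand-in for the syntax REGEX accepts.

.. code-block:: python

   import re

   # Hypothetical offerers: each peer announces one regular expression.
   announcements = {
       "peer-A": r"web-(http|https)",
       "peer-B": r"web-https",
       "peer-C": r"mail-smtp",
   }


   def find_offerers(search_string, announced):
       """Return the peers whose announced regex matches the whole string."""
       return sorted(
           peer
           for peer, regex in announced.items()
           if re.fullmatch(regex, search_string)
       )


   # A patron searches with a plain string, not with a regex.
   matches = find_offerers("web-https", announcements)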

For the technical details, see Max's defense talk and Max's Master's
thesis.

.. note:: An additional publication is under preparation and available
   to team members (in Git).

.. todo:: Missing links to Max's talk and Master's thesis

.. _How-to-run-the-regex-profiler:

How to run the regex profiler
-----------------------------

The gnunet-regex-profiler can be used to profile the usage of mesh/regex
for a given set of regular expressions and strings. Mesh/regex allows
you to announce your peer ID under a certain regex and search for peers
matching a particular regex using a string. See
`szengel2012ms <https://bib.gnunet.org/full/date.html#2012_5f2>`__ for a
full introduction.

The regex profiler is built on the GNUnet testbed, so all testbed
requirements also apply to the regex profiler. For example, you need
password-less SSH login to the machines listed in your hosts file.

**Configuration**

The profiler also needs an appropriate configuration file. The important
settings are described below.

The regular expressions are announced by gnunet-daemon-regexprofiler, so
you have to make sure it is started by adding it to the START_ON_DEMAND
set of ARM:

::

   [regexprofiler]
   START_ON_DEMAND = YES

You also have to specify the location of the binary:

::

   [regexprofiler]
   # Location of the gnunet-daemon-regexprofiler binary.
   BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
   # Regex prefix that will be applied to all regular expressions and
   # search string.
   REGEX_PREFIX = "GNVPN-0001-PAD"
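
Before starting a long profiling run, it can help to check that these
two settings are present. The following is a minimal sketch that reads
the configuration with Python's ``configparser``. It assumes that the
file is close enough to INI syntax for that module, which is only an
approximation of GNUnet's own configuration parser. The function name
``missing_profiler_settings`` is made up for this example.

.. code-block:: python

   import configparser


   def missing_profiler_settings(config_text):
       """Return a list of problems found in the [regexprofiler] section."""
       # strict=False tolerates a section that appears more than once;
       # interpolation=None leaves '%' and '$' in values untouched.
       parser = configparser.ConfigParser(interpolation=None, strict=False)
       parser.read_string(config_text)

       if not parser.has_section("regexprofiler"):
           return ["section [regexprofiler] is missing"]

       section = parser["regexprofiler"]
       problems = []
       # Option names are matched case-insensitively by configparser.
       if section.get("START_ON_DEMAND", "").upper() != "YES":
           problems.append("START_ON_DEMAND is not YES")
       if not section.get("BINARY"):
           problems.append("BINARY is not set")
       return problems

An empty list means that both settings were found. The check does not
verify that the binary exists at the given path.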

When running the profiler in a large-scale deployment, you probably
want to reduce the workload of each peer. The following options do
this:

::

   [dht]
   # Force network size estimation
   FORCE_NSE = 1

   [dhtcache]
   DATABASE = heap
   # Disable RC-file for Bloom filter? (for benchmarking with limited IO
   # availability)
   DISABLE_BF_RC = YES
   # Disable Bloom filter entirely
   DISABLE_BF = YES

   [nse]
   # Minimize proof-of-work CPU consumption by NSE
   WORKBITS = 1

**Options**

Finally, to run the profiler, specify the options and the input data on
the command line:

::

   gnunet-regex-profiler -c config-file -d log-file -n num-links \
   -p path-compression-length -s search-delay -t matching-timeout \
   -a num-search-strings hosts-file policy-dir search-strings-file

The arguments are:

-  ``config-file`` is the configuration file created earlier.

-  ``log-file`` is the file to which the statistics output is written.

-  ``num-links`` is the number of random links between started peers.

-  ``path-compression-length`` is the maximum path compression length in
   the DFA.

-  ``search-delay`` is the time to wait between the peers finishing
   linking and the start of string matching.

-  ``matching-timeout`` is the timeout after which the search is
   cancelled.

-  ``num-search-strings`` is the number of strings in the
   ``search-strings-file``.

-  ``hosts-file`` contains a list of hosts for the testbed, one per
   line, in the following format:

   -  ``user@host_ip:port``

-  ``policy-dir`` is a folder of text files, each containing one or more
   regular expressions. A peer is started for each file in that folder,
   and that peer announces the regular expressions in its file.

-  ``search-strings-file`` is a text file containing search strings, one
   per line.
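
Because ``num-search-strings`` has to agree with the contents of the
``search-strings-file``, it can be convenient to derive it rather than
count by hand. The sketch below assembles the argument list in the order
of the synopsis above and counts the non-empty lines of the search
strings file. The helper names are made up for this example, and all
other values are passed through unchanged, because their expected
formats (for instance, of the two time values) are not specified here.

.. code-block:: python

   from pathlib import Path


   def count_search_strings(search_strings_file):
       """Count non-empty lines, assuming one search string per line."""
       text = Path(search_strings_file).read_text()
       return sum(1 for line in text.splitlines() if line.strip())


   def build_profiler_command(config_file, log_file, num_links,
                              path_compression_length, search_delay,
                              matching_timeout, hosts_file, policy_dir,
                              search_strings_file):
       """Assemble the argument list in the order shown in the synopsis."""
       return [
           "gnunet-regex-profiler",
           "-c", str(config_file),
           "-d", str(log_file),
           "-n", str(num_links),
           "-p", str(path_compression_length),
           "-s", str(search_delay),
           "-t", str(matching_timeout),
           # Derived from the file so that the two cannot disagree.
           "-a", str(count_search_strings(search_strings_file)),
           str(hosts_file),
           str(policy_dir),
           str(search_strings_file),
       ]

The returned list can be handed to ``subprocess.run`` on a machine where
the profiler is installed. The sketch itself does not start anything.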

You can create regular expressions and search strings for every AS in
the Internet using the attached scripts. You need one of the `CAIDA
routeviews
prefix2as <http://data.caida.org/datasets/routing/routeviews-prefix2as/>`__
data files for this. Run

::

   create_regex.py <filename> <output path>

to create the regular expressions and

::

   create_strings.py <input path> <outfile>

to create a search strings file from the previously created regular
expressions.