HOSTLIST — HELLO bootstrapping and gossip
Peers in the GNUnet overlay network need address information so that they can connect with other peers. GNUnet uses so called HELLO messages to store and exchange peer addresses. GNUnet provides several methods for peers to obtain this information:
out-of-band exchange of HELLO messages (manually, using for example gnunet-peerinfo)
HELLO messages shipped with GNUnet (automatic with distribution)
UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)
topology gossiping (learning from other peers we already connected to), and
the HOSTLIST daemon covered in this section, which is particularly relevant for bootstrapping new peers.
New peers have no existing connections (and thus cannot learn from gossip among peers), may not have other peers in their LAN and might be started with an outdated set of HELLO messages from the distribution. In this case, getting new peers to connect to the network requires either manual effort or the use of a HOSTLIST to obtain HELLOs.
The basic information peers require to connect to other peers are contained in so called HELLO messages you can think of as a business card. Besides the identity of the peer (based on the cryptographic public key) a HELLO message may contain address information that specifies ways to contact a peer. By obtaining HELLO messages, a peer can learn how to contact other peers.
Overview for the HOSTLIST subsystem
The HOSTLIST subsystem provides a way to distribute and obtain contact
information to connect to other peers using a simple HTTP GET request.
Its implementation is split in three parts, the main file for the
daemon itself (
gnunet-daemon-hostlist.c), the HTTP client used to
download peer information (
hostlist-client.c) and the server
component used to provide this information to other peers
hostlist-server.c). The server is basically a small HTTP web server
(based on GNU libmicrohttpd) which provides a list of HELLOs known to
the local peer for download. The client component is basically a HTTP
client (based on libcurl) which can download hostlists from one or more
websites. The hostlist format is a binary blob containing a sequence of
HELLO messages. Note that any HTTP server can theoretically serve a
hostlist, the built-in hostlist server makes it simply convenient to
offer this service.
The HOSTLIST daemon can:
provide HELLO messages with validated addresses obtained from PEERINFO to download for other peers
download HELLO messages and forward these message to the TRANSPORT subsystem for validation
advertises the URL of this peer’s hostlist address to other peers via gossip
automatically learn about hostlist servers from the gossip of other peers
HOSTLIST - Limitations
The HOSTLIST daemon does not:
verify the cryptographic information in the HELLO messages
verify the address information in the HELLO messages
Interacting with the HOSTLIST daemon
The HOSTLIST subsystem is currently implemented as a daemon, so there is no need for the user to interact with it and therefore there is no command line tool and no API to communicate with the daemon. In the future, we can envision changing this to allow users to manually trigger the download of a hostlist.
Since there is no command line interface to interact with HOSTLIST, the only way to interact with the hostlist is to use STATISTICS to obtain or modify information about the status of HOSTLIST:
$ gnunet-statistics -s hostlist
In particular, HOSTLIST includes a persistent value in statistics that specifies when the hostlist server might be queried next. As this value is exponentially increasing during runtime, developers may want to reset or manually adjust it. Note that HOSTLIST (but not STATISTICS) needs to be shutdown if changes to this value are to have any effect on the daemon (as HOSTLIST does not monitor STATISTICS for changes to the download frequency).
Hostlist security address validation
Since information obtained from other parties cannot be trusted without validation, we have to distinguish between validated and not validated addresses. Before using (and so trusting) information from other parties, this information has to be double-checked (validated). Address validation is not done by HOSTLIST but by the TRANSPORT service.
The HOSTLIST component is functionally located between the PEERINFO and the TRANSPORT subsystem. When acting as a server, the daemon obtains valid (validated) peer information (HELLO messages) from the PEERINFO service and provides it to other peers. When acting as a client, it contacts the HOSTLIST servers specified in the configuration, downloads the (unvalidated) list of HELLO messages and forwards these information to the TRANSPORT server to validate the addresses.
The HOSTLIST daemon
The hostlist daemon is the main component of the HOSTLIST subsystem. It is started by the ARM service and (if configured) starts the HOSTLIST client and server components.
If the daemon provides a hostlist itself it can advertise it’s own
hostlist to other peers. To do so it sends a
GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT message to other peers
when they connect to this peer on the CORE level. This hostlist
advertisement message contains the URL to access the HOSTLIST HTTP
server of the sender. The daemon may also subscribe to this type of
message from CORE service, and then forward these kind of message to the
HOSTLIST client. The client then uses all available URLs to download
peer information when necessary.
When starting, the HOSTLIST daemon first connects to the CORE subsystem and if hostlist learning is enabled, registers a CORE handler to receive this kind of messages. Next it starts (if configured) the client and server. It passes pointers to CORE connect and disconnect and receive handlers where the client and server store their functions, so the daemon can notify them about CORE events.
To clean up on shutdown, the daemon has a cleaning task, shutting down all subsystems and disconnecting from CORE.
The HOSTLIST server
The server provides a way for other peers to obtain HELLOs. Basically it is a small web server other peers can connect to and download a list of HELLOs using standard HTTP; it may also advertise the URL of the hostlist to other peers connecting on CORE level.
The HTTP Server
During startup, the server starts a web server listening on the port specified with the HTTPPORT value (default 8080). In addition it connects to the PEERINFO service to obtain peer information. The HOSTLIST server uses the GNUNET_PEERINFO_iterate function to request HELLO information for all peers and adds their information to a new hostlist if they are suitable (expired addresses and HELLOs without addresses are both not suitable) and the maximum size for a hostlist is not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When PEERINFO finishes (with a last NULL callback), the server destroys the previous hostlist response available for download on the web server and replaces it with the updated hostlist. The hostlist format is basically a sequence of HELLO messages (as obtained from PEERINFO) without any special tokenization. Since each HELLO message contains a size field, the response can easily be split into separate HELLO messages by the client.
A HOSTLIST client connecting to the HOSTLIST server will receive the
hostlist as an HTTP response and the server will terminate the
connection with the result code
HTTP 200 OK. The connection will be
closed immediately if no hostlist is available.
Advertising the URL
The server also advertises the URL to download the hostlist to other
peers if hostlist advertisement is enabled. When a new peer connects and
has hostlist learning enabled, the server sends a
GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT message to this peer
using the CORE service.
HOSTLIST client .. _The-HOSTLIST-client:
The HOSTLIST client
The client provides the functionality to download the list of HELLOs from a set of URLs. It performs a standard HTTP request to the URLs configured and learned from advertisement messages received from other peers. When a HELLO is downloaded, the HOSTLIST client forwards the HELLO to the TRANSPORT service for validation.
The client supports two modes of operation:
download of HELLOs (bootstrapping)
learning of URLs
For bootstrapping, it schedules a task to download the hostlist from the set of known URLs. The downloads are only performed if the number of current connections is smaller than a minimum number of connections (at the moment 4). The interval between downloads increases exponentially; however, the exponential growth is limited if it becomes longer than an hour. At that point, the frequency growth is capped at (#number of connections * 1h).
Once the decision has been taken to download HELLOs, the daemon chooses a random URL from the list of known URLs. URLs can be configured in the configuration or be learned from advertisement messages. The client uses a HTTP client library (libcurl) to initiate the download using the libcurl multi interface. Libcurl passes the data to the callback_download function which stores the data in a buffer if space is available and the maximum size for a hostlist download is not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When a full HELLO was downloaded, the HOSTLIST client offers this HELLO message to the TRANSPORT service for validation. When the download is finished or failed, statistical information about the quality of this URL is updated.
The client also manages hostlist advertisements from other peers. The
HOSTLIST daemon forwards
messages to the client subsystem, which extracts the URL from the
message. Next, a test of the newly obtained URL is performed by
triggering a download from the new URL. If the URL works correctly, it
is added to the list of working URLs.
The size of the list of URLs is restricted, so if an additional server is added and the list is full, the URL with the worst quality ranking (determined through successful downloads and number of HELLOs e.g.) is discarded. During shutdown the list of URLs is saved to a file for persistence and loaded on startup. URLs from the configuration file are never discarded.
To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES section for the ARM services. This is done in the default configuration.
For more information on how to configure the HOSTLIST subsystem see the installation handbook: Configuring the hostlist to bootstrap Configuring your peer to provide a hostlist