NSE — Network size estimation

NSE stands for Network Size Estimation. The NSE subsystem provides other subsystems and users with a rough estimate of the number of peers currently participating in the GNUnet overlay. The computed value is not a precise number as producing a precise number in a decentralized, efficient and secure way is impossible. While NSE’s estimate is inherently imprecise, NSE also gives the expected range. For a peer that has been running in a stable network for a while, the real network size will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2 estimate]. We will now give an overview of the algorithm used to calculate the estimate; all of the details can be found in this technical report.

Todo

link to the report.

Motivation

Some subsystems, like DHT, need to know the size of the GNUnet network to optimize some parameters of their own protocol. The decentralized nature of GNUnet makes efficient and securely counting the exact number of peers infeasible. Although there are several decentralized algorithms to count the number of peers in a system, so far there is none to do so securely. Other protocols may allow any malicious peer to manipulate the final result or to take advantage of the system to perform Denial of Service (DoS) attacks against the network. GNUnet’s NSE protocol avoids these drawbacks.

NSE security .. _Security:

Security

The NSE subsystem is designed to be resilient against these attacks. It uses proofs of work to prevent one peer from impersonating a large number of participants, which would otherwise allow an adversary to artificially inflate the estimate. The DoS protection comes from the time-based nature of the protocol: the estimates are calculated periodically and out-of-time traffic is either ignored or stored for later retransmission by benign peers. In particular, peers cannot trigger global network communication at will.

Principle

The algorithm calculates the estimate by finding the globally closest peer ID to a random, time-based value.

The idea is that the closer the ID is to the random value, the more "densely packed" the ID space is, and therefore, more peers are in the network.

Example

Suppose all peers have IDs between 0 and 100 (our ID space), and the random value is 42. If the closest peer has the ID 70 we can imagine that the average "distance" between peers is around 30 and therefore the are around 3 peers in the whole ID space. On the other hand, if the closest peer has the ID 44, we can imagine that the space is rather packed with peers, maybe as much as 50 of them. Naturally, we could have been rather unlucky, and there is only one peer and happens to have the ID 44. Thus, the current estimate is calculated as the average over multiple rounds, and not just a single sample.

Algorithm

Given that example, one can imagine that the job of the subsystem is to efficiently communicate the ID of the closest peer to the target value to all the other peers, who will calculate the estimate from it.

Target value

The target value itself is generated by hashing the current time, rounded down to an agreed value. If the rounding amount is 1h (default) and the time is 12:34:56, the time to hash would be 12:00:00. The process is repeated each rounding amount (in this example would be every hour). Every repetition is called a round.

Timing

The NSE subsystem has some timing control to avoid everybody broadcasting its ID all at one. Once each peer has the target random value, it compares its own ID to the target and calculates the hypothetical size of the network if that peer were to be the closest. Then it compares the hypothetical size with the estimate from the previous rounds. For each value there is an associated point in the period, let’s call it "broadcast time". If its own hypothetical estimate is the same as the previous global estimate, its "broadcast time" will be in the middle of the round. If its bigger it will be earlier and if its smaller (the most likely case) it will be later. This ensures that the peers closest to the target value start broadcasting their ID the first.

Controlled Flooding

When a peer receives a value, first it verifies that it is closer than the closest value it had so far, otherwise it answers the incoming message with a message containing the better value. Then it checks a proof of work that must be included in the incoming message, to ensure that the other peer’s ID is not made up (otherwise a malicious peer could claim to have an ID of exactly the target value every round). Once validated, it compares the broadcast time of the received value with the current time and if it’s not too early, sends the received value to its neighbors. Otherwise it stores the value until the correct broadcast time comes. This prevents unnecessary traffic of sub-optimal values, since a better value can come before the broadcast time, rendering the previous one obsolete and saving the traffic that would have been used to broadcast it to the neighbors.

Calculating the estimate

Once the closest ID has been spread across the network each peer gets the exact distance between this ID and the target value of the round and calculates the estimate with a mathematical formula described in the tech report. The estimate generated with this method for a single round is not very precise. Remember the case of the example, where the only peer is the ID 44 and we happen to generate the target value 42, thinking there are 50 peers in the network. Therefore, the NSE subsystem remembers the last 64 estimates and calculates an average over them, giving a result of which usually has one bit of uncertainty (the real size could be half of the estimate or twice as much). Note that the actual network size is calculated in powers of two of the raw input, thus one bit of uncertainty means a factor of two in the size estimate.

libgnunetnse

The NSE subsystem has the simplest API of all services, with only two calls: GNUNET_NSE_connect and GNUNET_NSE_disconnect.

The connect call gets a callback function as a parameter and this function is called each time the network agrees on an estimate. This usually is once per round, with some exceptions: if the closest peer has a late local clock and starts spreading its ID after everyone else agreed on a value, the callback might be activated twice in a round, the second value being always bigger than the first. The default round time is set to 1 hour.

The disconnect call disconnects from the NSE subsystem and the callback is no longer called with new estimates.

Results

The callback provides two values: the average and the standard deviation of the last 64 rounds. The values provided by the callback function are logarithmic, this means that the real estimate numbers can be obtained by calculating 2 to the power of the given value (2average). From a statistics point of view this means that:

  • 68% of the time the real size is included in the interval [(2average-stddev), 2]

  • 95% of the time the real size is included in the interval [(2average-2*stddev, 2^average+2*stddev]

  • 99.7% of the time the real size is included in the interval [(2average-3*stddev, 2average+3*stddev]

The expected standard variation for 64 rounds in a network of stable size is 0.2. Thus, we can say that normally:

  • 68% of the time the real size is in the range [-13%, +15%]

  • 95% of the time the real size is in the range [-24%, +32%]

  • 99.7% of the time the real size is in the range [-34%, +52%]

As said in the introduction, we can be quite sure that usually the real size is between one third and three times the estimate. This can of course vary with network conditions. Thus, applications may want to also consider the provided standard deviation value, not only the average (in particular, if the standard variation is very high, the average maybe meaningless: the network size is changing rapidly).

Examples

Let’s close with a couple examples.

Average: 10, std dev: 1 Here the estimate would be

2^10 = 1024 peers. (The range in which we can be 95% sure is: [2^8, 2^12] = [256, 4096]. We can be very (>99.7%) sure that the network is not a hundred peers and absolutely sure that it is not a million peers, but somewhere around a thousand.)

Average 22, std dev: 0.2 Here the estimate would be

2^22 = 4 Million peers. (The range in which we can be 99.7% sure is: [2^21.4, 2^22.6] = [2.8M, 6.3M]. We can be sure that the network size is around four million, with absolutely way of it being 1 million.)

To put this in perspective, if someone remembers the LHC Higgs boson results, were announced with "5 sigma" and "6 sigma" certainties. In this case a 5 sigma minimum would be 2 million and a 6 sigma minimum, 1.8 million.

The NSE Client-Service Protocol

As with the API, the client-service protocol is very simple, only has 2 different messages, defined in src/nse/nse.h:

  • GNUNET_MESSAGE_TYPE_NSE_START This message has no parameters and is sent from the client to the service upon connection.

  • GNUNET_MESSAGE_TYPE_NSE_ESTIMATE This message is sent from the service to the client for every new estimate and upon connection. Contains a timestamp for the estimate, the average and the standard deviation for the respective round.

When the GNUNET_NSE_disconnect API call is executed, the client simply disconnects from the service, with no message involved.

NSE Peer-to-Peer Protocol .. _The-NSE-Peer_002dto_002dPeer-Protocol:

The NSE Peer-to-Peer Protocol

GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD The NSE subsystem only has one message in the P2P protocol, the GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD message.

This message key contents are the timestamp to identify the round (differences in system clocks may cause some peers to send messages way too early or way too late, so the timestamp allows other peers to identify such messages easily), the proof of work used to make it difficult to mount a Sybil attack, and the public key, which is used to verify the signature on the message.

Every peer stores a message for the previous, current and next round. The messages for the previous and current round are given to peers that connect to us. The message for the next round is simply stored until our system clock advances to the next round. The message for the current round is what we are flooding the network with right now. At the beginning of each round the peer does the following:

  • calculates its own distance to the target value

  • creates, signs and stores the message for the current round (unless it has a better message in the "next round" slot which came early in the previous round)

  • calculates, based on the stored round message (own or received) when to start flooding it to its neighbors

Upon receiving a message the peer checks the validity of the message (round, proof of work, signature). The next action depends on the contents of the incoming message:

  • if the message is worse than the current stored message, the peer sends the current message back immediately, to stop the other peer from spreading suboptimal results

  • if the message is better than the current stored message, the peer stores the new message and calculates the new target time to start spreading it to its neighbors (excluding the one the message came from)

  • if the message is for the previous round, it is compared to the message stored in the "previous round slot", which may then be updated

  • if the message is for the next round, it is compared to the message stored in the "next round slot", which again may then be updated

Finally, when it comes to send the stored message for the current round to the neighbors there is a random delay added for each neighbor, to avoid traffic spikes and minimize cross-messages.