CONG
CONG (COre Next Generation) is the name of the project that redesigns the CORE service (CORE-Subsystem-Dev). Here we document the design decisions and the parts that are changing. The most notable change concerns peer ids: in order to make location tracking harder, they are being made non-permanent, i.e. they change with each change of the underlying addresses. In addition, the cryptographic primitives in use change, as does the interface to the (transport) layers below. That interface is made more generic so that libp2p can be swapped in for gnunet’s own transport (layer 2 overlay/L2O). Finally, protocol versioning above core will be introduced.
Key exchange
While we are at it we may as well improve the key exchange (see The CORE Peer-to-Peer Protocol). Currently, we are using our own ECDHE key exchange that derives 2x2 keys: two keys for each direction (sending/receiving). That is, each direction uses two 256-bit symmetric encryption keys derived through the ECDH exchange. Each payload is encrypted using AES(kA, Twofish(kB, payload)), both in CFB mode (!).
Next Steps
For CONG, we should double-check the security of our ECDHE construction. We decided on 11/03/2024 to investigate XChaCha20-Poly1305:
Proposal:
Use X25519 for the KX with ephemeral Curve25519 keys.
Use XChaCha20-Poly1305 and kTx,kRx := KDF(X25519(),senderPK,receiverPK) for symmetric encryption.
We will have to replace the use of GNUNET_CRYPTO_symmetric_encrypt and the HMAC use in gnunet-service-core_kx.c, including the respective keys and IVs.
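As a rough illustration of the kTx,kRx derivation in the proposal, the following sketch derives one key per direction from a shared secret and the two public keys. The proposal only says "KDF"; using HKDF with SHA-256, an all-zero salt and the concatenated public keys as info are assumptions made here, and the shared secret and public keys are placeholder bytes rather than real X25519 values.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def directional_key(shared_secret: bytes, sender_pk: bytes,
                    receiver_pk: bytes) -> bytes:
    """Key for traffic flowing from sender_pk to receiver_pk (assumed layout)."""
    prk = hkdf_extract(b"\x00" * 32, shared_secret)
    return hkdf_expand(prk, sender_pk + receiver_pk)


# Placeholder values; in the proposal the shared secret would come from X25519.
shared = bytes(range(32))
pk_a, pk_b = b"A" * 32, b"B" * 32

# Peer A's view: kTx covers A -> B, kRx covers B -> A.
a_tx = directional_key(shared, pk_a, pk_b)
a_rx = directional_key(shared, pk_b, pk_a)
# Peer B's view mirrors it.
b_tx = directional_key(shared, pk_b, pk_a)
b_rx = directional_key(shared, pk_a, pk_b)
```

Because the public keys enter the derivation in sender/receiver order, each peer's sending key matches the other peer's receiving key while the two directions stay distinct.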
Handshake Protocol (Current)
Warning
This is incomplete as the protocol is very messy and has around 6 RTTs
We assume that the peers have semi-static (as opposed to ephemeral) key pairs. Let (pkA,skA) be the key pair of peer PIDA and (pkB,skB) the key pair of peer PIDB.
For any secure handshake protocol, we have to determine an initiator and a receiver in the protocol. We use GNUNET_CRYPTO_hash_cmp to determine which peer is the receiver R and which peer the initiator I:
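/* GNUNET_CRYPTO_hash_cmp returns -1, 0 or 1, so the result is compared
   against 0. Which sign selects the initiator is an assumption here. */
if (0 < GNUNET_CRYPTO_hash_cmp (pk_A, pk_B))
{
  pk_I = pk_A;
  pk_R = pk_B;
}
else
{
  pk_I = pk_B;
  pk_R = pk_A;
}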
It is possible that the designated initiator does not initiate the handshake. After a pre-determined timeout, the respective other peer may initiate.
We assume that the initiator knows pkR (pre-distributed through HELLO, for example).
I and R calculate before any connection attempt is made:
  (pke,ske) <- KeyGen()
Danger
Yes, both peers calculate ephemeral keys that are used for a set period of time in all handshakes.
I calculates:
  EphemeralKeyMessage <- (pkI, pke, creation_time, …)
  sige <- Sign(skI, EphemeralKeyMessage)
I sends to R: EphemeralKeyMessage, sige
R calculates:
  assert Verify(pkI, EphemeralKeyMessage, sige)
  Establish session keys through ECDH with ephemeral keys.
  EphemeralKeyMessage <- (pkR, pke, creation_time, …)
  sige <- Sign(skR, EphemeralKeyMessage)
R sends to I: EphemeralKeyMessage, sige
I calculates:
  assert Verify(pkR, EphemeralKeyMessage, sige)
  Establish session keys through ECDH with ephemeral keys.
I sends to R: PingMessage
R calculates: PongMessage
R sends to I: PongMessage
Handshake Protocol (Draft)
The initiator selection remains unchanged from the above protocol.
I calculates:
  (pke,ske) <- KeyGen()
  (ssR,cR) <- Encaps(pkR)
  ES <- HKDF-Extract(ssR, 0)
  InitiatorHello <- (pke, cR, H(pkR), rI, [SupportedAlgs,Services,Version])
  ETS <- HKDF-Expand(ES, "early data", InitiatorHello)
  InitiatorCert <- Enc(ETS, cert [pkI])
I sends to R: InitiatorHello, InitiatorCert
Note
We may encode the typemap and capabilities/supported class in InitiatorCert’s cert. We do not want to use X.509 here, probably.
R calculates:
  Decrypt InitiatorCert:
    (ssR) <- Decaps(skR, cR)
    ES <- HKDF-Extract(ssR, 0)
    ETS <- HKDF-Expand(ES, "early data", InitiatorHello)
    cert [pkI] <- Dec(ETS, InitiatorCert)
  Setup Master Secret:
    dES <- HKDF-Expand(ES, "derived", NULL)
    (sse,ce) <- Encaps(pke)
    HS <- HKDF-Extract(sse, dES)
    dHS <- HKDF-Expand(HS, "derived", NULL)
    (ssI,cI) <- Encaps(pkI)
    MS <- HKDF-Extract(ssI, dHS)
  Derive Handshake Traffic Encryption Keys:
    ReceiverHello <- (ce, rR, [SelectedAlgs])
    IHTS <- HKDF-Expand(HS, "i hs traffic", InitiatorHello...ReceiverHello)
    RHTS <- HKDF-Expand(HS, "r hs traffic", InitiatorHello...ReceiverHello)
    ReceiverKemCiphertext <- Enc(RHTS, cI)
    (Optional) EncryptedExtensions <- Enc(RHTS, SupportedAlgs/Services?)
  Build ReceiverFinished message:
    fkR <- HKDF-Expand(MS, "r finished", NULL)
    RF <- HMAC(fkR, InitiatorHello...ReceiverKemCiphertext)
    ReceiverFinished <- Enc(RHTS, RF)
    (TLS1.3-style explicit authentication of receiver after 1RTT!)
  Derive Application Traffic Encryption Key:
    RATS <- HKDF-Expand(MS, "r ap traffic", InitiatorHello...ReceiverFinished)
R sends to I (0.5 RTT): ReceiverHello, EncryptedExtensions, ReceiverKemCiphertext, ReceiverFinished and optionally application payload encrypted using RATS.
I computes:
  Setup Master Secret:
    (sse) <- Decaps(ske, ce)
    dES <- HKDF-Expand(ES, "derived", NULL)
    HS <- HKDF-Extract(sse, dES)
    dHS <- HKDF-Expand(HS, "derived", NULL)
    (ssI) <- Decaps(skI, cI)
    MS <- HKDF-Extract(ssI, dHS)
  Derive Traffic Encryption Keys:
    IHTS <- HKDF-Expand(HS, "i hs traffic", InitiatorHello...ReceiverHello)
    RHTS <- HKDF-Expand(HS, "r hs traffic", InitiatorHello...ReceiverHello)
    IATS <- HKDF-Expand(MS, "i ap traffic", InitiatorHello...InitiatorFinished)
    RATS <- HKDF-Expand(MS, "r ap traffic", InitiatorHello...ReceiverFinished)
  Build ReceiverFinished and InitiatorFinished plain texts:
    fkI <- HKDF-Expand(MS, "i finished", NULL)
    IF <- HMAC(fkI, InitiatorHello...ReceiverFinished)
    fkR <- HKDF-Expand(MS, "r finished", NULL)
    RF <- Dec(RHTS, ReceiverFinished)
    assert HMAC(fkR, InitiatorHello...ReceiverKemCiphertext) == RF
    InitiatorFinished <- Enc(IHTS, IF)
I sends to R (1.5 RTT): InitiatorFinished and optionally application payload encrypted using IATS. I can now decrypt received payload using RATS.
R computes:
  IF <- Dec(IHTS, InitiatorFinished)
  fkI <- HKDF-Expand(MS, "i finished", NULL)
  assert HMAC(fkI, InitiatorHello...ReceiverFinished) == IF
  IATS <- HKDF-Expand(MS, "i ap traffic", InitiatorHello...InitiatorFinished)
Glossary:
IATS: Initiator Application Traffic Secret Key
RATS: Receiver Application Traffic Secret Key
dES: Derived Early Secret Key
dHS: Derived Handshake Key
ES: Early Secret Key
ETS: Early Traffic Secret Key
HS: Handshake Secret Key
MS: Main Secret Key
IHTS: Initiator Handshake Traffic Secret Key
RHTS: Receiver Handshake Traffic Secret Key
Foo...Bar means the transcript of received/sent messages from Foo until Bar
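The secret chain of the draft handshake (ES, dES, HS, dHS, MS) and the ReceiverFinished check can be sketched as follows. This is a minimal illustration under several assumptions: SHA-256 as hash, HKDF-Extract taking its arguments in the order used in this document (input key material first, salt second), the transcript entering HKDF-Expand as a hash appended to the label, NULL being treated as the empty transcript, and placeholder byte strings instead of real KEM outputs and messages.

```python
import hashlib
import hmac

HASH = hashlib.sha256
ZERO = b"\x00" * HASH().digest_size


def hkdf_extract(ikm: bytes, salt: bytes) -> bytes:
    """HKDF-Extract in the argument order used above: (key material, salt)."""
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(secret: bytes, label: str, transcript: bytes = b"") -> bytes:
    """Single-block HKDF-Expand; label plus transcript hash form the info."""
    info = label.encode() + HASH(transcript).digest()
    return hmac.new(secret, info + b"\x01", HASH).digest()


def key_schedule(ss_r, ss_e, ss_i, initiator_hello, receiver_hello):
    """Derive the handshake secrets from the three KEM shared secrets."""
    es = hkdf_extract(ss_r, ZERO)
    d_es = hkdf_expand(es, "derived")
    hs = hkdf_extract(ss_e, d_es)
    d_hs = hkdf_expand(hs, "derived")
    ms = hkdf_extract(ss_i, d_hs)
    hellos = initiator_hello + receiver_hello
    return {
        "ETS": hkdf_expand(es, "early data", initiator_hello),
        "IHTS": hkdf_expand(hs, "i hs traffic", hellos),
        "RHTS": hkdf_expand(hs, "r hs traffic", hellos),
        "fkR": hkdf_expand(ms, "r finished"),
        "fkI": hkdf_expand(ms, "i finished"),
        "MS": ms,
    }


def finished_mac(finished_key: bytes, transcript: bytes) -> bytes:
    """HMAC over the transcript, as used for RF and IF."""
    return hmac.new(finished_key, transcript, HASH).digest()


# Placeholder inputs; real values come from Encaps/Decaps and the messages.
ss_r, ss_e, ss_i = b"R" * 32, b"E" * 32, b"I" * 32
initiator_hello = b"InitiatorHello"
receiver_hello = b"ReceiverHello"
receiver_kem_ct = b"ReceiverKemCiphertext"

keys = key_schedule(ss_r, ss_e, ss_i, initiator_hello, receiver_hello)
transcript = initiator_hello + receiver_hello + receiver_kem_ct

# R builds RF; I recomputes it and compares in constant time.
rf = finished_mac(keys["fkR"], transcript)
rf_ok = hmac.compare_digest(rf, finished_mac(keys["fkR"], transcript))
rf_tampered = hmac.compare_digest(
    rf, finished_mac(keys["fkR"], transcript + b"x"))
```

Both peers run the same schedule on the same three shared secrets, so a mismatch in any secret or in the transcript shows up as a failed Finished comparison.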
Unified Address Format for L2O and libp2p
As a unified address format for L2O and libp2p we will use a concatenation of the string representations of gnunet’s hellos and libp2p’s multiaddress, separated by ;;. For example: gnunet://hello/XXPIDXX/XXPIDXX/1725622944?udp=%5B%3A%3A1%5D%3A2086&;;libp2p:///ip4/127.0.0.1/tcp/24915
This is only for the time being. In the long run, integrating each project's addressing scheme into the other's should be evaluated, meaning: integrate a gnunet-hello address type into libp2p’s multiaddress format and integrate the multiaddress format into the gnunet-hello.
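The concatenated format can be taken apart again with a few lines of string handling. The sketch below uses the example address from above; the function names are hypothetical, and it assumes that ";;" never occurs inside either part and that the communicator addresses of the hello are carried as URL-encoded query parameters, as in the example.

```python
from urllib.parse import parse_qsl, urlsplit

SEPARATOR = ";;"


def split_unified_address(unified: str) -> dict:
    """Map each URI scheme (e.g. 'gnunet', 'libp2p') to its part."""
    parts = {}
    for part in unified.split(SEPARATOR):
        scheme, sep, _ = part.partition("://")
        if not sep:
            raise ValueError("missing scheme in %r" % part)
        parts[scheme] = part
    return parts


def hello_addresses(hello_uri: str) -> list:
    """Return (communicator, address) pairs from the query of a hello URI."""
    return parse_qsl(urlsplit(hello_uri).query)


def multiaddress(libp2p_uri: str) -> str:
    """Strip the 'libp2p://' prefix, leaving the plain multiaddress."""
    return libp2p_uri[len("libp2p://"):]


example = ("gnunet://hello/XXPIDXX/XXPIDXX/1725622944"
           "?udp=%5B%3A%3A1%5D%3A2086&;;libp2p:///ip4/127.0.0.1/tcp/24915")

parts = split_unified_address(example)
gnunet_addrs = hello_addresses(parts["gnunet"])
libp2p_addr = multiaddress(parts["libp2p"])
```

For the example address this yields one UDP communicator address for the gnunet part and the multiaddress `/ip4/127.0.0.1/tcp/24915` for the libp2p part.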
Peer IDs
Peer ids are no longer unique for the lifetime of a peer, but change each time a peer’s addresses change. This includes gaining or losing an address.
It is important to note that this design choice only increases the cost of network location tracking and does not fully prevent it. To fully prevent it, onion routing on top of CADET is envisioned.
At this point it seems like one has to weigh privacy versus performance when it comes to this design decision.
Reasoning
This change was introduced in order to stop the tracking of more mobile peers across the network. For example, a mobile peer (such as a laptop) that logs into the network at different places can easily be tracked by anyone, just by recording the different addresses that are tied to its peer id over time.
Attacker Model
An attacker observes the hellos (containing ip and other addresses) published under a peer id and is thus capable of tracking locations and thus obtaining a movement profile of this peer.
With the proposed changes to the peer id an attacker can only see a peer id and its connected set of addresses. A movement profile cannot be obtained in the previous way.
Tracking of addresses/locations might still be possible in the scenario that a mobile client uses mobile broadband and wifi uplinks in an ‘overlapping’ manner (switching on mobile broadband before leaving the range of a wifi hotspot and, vice versa, turning off mobile broadband after connecting to a wifi). For the ‘overlapping’ time the peer would publish a HELLO containing the old and the new network addresses. After the overlapping time, the peer’s HELLO would just contain the new network address, which was already in the HELLO published during the ‘overlap’. That way the overlap can be used to link the old and new address and, in extreme scenarios, obtain the full movement profile again. Note that this does not work for all peers, nor all the time, and requires effort for the correlation. It should be highlighted that, although this attack is possible, the new design still greatly reduces the attack surface.
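To make the ‘overlap’ correlation concrete, the following sketch groups observed peer ids whose published address sets share at least one address. The observation format, the function name and the address labels are made up for illustration; a real attacker would additionally have to cope with addresses shared by unrelated peers.

```python
def link_peer_ids(observations):
    """Group peer ids whose observed address sets overlap.

    observations: iterable of (peer_id, set_of_addresses) pairs.
    Returns a list of sets of peer ids that can be linked to each other.
    """
    groups = []  # pairs of (linked peer ids, union of their addresses)
    for peer_id, addresses in observations:
        merged_ids, merged_addrs = {peer_id}, set(addresses)
        remaining = []
        for ids, addrs in groups:
            if addrs & merged_addrs:
                # Shared address: fold this group into the new one.
                merged_ids |= ids
                merged_addrs |= addrs
            else:
                remaining.append((ids, addrs))
        remaining.append((merged_ids, merged_addrs))
        groups = remaining
    return [ids for ids, _ in groups]


observed = [
    ("id1", {"wifi-home"}),
    ("id2", {"wifi-home", "lte"}),   # overlap: both uplinks active
    ("id3", {"lte"}),
    ("id4", {"wifi-office"}),        # no overlap with anything before
]
linked = link_peer_ids(observed)
```

Here id1, id2 and id3 end up in one group because the HELLO published during the overlap bridges the old and the new address, while id4 stays unlinked.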
A way to circumvent this anti-tracking mechanism would be for an attacker to exploit the means used to consistently connect to the same peer, for example higher-layer services like gns. With knowledge of a gns entry, its peer ids and thus its addresses can be fully tracked. Until an onion-routing service exists that decouples identities from network addresses, this behavior is probably intended and may be the only way to connect endpoints. Although this leak still exists, the new design prevents a lot of other leakage. It is also important to point out that it is impossible to prevent leakage at higher layers, but necessary to prevent leakage at this layer, because leakage at this level could not be fixed at higher layers.
Implications
Here we present the implications of this change for different parts of the framework.
DHT
The DHT uses the peer id to determine which buckets a peer is responsible for. So each time the peer id changes, the peer becomes responsible for different data. This means that each time a peer changes its addresses, it leaves and re-enters the DHT, changes its own and other peers’ neighbors, and changes the stored data.
Scope of Peer ID for higher Layers
When peer ids are no longer unique over time, the framework lacks a globally unique identifier. Higher layers may rely (or have relied) on the uniqueness of the ids, so gnunet has to use other means for this purpose. The Reconnects section below is concerned with the specific impact on reconnects for different higher-layer services. In general, gns/identity offers this functionality.
As peer ids cease to be unique over time, this might be a good point to review the scope of its and other elements’ usage and terminology. (See Open_Design_Questions)
Reconnects
When addresses (and with those the peer id) of a peer change, all core connections need to be torn down and with them all higher-layer connections. This affects the layers above CADET as follows:
Revocation: Is not really affected as it is only connected to direct CADET neighbors and makes no use of CADET’s routing, only of its flow and congestion control.
File sharing: Only the non-anonymous filesharing uses CADET connections. This is not significantly affected by a reconnect as it only looks up peer and port in the DHT, so in the meantime it’s looking for other peers. TODO this is very unclear to me!
Messenger: All CADET connections would break and the peer might assume that all previously connected peers went offline. So it would require a mechanism to reconnect to peers with a known peer identity which offer routing capability (open port via CADET to connect to). In case the peer itself is providing such capability, it would help to know about peer ID changes ahead of time to communicate a switch between IDs to other peers. For other reconnections, GNS lookups are required.
Conversation: The call would be interrupted until the new peer id of the other has been found (potentially via GNS).
Open question: is gns needed for a reconnect? Could the peer with the new id not simply ‘call back’ the other peer? Of course this would only work if only one peer changes its peer id. If both change their peer id at the same time, an external mechanism would be needed.
See Open_Design_Questions for thoughts on good designs to handle address changes more smoothly.
Messenger
The current implementation of messenger heavily relies on a globally unique peer id. The change requires messenger to account for peer id changes.
Details on how
Peer ids will be generated from the set of a peer’s current addresses. Whenever a peer obtains the same set of addresses again, it shall use the same peer id. To achieve this behavior, the current string representations of the addresses are sorted, the sorted representation is hashed and the peer id is generated from the hash. We call the hash ‘network generation hash’. The hash can be used to quickly identify and recognise a set of addresses that was used in the past.
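The sort-then-hash step can be sketched as follows. The choice of SHA-512, the newline separator and the removal of duplicate addresses are assumptions made for this example; how the peer id (key pair) is then generated from the hash is left out, as it is not specified here.

```python
import hashlib


def network_generation_hash(addresses) -> bytes:
    """Hash the sorted string representations of a peer's addresses."""
    canonical = "\n".join(sorted(set(addresses)))
    return hashlib.sha512(canonical.encode("utf-8")).digest()


home = ["udp://[::1]:2086", "tcp://192.0.2.7:2086"]
home_reordered = list(reversed(home))
home_plus_lte = home + ["udp://198.51.100.3:2086"]

h_home = network_generation_hash(home)
h_reordered = network_generation_hash(home_reordered)
h_plus_lte = network_generation_hash(home_plus_lte)
```

Because the addresses are sorted before hashing, the order in which they are discovered does not matter: the same set always yields the same hash, while gaining or losing a single address yields a different one.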
Once a peer id changes, all communication via open channels shall immediately cease. To signal this, the mq is to be used. To identify all queued messages that are to be cancelled, the ‘network generation id’ is used. (This was decided to be implemented as a simple 64-bit counter.) With each address and peer id change, the network generation id is incremented. When enqueueing a message with the mq, the current generation id is stored with it. This way, when the generation id changes, all enqueued messages with the old id (which still might refer to the old peer id in some way) can be identified and dequeued. It is important to manage the network generation id as close to the communicators as possible to be able to stop the actual outgoing messages as quickly as possible.
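The mechanism of tagging queued messages with the network generation id can be sketched like this. The class and method names are hypothetical and do not correspond to the actual mq API; stale messages are dropped lazily at send time here, which is only one possible way to dequeue them.

```python
from collections import deque


class GenerationQueue:
    """Toy message queue that tags messages with a generation id."""

    def __init__(self):
        self.generation_id = 0  # a 64-bit counter in the design
        self._queue = deque()

    def enqueue(self, message: bytes) -> None:
        # Store the generation id that was current at enqueue time.
        self._queue.append((self.generation_id, message))

    def addresses_changed(self) -> None:
        # Called on every address (and thus peer id) change.
        self.generation_id = (self.generation_id + 1) % 2**64

    def next_to_send(self):
        """Return the next message of the current generation, or None."""
        while self._queue:
            generation, message = self._queue.popleft()
            if generation == self.generation_id:
                return message
            # Stale message from an old generation: silently dropped.
        return None


queue = GenerationQueue()
queue.enqueue(b"queued under old peer id")
queue.addresses_changed()
queue.enqueue(b"queued under new peer id")
first = queue.next_to_send()
second = queue.next_to_send()
```

After the address change only the message enqueued under the new generation is handed out; the older one never reaches the wire.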
To be more precise, the communication via open channels should not cease only when the new peer id is available, but already when there’s a change to the addresses in use. This means there is a small window, until the new peer id is generated, in which the peer is without peer id and thus without means to communicate. This might open a new attack window: an attacker could try to change the addresses of a peer, for example by opening and closing wifi hotspots and handing out new addresses via dhcp.
Open questions
Ownership of Peer IDs
When the peer id was static, all parts of gnunet had a simple way to interface with it. Once it becomes dynamic, it makes a lot of sense that a single part takes control/responsibility/ownership for it. A new service is created for this purpose. A name suggestion was “peer id lifecycle service” - PILS.
The reasons for this new service:
Good encapsulation (which is even more important as it is a component that deals with crucial cryptographic operations).
Avoidance of circular subscriptions of core and transport.
Avoidance of callback api hell between core and transport.
Using peerstore for this would be really messy.
The reasons against this new service:
Having yet another service.
Core will be the responsible service to provide addresses to pils for the switching of peer ids.
Peer ids should not be stored in permanent memory (on disk), but kept in working memory. They will be (re-)computed from the new set of addresses. The advantage is that it is not possible to prove that a peer id belonged to a peer by examining its hard drive. Keeping the respective HELLO in the peerstore is fine as a peer also keeps other peers’ hellos in there and the existence of a hello in the peerstore proves nothing.
Implications
Pils needs to take ownership. It is responsible for generating, changing of peer ids, informing subscribers of changes and signing with the current peer id. (To be gentle on the ipc, it should not sign big amounts of data - if applicable rather hashes of data or such.)
Transport
Up until now transport still creates, signs and puts hellos into the peerstore. When pils takes ownership of peer ids, transport will need to ask it to sign the hellos before putting them into the peerstore. (With libp2p as alternative/additional underlay in mind, everything related to hellos needs to move to core eventually as libp2p is oblivious to that part of gnunet. Core will have to create and provide the hellos. See libp2p-Underlay below.)
Other
Generally, all parts that so far read the peer id from a file into local memory, will need to ask pils for it. All services that sign/encrypt something with the peer id now will need to ask pils to do it.
Details on how
In order to provide the needed functionality, pils needs to expose it via its api:
a call to provide the current set of addresses, so the generation of the peer id can be triggered.
a call to obtain the current peer id
registering a handler that informs about new peer ids
signing data with the peer id
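The four calls listed above could be shaped roughly as in the following sketch. Everything in it is hypothetical: the class and method names, the per-peer in-memory seed, and the use of HMAC and SHA-2 as stand-ins for the real key generation and signature scheme. It is only meant to show how feeding addresses, querying the id, subscribing to changes and signing fit together.

```python
import hashlib
import hmac


class PilsSketch:
    """Toy stand-in for the peer id lifecycle service."""

    def __init__(self, seed: bytes):
        self._seed = seed      # hypothetical secret, kept in memory only
        self._key = None
        self._peer_id = None
        self._handlers = []

    def subscribe(self, handler) -> None:
        """Register a handler that is informed about new peer ids."""
        self._handlers.append(handler)

    def feed_addresses(self, addresses) -> None:
        """Provide the current address set; may trigger a new peer id."""
        canonical = "\n".join(sorted(set(addresses))).encode("utf-8")
        net_hash = hashlib.sha512(canonical).digest()
        key = hmac.new(self._seed, net_hash, hashlib.sha512).digest()
        peer_id = hashlib.sha256(key).hexdigest()  # stand-in for a public key
        if peer_id != self._peer_id:
            self._key, self._peer_id = key, peer_id
            for handler in self._handlers:
                handler(peer_id)

    def get_id(self):
        """Obtain the current peer id (None before any addresses are fed)."""
        return self._peer_id

    def sign(self, data: bytes) -> bytes:
        """'Sign' a hash of the data with the current key (HMAC stand-in)."""
        if self._key is None:
            raise RuntimeError("no peer id available yet")
        digest = hashlib.sha512(data).digest()
        return hmac.new(self._key, digest, hashlib.sha512).digest()


seen = []
pils = PilsSketch(seed=b"\x01" * 32)
pils.subscribe(seen.append)
pils.feed_addresses(["udp://[::1]:2086"])
first_id = pils.get_id()
pils.feed_addresses(["udp://[::1]:2086"])                    # same set: no change
pils.feed_addresses(["udp://[::1]:2086", "tcp://192.0.2.7:2086"])
signature = pils.sign(b"hello block")
```

Subscribers are only notified when the address set actually produces a different peer id, and signing hashes the data first so that little data has to cross the ipc boundary.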
libp2p Underlay
Get gnunet to work on libp2p. This includes the FFI with rust (or binding to implementations in other languages), converting address formats, signaling metadata (traffic class and priority, as far as libp2p supports it) and signalling connectivity changes. This is a first attempt to technically link the two projects. Therefore the feasibility is quite uncertain and the milestones might have to be re-evaluated after the report on the needs and feasibility. Should it turn out that the needed resources are beyond the capabilities of this project, a detailed report on the requirements and roadmap of and for the realisation shall be written and published.
For a lot of practical purposes it would be a lot easier to add a new communicator for libp2p instead of integrating it as an underlay below core, next to transport. The downside would be that there might be a lot of duplication of functionality.
Protocol-Versioning
Currently applications signal to core which message types they support. For this milestone we will implement proper protocol versioning where higher-level applications can signal a range of protocol versions which they support (min, max) and exchange messages at the CONG layer between peers to determine common protocols.
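Determining a common protocol version from two (min, max) ranges can be sketched as below. Picking the highest version supported by both sides is an assumption; the text above only says that common protocols are determined. The function name and the application names in the example are made up.

```python
def common_version(ours, theirs):
    """Return the highest version in both (min, max) ranges, or None."""
    low = max(ours[0], theirs[0])
    high = min(ours[1], theirs[1])
    return high if low <= high else None


def negotiate(local, remote):
    """Negotiate per application; apps unknown to either side are skipped."""
    result = {}
    for app in local.keys() & remote.keys():
        version = common_version(local[app], remote[app])
        if version is not None:
            result[app] = version
    return result


local_support = {"dht": (1, 3), "cadet": (2, 2), "fs": (1, 1)}
remote_support = {"dht": (2, 5), "cadet": (3, 4), "messenger": (1, 1)}
agreed = negotiate(local_support, remote_support)
```

In this example only the DHT ranges overlap, so the peers agree on DHT version 3 and exchange no messages for the other applications.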
Open Design Questions
In this section we list design questions that are not yet decided.
Peer ID changes and connectivity
In case a peer’s addresses change, it gets a new peer id and therefore needs to reconnect. The challenge is to reconnect as fast as possible. The main problem is that a peer cannot know its next peer id in all cases. Connections that have dedicated peers at their endpoints will probably look up the new peer id of the other peer in a higher-layer service, most probably gns.
In the case in which a peer just gains an additional address, that peer can pre-calculate its next peer id, signal it via still open connections on the old peer id and finally switch to using the new peer id.
Other, more elaborate ideas include using multiple peer ids per peer. One option is an additional address-independent peer id that ‘survives’ address changes and serves as a means to link to the address-based peer id after a change. It would only be sent to connected peers and reset once all connections have been re-established.
Alternatively (maybe in addition) peers could use multiple address-based peer ids - one per address. Thus some peer ids might stay unchanged while others go offline.
Another idea to address this challenge is to keep peer ids in use on connections which are still in use, but don’t publish those ids anymore.
Terminology-wise we might add another perspective and say that we selectively and deliberately provide tracking capability to peers which we want to stay in touch with.
Peer ID Terminology
Once peer ids cease to be unique over time, the question arises whether they should actually be called identities. (In my intuition an identity is something more persistent.) As we discuss peer ids as an analogue to ip addresses, a natural alternative would be “peer address”.
Requirements
In the discussions we seem to have partially lost oversight over things. In this section we figure out the requirements for core (and possibly other components) and its mechanisms.
TODO: what requirements exactly did we want to document here?
Use Cases and Scenarios
This section is supposed to help with the understanding of the use of elements and structures by imagined examples.
Peer ID
The peer id has been used throughout gnunet as a very convenient means for many purposes. It was used as globally unique and stable identifier of a peer. From now on it should rather be treated as something more volatile and fitting for its layer: an ip address. Below we collect the intended purposes and use cases and also point out some uses for which it was not intended and point to other means to achieve it.
GNS
TODO
CADET
TODO
Messenger
TODO