FS — File sharing over GNUnet

This chapter describes the details of how the file-sharing service works. As with all services, it is split into an API (libgnunetfs), the service process (gnunet-service-fs) and user interface(s). The file-sharing service uses the datastore service to store blocks and the DHT (and indirectly datacache) for lookups for non-anonymous file-sharing. Furthermore, the file-sharing service uses the block library (and the block fs plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since the client library includes a large number of high-level abstractions; this is necessary since the FS service itself largely only operates on the block level. The FS library is responsible for providing a file-based abstraction to applications, including directories, meta data, keyword search, verification, and so on.

The method used by GNUnet to break large files into blocks and to use keyword search is called the "Encoding for Censorship Resistant Sharing" (ECRS). ECRS is largely implemented in the fs library; block validation is also reflected in the block FS plugin and the FS service. ECRS on-demand encoding is implemented in the FS service.

Note

The documentation in this chapter is quite incomplete.

ECRS — Encoding for Censorship-Resistant Sharing

When GNUnet shares files, it uses a content encoding that is called ECRS, the Encoding for Censorship-Resistant Sharing. Most of ECRS is described in the (so far unpublished) research paper attached to this page. ECRS obsoletes the previous ESED and ESED II encodings which were used in GNUnet before version 0.7.0. The rest of this page assumes that the reader is familiar with the attached paper. What follows is a description of some minor extensions that GNUnet makes over what is described in the paper. The reason why these extensions are not in the paper is that we felt that they were obvious or trivial extensions to the original scheme and thus did not warrant space in the research report.

Todo

Find missing link to file system paper.

Namespace Advertisements

Todo

FIXME: all zeroses -> ?

An SBlock with identifier all zeros is a signed advertisement for a namespace. This special SBlock contains metadata describing the content of the namespace. Instead of the name of the identifier for a potential update, it contains the identifier for the root of the namespace. The URI should always be empty. The SBlock is signed with the content provider’s RSA private key (just like any other SBlock). Peers can search for SBlocks in order to find out more about a namespace.

KSBlocks

GNUnet implements KSBlocks which are KBlocks that, instead of encrypting a CHK and metadata, encrypt an SBlock instead. In other words, KSBlocks enable GNUnet to find SBlocks using the global keyword search. Usually the encrypted SBlock is a namespace advertisement. The rationale behind KSBlocks and SBlocks is to enable peers to discover namespaces via keyword searches, and, to associate useful information with namespaces. When GNUnet finds KSBlocks during a normal keyword search, it adds the information to an internal list of discovered namespaces. Users looking for interesting namespaces can then inspect this list, reducing the need for out-of-band discovery of namespaces. Naturally, namespaces (or more specifically, namespace advertisements) can also be referenced from directories, but KSBlocks should make it easier to advertise namespaces for the owner of the pseudonym since they eliminate the need to first create a directory.

Collections are also advertised using KSBlocks.

File-sharing persistence directory structure

This section documents how the file-sharing library implements persistence of file-sharing operations and specifically the resulting directory structure. This code is only active if the GNUNET_FS_FLAGS_PERSISTENCE flag was set when calling GNUNET_FS_start. In this case, the file-sharing library will try hard to ensure that all major operations (searching, downloading, publishing, unindexing) are persistent, that is, can live longer than the process itself. More specifically, an operation is supposed to live until it is explicitly stopped.

If GNUNET_FS_stop is called before an operation has been stopped, a SUSPEND event is generated and then when the process calls GNUNET_FS_start next time, a RESUME event is generated. Additionally, even if an application crashes (segfault, SIGKILL, system crash) and hence GNUNET_FS_stop is never called and no SUSPEND events are generated, operations are still resumed (with RESUME events). This is implemented by constantly writing the current state of the file-sharing operations to disk. Specifically, the current state is always written to disk whenever anything significant changes (the exception are block-wise progress in publishing and unindexing, since those operations would be slowed down significantly and can be resumed cheaply even without detailed accounting). Note that if the process crashes (or is killed) during a serialization operation, FS does not guarantee that this specific operation is recoverable (no strict transactional semantics, again for performance reasons). However, all other unrelated operations should resume nicely.

Since we need to serialize the state continuously and want to recover as much as possible even after crashing during a serialization operation, we do not use one large file for serialization. Instead, several directories are used for the various operations. When GNUNET_FS_start executes, the master directories are scanned for files describing operations to resume. Sometimes, these operations can refer to related operations in child directories which may also be resumed at this point. Note that corrupted files are cleaned up automatically. However, dangling files in child directories (those that are not referenced by files from the master directories) are not automatically removed.

Persistence data is kept in a directory that begins with the "STATE_DIR" prefix from the configuration file (by default, "$SERVICEHOME/persistence/") followed by the name of the client as given to GNUNET_FS_start (for example, "gnunet-gtk") followed by the actual name of the master or child directory.

The names for the master directories follow the names of the operations:

  • "search"

  • "download"

  • "publish"

  • "unindex"

Each of the master directories contains names (chosen at random) for each active top-level (master) operation. Note that a download that is associated with a search result is not a top-level operation.

In contrast to the master directories, the child directories are only consulted when another operation refers to them. For each search, a subdirectory (named after the master search synchronization file) contains the search results. Search results can have an associated download, which is then stored in the general "download-child" directory. Downloads can be recursive, in which case children are stored in subdirectories mirroring the structure of the recursive download (either starting in the master "download" directory or in the "download-child" directory depending on how the download was initiated). For publishing operations, the "publish-file" directory contains information about the individual files and directories that are part of the publication. However, this directory structure is flat and does not mirror the structure of the publishing operation. Note that unindex operations cannot have associated child operations.