.. index::
   double: File sharing; subsystem
   see: FS; File sharing

.. _File_002dsharing-_0028FS_0029-Subsystem:

FS — File sharing over GNUnet
=============================

This chapter describes the details of how the file-sharing service works.
As with all services, it is split into an API (libgnunetfs), the service
process (gnunet-service-fs) and user interface(s). The file-sharing
service uses the datastore service to store blocks and the DHT (and
indirectly datacache) for lookups for non-anonymous file-sharing.
Furthermore, the file-sharing service uses the block library (and the
block fs plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since
the client library includes a large number of high-level abstractions;
this is necessary since the FS service itself largely only operates on
the block level. The FS library is responsible for providing a file-based
abstraction to applications, including directories, metadata, keyword
search, verification, and so on.

The method used by GNUnet to break large files into blocks and to use
keyword search is called the "Encoding for Censorship-Resistant Sharing"
(ECRS). ECRS is largely implemented in the fs library; block validation
is also reflected in the block FS plugin and the FS service. ECRS
on-demand encoding is implemented in the FS service.

.. note:: The documentation in this chapter is quite incomplete.

.. _Encoding-for-Censorship_002dResistant-Sharing-_0028ECRS_0029:

.. index::
   see: Encoding for Censorship-Resistant Sharing; ECRS

:index:`ECRS — Encoding for Censorship-Resistant Sharing`

ECRS — Encoding for Censorship-Resistant Sharing
------------------------------------------------

When GNUnet shares files, it uses a content encoding that is called ECRS,
the Encoding for Censorship-Resistant Sharing. Most of ECRS is described
in the (so far unpublished) research paper attached to this page.
ECRS obsoletes the previous ESED and ESED II encodings which were used in
GNUnet before version 0.7.0. The rest of this page assumes that the reader
is familiar with the attached paper. What follows is a description of some
minor extensions that GNUnet makes over what is described in the paper.
These extensions are not in the paper because we felt that they were
obvious or trivial extensions to the original scheme and thus did not
warrant space in the research report.

.. todo:: Find missing link to file system paper.

.. _Namespace-Advertisements:

Namespace Advertisements
^^^^^^^^^^^^^^^^^^^^^^^^

.. todo:: FIXME: all zeroses -> ?

An ``SBlock`` with identifier all zeros is a signed advertisement for a
namespace. This special ``SBlock`` contains metadata describing the
content of the namespace. Instead of the name of the identifier for a
potential update, it contains the identifier for the root of the
namespace. The URI should always be empty. The ``SBlock`` is signed with
the content provider's RSA private key (just like any other ``SBlock``).
Peers can search for ``SBlock``\ s in order to find out more about a
namespace.

.. _KSBlocks:

KSBlocks
^^^^^^^^

GNUnet implements ``KSBlocks``, which are ``KBlocks`` that encrypt an
``SBlock`` instead of a CHK and metadata. In other words, ``KSBlocks``
enable GNUnet to find ``SBlocks`` using the global keyword search.
Usually the encrypted ``SBlock`` is a namespace advertisement. The
rationale behind ``KSBlock``\ s and ``SBlock``\ s is to enable peers to
discover namespaces via keyword searches and to associate useful
information with namespaces. When GNUnet finds ``KSBlocks`` during a
normal keyword search, it adds the information to an internal list of
discovered namespaces. Users looking for interesting namespaces can then
inspect this list, reducing the need for out-of-band discovery of
namespaces.
Naturally, namespaces (or more specifically, namespace advertisements)
can also be referenced from directories, but ``KSBlock``\ s should make
it easier for the owner of the pseudonym to advertise namespaces, since
they eliminate the need to first create a directory.

Collections are also advertised using ``KSBlock``\ s.

.. https://old.gnunet.org/sites/default/files/ecrs.pdf

.. What is this? - WGL

.. _File_002dsharing-persistence-directory-structure:

File-sharing persistence directory structure
--------------------------------------------

This section documents how the file-sharing library implements
persistence of file-sharing operations and specifically the resulting
directory structure. This code is only active if the
``GNUNET_FS_FLAGS_PERSISTENCE`` flag was set when calling
``GNUNET_FS_start``. In this case, the file-sharing library will try hard
to ensure that all major operations (searching, downloading, publishing,
unindexing) are persistent, that is, can live longer than the process
itself. More specifically, an operation is supposed to live until it is
explicitly stopped.

If ``GNUNET_FS_stop`` is called before an operation has been stopped, a
``SUSPEND`` event is generated, and when the process next calls
``GNUNET_FS_start``, a ``RESUME`` event is generated. Additionally, even
if an application crashes (segfault, SIGKILL, system crash) and hence
``GNUNET_FS_stop`` is never called and no ``SUSPEND`` events are
generated, operations are still resumed (with ``RESUME`` events).

This is implemented by constantly writing the current state of the
file-sharing operations to disk. Specifically, the current state is
written to disk whenever anything significant changes (the exception is
block-wise progress in publishing and unindexing, since those operations
would be slowed down significantly and can be resumed cheaply even
without detailed accounting).
Note that if the process crashes (or is killed) during a serialization
operation, FS does not guarantee that this specific operation is
recoverable (no strict transactional semantics, again for performance
reasons). However, all other unrelated operations should resume nicely.

Since we need to serialize the state continuously and want to recover as
much as possible even after crashing during a serialization operation, we
do not use one large file for serialization. Instead, several directories
are used for the various operations. When ``GNUNET_FS_start`` executes,
the master directories are scanned for files describing operations to
resume. Sometimes, these operations can refer to related operations in
child directories which may also be resumed at this point. Note that
corrupted files are cleaned up automatically. However, dangling files in
child directories (those that are not referenced by files from the master
directories) are not automatically removed.

Persistence data is kept in a directory that begins with the "STATE_DIR"
prefix from the configuration file (by default,
"$SERVICEHOME/persistence/") followed by the name of the client as given
to ``GNUNET_FS_start`` (for example, "gnunet-gtk") followed by the actual
name of the master or child directory.

The names for the master directories follow the names of the operations:

- "search"
- "download"
- "publish"
- "unindex"

Each of the master directories contains one file, with a randomly chosen
name, for each active top-level (master) operation. Note that a download
that is associated with a search result is not a top-level operation.

In contrast to the master directories, the child directories are only
consulted when another operation refers to them. For each search, a
subdirectory (named after the master search synchronization file)
contains the search results. Search results can have an associated
download, which is then stored in the general "download-child" directory.
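
As a rough illustration of the path layout described above, the following
Python sketch composes persistence paths from the three parts named in the
text: the "STATE_DIR" prefix, the client name, and the master or child
directory name. It is not GNUnet code. The helper names ``operation_dir``
and ``new_master_file`` are hypothetical, and the random hexadecimal string
is only an assumption about what a randomly chosen name might look like.

```python
import posixpath
import secrets

# Master directory names as listed in the text above.
MASTER_DIRS = ("search", "download", "publish", "unindex")


def operation_dir(state_dir, client_name, directory):
    """Join the STATE_DIR prefix, the client name, and a directory name."""
    return posixpath.join(state_dir, client_name, directory)


def new_master_file(state_dir, client_name, operation, name=None):
    """Return a path for a top-level operation inside its master directory.

    Hypothetical helper: when no name is given, a random hex string is used.
    """
    if operation not in MASTER_DIRS:
        raise ValueError("not a master directory: " + operation)
    if name is None:
        name = secrets.token_hex(8)  # assumption: any random name will do
    return posixpath.join(operation_dir(state_dir, client_name, operation), name)
```

For example, ``operation_dir("/tmp/persistence", "gnunet-gtk",
"download-child")`` returns ``/tmp/persistence/gnunet-gtk/download-child``;
the base path here is made up for the example.
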
Downloads can be recursive, in which case children are stored in
subdirectories mirroring the structure of the recursive download (either
starting in the master "download" directory or in the "download-child"
directory depending on how the download was initiated).

For publishing operations, the "publish-file" directory contains
information about the individual files and directories that are part of
the publication. However, this directory structure is flat and does not
mirror the structure of the publishing operation.

Note that unindex operations cannot have associated child operations.
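
The startup scan of the master directories described in this section can be
sketched as follows. This is a simplified Python illustration, not the
logic of ``GNUNET_FS_start``: the function name ``find_resumable`` is
hypothetical, and the sketch only lists the files found in the four master
directories without parsing them or following references into child
directories.

```python
import os

# Only the master directories are scanned; child directories such as
# "download-child" or "publish-file" are consulted only when referenced.
MASTER_DIRS = ("search", "download", "publish", "unindex")


def find_resumable(client_dir):
    """Map each master directory name to the sorted file names found in it."""
    found = {}
    for operation in MASTER_DIRS:
        path = os.path.join(client_dir, operation)
        try:
            with os.scandir(path) as entries:
                names = sorted(e.name for e in entries if e.is_file())
        except FileNotFoundError:
            names = []  # a missing master directory means nothing to resume
        found[operation] = names
    return found
```

Because the sketch never looks into child directories on its own, a file
left behind there without a referencing master file is simply not seen,
which matches the statement above that dangling child files are not
automatically removed.
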