GNUnet  0.11.x
Data Structures | Macros | Typedefs | Enumerations | Functions
regex_internal.h File Reference

common internal definitions for regex library. More...

#include "regex_internal_lib.h"
Include dependency graph for regex_internal.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Data Structures

struct  REGEX_INTERNAL_Transition
 Transition between two states. More...
 
struct  REGEX_INTERNAL_StateSet
 Set of states. More...
 
struct  REGEX_INTERNAL_State
 A state. More...
 
struct  REGEX_INTERNAL_Automaton
 Automaton representation. More...
 
struct  REGEX_INTERNAL_Context
 Context that contains an id counter for states and transitions as well as a DLL of automatons used as a stack for NFA construction. More...
 

Macros

#define ALLOWED_LITERALS   "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
 char array of literals that are allowed inside a regex (apart from the operators) More...
 

Typedefs

typedef int(* REGEX_INTERNAL_traverse_check) (void *cls, struct REGEX_INTERNAL_State *s, struct REGEX_INTERNAL_Transition *t)
 Function that get's passed to automaton traversal and is called before each next traversal from state 's' using transition 't' to check if traversal should proceed. More...
 
typedef void(* REGEX_INTERNAL_traverse_action) (void *cls, const unsigned int count, struct REGEX_INTERNAL_State *s)
 Function that is called with each state, when traversing an automaton. More...
 

Enumerations

enum  REGEX_INTERNAL_AutomatonType { NFA, DFA }
 Type of an automaton. More...
 

Functions

struct REGEX_INTERNAL_AutomatonREGEX_INTERNAL_construct_nfa (const char *regex, const size_t len)
 Construct an NFA by parsing the regex string of length 'len'. More...
 
void REGEX_INTERNAL_automaton_traverse (const struct REGEX_INTERNAL_Automaton *a, struct REGEX_INTERNAL_State *start, REGEX_INTERNAL_traverse_check check, void *check_cls, REGEX_INTERNAL_traverse_action action, void *action_cls)
 Traverses the given automaton using depth-first-search (DFS) from it's start state, visiting all reachable states and calling 'action' on each one of them. More...
 
const char * REGEX_INTERNAL_get_canonical_regex (struct REGEX_INTERNAL_Automaton *a)
 Get the canonical regex of the given automaton. More...
 
unsigned int REGEX_INTERNAL_get_transition_count (struct REGEX_INTERNAL_Automaton *a)
 Get the number of transitions that are contained in the given automaton. More...
 
void REGEX_INTERNAL_dfa_add_multi_strides (struct REGEX_INTERNAL_Context *regex_ctx, struct REGEX_INTERNAL_Automaton *dfa, const unsigned int stride_len)
 Adds multi-strided transitions to the given 'dfa'. More...
 

Detailed Description

common internal definitions for regex library.

Author
Maximilian Szengel

Definition in file regex_internal.h.

Macro Definition Documentation

◆ ALLOWED_LITERALS

#define ALLOWED_LITERALS   "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

char array of literals that are allowed inside a regex (apart from the operators)

Definition at line 42 of file regex_internal.h.

Referenced by get_random_literal().

Typedef Documentation

◆ REGEX_INTERNAL_traverse_check

typedef int(* REGEX_INTERNAL_traverse_check) (void *cls, struct REGEX_INTERNAL_State *s, struct REGEX_INTERNAL_Transition *t)

Function that get's passed to automaton traversal and is called before each next traversal from state 's' using transition 't' to check if traversal should proceed.

Return GNUNET_NO to stop traversal or GNUNET_YES to continue.

Parameters
clsclosure for the check.
scurrent state in the traversal.
tcurrent transition from state 's' that will be used for the next step.
Returns
GNUNET_YES to proceed traversal, GNUNET_NO to stop.

Definition at line 343 of file regex_internal.h.

◆ REGEX_INTERNAL_traverse_action

typedef void(* REGEX_INTERNAL_traverse_action) (void *cls, const unsigned int count, struct REGEX_INTERNAL_State *s)

Function that is called with each state, when traversing an automaton.

Parameters
clsclosure.
countcurrent count of the state, from 0 to a->state_count -1.
sstate.

Definition at line 356 of file regex_internal.h.

Enumeration Type Documentation

◆ REGEX_INTERNAL_AutomatonType

Type of an automaton.

Enumerator
NFA 
DFA 

Definition at line 249 of file regex_internal.h.

250 {
251  NFA,
252  DFA
253 };

Function Documentation

◆ REGEX_INTERNAL_construct_nfa()

struct REGEX_INTERNAL_Automaton* REGEX_INTERNAL_construct_nfa ( const char *  regex,
const size_t  len 
)

Construct an NFA by parsing the regex string of length 'len'.

Parameters
regexregular expression string.
lenlength of the string.
Returns
NFA, needs to be freed using REGEX_INTERNAL_automaton_destroy.
Parameters
regexregular expression string
lenlength of the string
Returns
NFA, needs to be freed using REGEX_INTERNAL_destroy_automaton

Definition at line 2803 of file regex_internal.c.

References GNUNET_array_grow, GNUNET_CONTAINER_DLL_remove, GNUNET_ERROR_TYPE_ERROR, GNUNET_free_non_null, GNUNET_log, GNUNET_NO, GNUNET_strdup, REGEX_INTERNAL_Automaton::is_multistrided, nfa_add_alternation(), nfa_add_concatenation(), nfa_add_label(), nfa_add_plus_op(), nfa_add_question_op(), nfa_add_star_op(), number_states(), p, REGEX_INTERNAL_Automaton::regex, REGEX_INTERNAL_automaton_destroy(), REGEX_INTERNAL_automaton_traverse(), REGEX_INTERNAL_context_init(), REGEX_INTERNAL_Context::stack_head, and REGEX_INTERNAL_Context::stack_tail.

Referenced by REGEX_INTERNAL_construct_dfa().

2804 {
2805  struct REGEX_INTERNAL_Context ctx;
2806  struct REGEX_INTERNAL_Automaton *nfa;
2807  const char *regexp;
2808  char curlabel[2];
2809  char *error_msg;
2810  unsigned int count;
2811  unsigned int altcount;
2812  unsigned int atomcount;
2813  unsigned int poff;
2814  unsigned int psize;
2815 
2816  struct
2817  {
2818  int altcount;
2819  int atomcount;
2820  } *p;
2821 
2822  if ((NULL == regex) || (0 == strlen (regex)) || (0 == len))
2823  {
2825  "Could not parse regex. Empty regex string provided.\n");
2826 
2827  return NULL;
2828  }
2830 
2831  regexp = regex;
2832  curlabel[1] = '\0';
2833  p = NULL;
2834  error_msg = NULL;
2835  altcount = 0;
2836  atomcount = 0;
2837  poff = 0;
2838  psize = 0;
2839 
2840  for (count = 0; count < len && *regexp; count++, regexp++)
2841  {
2842  switch (*regexp)
2843  {
2844  case '(':
2845  if (atomcount > 1)
2846  {
2847  --atomcount;
2849  }
2850  if (poff == psize)
2851  GNUNET_array_grow (p, psize, psize * 2 + 4); /* FIXME why *2 +4? */
2852  p[poff].altcount = altcount;
2853  p[poff].atomcount = atomcount;
2854  poff++;
2855  altcount = 0;
2856  atomcount = 0;
2857  break;
2858 
2859  case '|':
2860  if (0 == atomcount)
2861  {
2862  error_msg = "Cannot append '|' to nothing";
2863  goto error;
2864  }
2865  while (--atomcount > 0)
2867  altcount++;
2868  break;
2869 
2870  case ')':
2871  if (0 == poff)
2872  {
2873  error_msg = "Missing opening '('";
2874  goto error;
2875  }
2876  if (0 == atomcount)
2877  {
2878  /* Ignore this: "()" */
2879  poff--;
2880  altcount = p[poff].altcount;
2881  atomcount = p[poff].atomcount;
2882  break;
2883  }
2884  while (--atomcount > 0)
2886  for (; altcount > 0; altcount--)
2888  poff--;
2889  altcount = p[poff].altcount;
2890  atomcount = p[poff].atomcount;
2891  atomcount++;
2892  break;
2893 
2894  case '*':
2895  if (atomcount == 0)
2896  {
2897  error_msg = "Cannot append '*' to nothing";
2898  goto error;
2899  }
2900  nfa_add_star_op (&ctx);
2901  break;
2902 
2903  case '+':
2904  if (atomcount == 0)
2905  {
2906  error_msg = "Cannot append '+' to nothing";
2907  goto error;
2908  }
2909  nfa_add_plus_op (&ctx);
2910  break;
2911 
2912  case '?':
2913  if (atomcount == 0)
2914  {
2915  error_msg = "Cannot append '?' to nothing";
2916  goto error;
2917  }
2919  break;
2920 
2921  default:
2922  if (atomcount > 1)
2923  {
2924  --atomcount;
2926  }
2927  curlabel[0] = *regexp;
2928  nfa_add_label (&ctx, curlabel);
2929  atomcount++;
2930  break;
2931  }
2932  }
2933  if (0 != poff)
2934  {
2935  error_msg = "Unbalanced parenthesis";
2936  goto error;
2937  }
2938  while (--atomcount > 0)
2940  for (; altcount > 0; altcount--)
2942 
2943  GNUNET_array_grow (p, psize, 0);
2944 
2945  nfa = ctx.stack_tail;
2946  GNUNET_CONTAINER_DLL_remove (ctx.stack_head, ctx.stack_tail, nfa);
2947 
2948  if (NULL != ctx.stack_head)
2949  {
2950  error_msg = "Creating the NFA failed. NFA stack was not empty!";
2951  goto error;
2952  }
2953 
2954  /* Remember the regex that was used to generate this NFA */
2955  nfa->regex = GNUNET_strdup (regex);
2956 
2957  /* create depth-first numbering of the states for pretty printing */
2959  NULL,
2960  NULL,
2961  NULL,
2962  &number_states,
2963  NULL);
2964 
2965  /* No multistriding added so far */
2966  nfa->is_multistrided = GNUNET_NO;
2967 
2968  return nfa;
2969 
2970 error:
2971  GNUNET_log (GNUNET_ERROR_TYPE_ERROR, "Could not parse regex: `%s'\n", regex);
2972  if (NULL != error_msg)
2973  GNUNET_log (GNUNET_ERROR_TYPE_ERROR, "%s\n", error_msg);
2974 
2976 
2977  while (NULL != (nfa = ctx.stack_head))
2978  {
2979  GNUNET_CONTAINER_DLL_remove (ctx.stack_head, ctx.stack_tail, nfa);
2981  }
2982 
2983  return NULL;
2984 }
#define GNUNET_CONTAINER_DLL_remove(head, tail, element)
Remove an element from a DLL.
static void nfa_add_plus_op(struct REGEX_INTERNAL_Context *ctx)
Pops an NFA fragment (a) from the stack and adds a new fragment (a+)
#define GNUNET_NO
Definition: gnunet_common.h:78
static void nfa_add_star_op(struct REGEX_INTERNAL_Context *ctx)
Pops a NFA fragment from the stack (a) and adds a new fragment (a*)
#define GNUNET_free_non_null(ptr)
Free the memory pointed to by ptr if ptr is not NULL.
Automaton representation.
#define GNUNET_strdup(a)
Wrapper around GNUNET_xstrdup_.
static struct GNUNET_DNSSTUB_Context * ctx
Context for DNS resolution.
static void nfa_add_alternation(struct REGEX_INTERNAL_Context *ctx)
Pops two NFA fragments (a, b) from the stack and adds a new NFA fragment that alternates between a an...
void REGEX_INTERNAL_automaton_traverse(const struct REGEX_INTERNAL_Automaton *a, struct REGEX_INTERNAL_State *start, REGEX_INTERNAL_traverse_check check, void *check_cls, REGEX_INTERNAL_traverse_action action, void *action_cls)
Traverses the given automaton using depth-first-search (DFS) from it&#39;s start state, visiting all reachable states and calling &#39;action&#39; on each one of them.
Context that contains an id counter for states and transitions as well as a DLL of automatons used as...
#define GNUNET_array_grow(arr, size, tsize)
Grow a well-typed (!) array.
static struct GNUNET_OS_Process * p
Helper process we started.
Definition: gnunet-qr.c:59
static void REGEX_INTERNAL_context_init(struct REGEX_INTERNAL_Context *ctx)
Initialize a new context.
static void nfa_add_question_op(struct REGEX_INTERNAL_Context *ctx)
Pops an NFA fragment (a) from the stack and adds a new fragment (a?)
void REGEX_INTERNAL_automaton_destroy(struct REGEX_INTERNAL_Automaton *a)
Free the memory allocated by constructing the REGEX_INTERNAL_Automaton data structure.
int is_multistrided
GNUNET_YES, if multi strides have been added to the Automaton.
#define GNUNET_log(kind,...)
static void nfa_add_concatenation(struct REGEX_INTERNAL_Context *ctx)
Pops two NFA fragments (a, b) from the stack and concatenates them (ab)
static void number_states(void *cls, const unsigned int count, struct REGEX_INTERNAL_State *s)
Helper function used as &#39;action&#39; in &#39;REGEX_INTERNAL_automaton_traverse&#39; function to create the depth-...
uint16_t len
length of data (which is always a uint32_t, but presumably this can be used to specify that fewer byt...
static void nfa_add_label(struct REGEX_INTERNAL_Context *ctx, const char *label)
Adds a new nfa fragment to the stack.
Here is the call graph for this function:
Here is the caller graph for this function:

◆ REGEX_INTERNAL_automaton_traverse()

void REGEX_INTERNAL_automaton_traverse ( const struct REGEX_INTERNAL_Automaton a,
struct REGEX_INTERNAL_State start,
REGEX_INTERNAL_traverse_check  check,
void *  check_cls,
REGEX_INTERNAL_traverse_action  action,
void *  action_cls 
)

Traverses the given automaton using depth-first-search (DFS) from it's start state, visiting all reachable states and calling 'action' on each one of them.

Parameters
aautomaton to be traversed.
startstart state, pass a->start or NULL to traverse the whole automaton.
checkfunction that is checked before advancing on each transition in the DFS.
check_clsclosure for check.
actionaction to be performed on each state.
action_clsclosure for action
aautomaton to be traversed.
startstart state, pass a->start or NULL to traverse the whole automaton.
checkfunction that is checked before advancing on each transition in the DFS.
check_clsclosure for check.
actionaction to be performed on each state.
action_clsclosure for action

Definition at line 518 of file regex_internal.c.

References automaton_state_traverse(), GNUNET_NO, REGEX_INTERNAL_State::next, start, REGEX_INTERNAL_Automaton::start, REGEX_INTERNAL_Automaton::state_count, REGEX_INTERNAL_Automaton::states_head, and REGEX_INTERNAL_State::traversal_id.

Referenced by automaton_create_proofs(), dfa_remove_unreachable_states(), REGEX_INTERNAL_construct_nfa(), REGEX_INTERNAL_dfa_add_multi_strides(), and REGEX_TEST_automaton_save_graph().

524 {
525  unsigned int count;
526  struct REGEX_INTERNAL_State *s;
527 
528  if ((NULL == a) || (0 == a->state_count))
529  return;
530 
531  int marks[a->state_count];
532 
533  for (count = 0, s = a->states_head; NULL != s && count < a->state_count;
534  s = s->next, count++)
535  {
536  s->traversal_id = count;
537  marks[s->traversal_id] = GNUNET_NO;
538  }
539 
540  count = 0;
541 
542  if (NULL == start)
543  s = a->start;
544  else
545  s = start;
546 
548  marks,
549  &count,
550  check,
551  check_cls,
552  action,
553  action_cls);
554 }
unsigned int state_count
Number of states in the automaton.
struct REGEX_INTERNAL_State * states_head
DLL of states.
unsigned int traversal_id
Unique state id that is used for traversing the automaton.
static int start
Set if we are to start default services (including ARM).
Definition: gnunet-arm.c:39
#define GNUNET_NO
Definition: gnunet_common.h:78
struct REGEX_INTERNAL_State * start
First state of the automaton.
struct REGEX_INTERNAL_State * next
This is a linked list to keep states in an automaton.
static void automaton_state_traverse(struct REGEX_INTERNAL_State *s, int *marks, unsigned int *count, REGEX_INTERNAL_traverse_check check, void *check_cls, REGEX_INTERNAL_traverse_action action, void *action_cls)
Depth-first traversal (DFS) of all states that are reachable from state &#39;s&#39;.
Here is the call graph for this function:
Here is the caller graph for this function:

◆ REGEX_INTERNAL_get_canonical_regex()

const char* REGEX_INTERNAL_get_canonical_regex ( struct REGEX_INTERNAL_Automaton a)

Get the canonical regex of the given automaton.

When constructing the automaton a proof is computed for each state, consisting of the regular expression leading to this state. A complete regex for the automaton can be computed by combining these proofs. As of now this function is only useful for testing.

Parameters
aautomaton for which the canonical regex should be returned.
Returns
canonical regex string.

When constructing the automaton a proof is computed for each state, consisting of the regular expression leading to this state. A complete regex for the automaton can be computed by combining these proofs. As of now this function is only useful for testing.

Parameters
aautomaton for which the canonical regex should be returned.
Returns

Definition at line 3299 of file regex_internal.c.

References REGEX_INTERNAL_Automaton::canonical_regex.

3300 {
3301  if (NULL == a)
3302  return NULL;
3303 
3304  return a->canonical_regex;
3305 }
char * canonical_regex
Canonical regex (result of RX->NFA->DFA->RX)

◆ REGEX_INTERNAL_get_transition_count()

unsigned int REGEX_INTERNAL_get_transition_count ( struct REGEX_INTERNAL_Automaton a)

Get the number of transitions that are contained in the given automaton.

Parameters
aautomaton for which the number of transitions should be returned.
Returns
number of transitions in the given automaton.

Definition at line 3316 of file regex_internal.c.

References REGEX_INTERNAL_State::next, REGEX_INTERNAL_Automaton::states_head, and REGEX_INTERNAL_State::transition_count.

3317 {
3318  unsigned int t_count;
3319  struct REGEX_INTERNAL_State *s;
3320 
3321  if (NULL == a)
3322  return 0;
3323 
3324  t_count = 0;
3325  for (s = a->states_head; NULL != s; s = s->next)
3326  t_count += s->transition_count;
3327 
3328  return t_count;
3329 }
struct REGEX_INTERNAL_State * states_head
DLL of states.
unsigned int transition_count
Number of transitions from this state to other states.
struct REGEX_INTERNAL_State * next
This is a linked list to keep states in an automaton.

◆ REGEX_INTERNAL_dfa_add_multi_strides()

void REGEX_INTERNAL_dfa_add_multi_strides ( struct REGEX_INTERNAL_Context regex_ctx,
struct REGEX_INTERNAL_Automaton dfa,
const unsigned int  stride_len 
)

Adds multi-strided transitions to the given 'dfa'.

Parameters
regex_ctxregex context needed to add transitions to the automaton.
dfaDFA to which the multi strided transitions should be added.
stride_lenlength of the strides.

Definition at line 2191 of file regex_internal.c.

References dfa_add_multi_strides(), REGEX_INTERNAL_Transition::from_state, GNUNET_CONTAINER_DLL_remove, GNUNET_free, GNUNET_free_non_null, GNUNET_YES, REGEX_INTERNAL_Automaton::is_multistrided, REGEX_INTERNAL_Transition::label, REGEX_INTERNAL_Transition::next, REGEX_INTERNAL_automaton_traverse(), REGEX_INTERNAL_Automaton::start, state_add_transition(), t, REGEX_INTERNAL_Transition::to_state, REGEX_INTERNAL_Strided_Context::transitions_head, and REGEX_INTERNAL_Strided_Context::transitions_tail.

2194 {
2195  struct REGEX_INTERNAL_Strided_Context ctx = { stride_len, NULL, NULL };
2196  struct REGEX_INTERNAL_Transition *t;
2197  struct REGEX_INTERNAL_Transition *t_next;
2198 
2199  if ((1 > stride_len) || (GNUNET_YES == dfa->is_multistrided))
2200  return;
2201 
2202  /* Compute the new transitions of given stride_len */
2204  dfa->start,
2205  NULL,
2206  NULL,
2208  &ctx);
2209 
2210  /* Add all the new transitions to the automaton. */
2211  for (t = ctx.transitions_head; NULL != t; t = t_next)
2212  {
2213  t_next = t->next;
2214  state_add_transition (regex_ctx, t->from_state, t->label, t->to_state);
2217  GNUNET_free (t);
2218  }
2219 
2220  /* Mark this automaton as multistrided */
2221  dfa->is_multistrided = GNUNET_YES;
2222 }
#define GNUNET_CONTAINER_DLL_remove(head, tail, element)
Remove an element from a DLL.
struct REGEX_INTERNAL_Transition * next
This is a linked list.
struct REGEX_INTERNAL_Transition * transitions_tail
Strided transitions DLL.
#define GNUNET_free_non_null(ptr)
Free the memory pointed to by ptr if ptr is not NULL.
struct REGEX_INTERNAL_State * from_state
State from which this transition origins.
static struct GNUNET_SCHEDULER_Task * t
Main task.
static struct GNUNET_DNSSTUB_Context * ctx
Context for DNS resolution.
static void dfa_add_multi_strides(void *cls, const unsigned int count, struct REGEX_INTERNAL_State *s)
Function called for each state in the DFA.
void REGEX_INTERNAL_automaton_traverse(const struct REGEX_INTERNAL_Automaton *a, struct REGEX_INTERNAL_State *start, REGEX_INTERNAL_traverse_check check, void *check_cls, REGEX_INTERNAL_traverse_action action, void *action_cls)
Traverses the given automaton using depth-first-search (DFS) from it&#39;s start state, visiting all reachable states and calling &#39;action&#39; on each one of them.
Transition between two states.
struct REGEX_INTERNAL_State * start
First state of the automaton.
static void state_add_transition(struct REGEX_INTERNAL_Context *ctx, struct REGEX_INTERNAL_State *from_state, const char *label, struct REGEX_INTERNAL_State *to_state)
Adds a transition from one state to another on label.
Context for adding strided transitions to a DFA.
int is_multistrided
GNUNET_YES, if multi strides have been added to the Automaton.
struct REGEX_INTERNAL_State * to_state
State to which this transition leads.
char * label
Label for this transition.
#define GNUNET_YES
Definition: gnunet_common.h:77
#define GNUNET_free(ptr)
Wrapper around free.
struct REGEX_INTERNAL_Transition * transitions_head
Strided transitions DLL.
Here is the call graph for this function: