# Regular Expression Tokenizer

Tokenizes strings that represent regular expressions.

[![Build Status](https://secure.travis-ci.org/fent/ret.js.svg)](http://travis-ci.org/fent/ret.js)
[![Dependency Status](https://david-dm.org/fent/ret.js.svg)](https://david-dm.org/fent/ret.js)
[![codecov](https://codecov.io/gh/fent/ret.js/branch/master/graph/badge.svg)](https://codecov.io/gh/fent/ret.js)

    
# Usage

```js
var ret = require('ret');

var tokens = ret(/foo|bar/.source);
```

    
`tokens` will contain the following object:

```js
{
  "type": ret.types.ROOT,
  "options": [
    [ { "type": ret.types.CHAR, "value": 102 },
      { "type": ret.types.CHAR, "value": 111 },
      { "type": ret.types.CHAR, "value": 111 } ],
    [ { "type": ret.types.CHAR, "value":  98 },
      { "type": ret.types.CHAR, "value":  97 },
      { "type": ret.types.CHAR, "value": 114 } ]
  ]
}
```
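
As a rough illustration of how this structure can be consumed, the sketch below turns each alternative back into a string with `String.fromCharCode`. To stay runnable without the package installed, it works on a hand-written copy of the object for `/foo|bar/`. The local `types` object is only a stand-in for `ret.types` (its numeric values are placeholders), and the helper name `alternativesToStrings` is an assumption made for this example.

```js
// Stand-in for ret.types; only the names matter here, the numbers are placeholders.
var types = { ROOT: 0, CHAR: 7 };

// Hand-written copy of the token tree for /foo|bar/.
var tokens = {
  type: types.ROOT,
  options: [
    [ { type: types.CHAR, value: 102 },
      { type: types.CHAR, value: 111 },
      { type: types.CHAR, value: 111 } ],
    [ { type: types.CHAR, value:  98 },
      { type: types.CHAR, value:  97 },
      { type: types.CHAR, value: 114 } ]
  ]
};

// Hypothetical helper: joins the CHAR tokens of each alternative into a string.
// It handles CHAR tokens only and throws on anything else.
function alternativesToStrings(root) {
  return root.options.map(function (alternative) {
    return alternative.map(function (token) {
      if (token.type !== types.CHAR) {
        throw new TypeError('Only CHAR tokens are handled in this sketch');
      }
      return String.fromCharCode(token.value);
    }).join('');
  });
}

console.log(alternativesToStrings(tokens)); // [ 'foo', 'bar' ]
```

Character codes rather than strings are stored in `value`, so any consumer that wants text has to do a conversion of this kind.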

    
# Token Types

`ret.types` is a collection of the various token types exported by ret.

### ROOT

Used only at the root of the regexp. It is needed because the root may contain a pipe `|` character. In that case, the token has an `options` key holding an array of token arrays, one per alternative. Otherwise, it has a `stack` key holding an array of tokens.

```js
{
  "type": ret.types.ROOT,
  "stack": [token1, token2...],
}
```

```js
{
  "type": ret.types.ROOT,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...],
    ...
  ],
}
```
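
Code that walks the tree usually has to cope with both shapes. One possible way, sketched here with a hypothetical helper named `getAlternatives`, is to normalize a token to an array of alternatives regardless of whether a pipe was present. The tokens in the example are hand-written placeholders, not output from ret.

```js
// Hypothetical helper: returns the alternatives of a token as an array of
// token arrays, whether it carries `options` (pipe) or `stack` (no pipe).
function getAlternatives(token) {
  if (Array.isArray(token.options)) {
    return token.options;   // pipe present: one token array per alternative
  }
  return [token.stack];     // no pipe: wrap the single sequence
}

// Placeholder tokens; the strings stand in for real token objects.
var withoutPipe = { stack: ['token1', 'token2'] };
var withPipe = { options: [['token1'], ['othertoken1']] };

console.log(getAlternatives(withoutPipe).length); // 1
console.log(getAlternatives(withPipe).length);    // 2
```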

    
### GROUP

Groups contain the tokens that are inside parentheses. If the group begins with `?` followed by another character, it is a special type of group: `:` tells the group not to be remembered when `exec` is used, `=` means the previous token matches only if followed by this group, and `!` means the previous token matches only if NOT followed by it.

Like root, it can contain an `options` key instead of `stack` if there is a pipe.

```js
{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "stack": [token1, token2...],
}
```

```js
{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...],
    ...
  ],
}
```
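
The three boolean keys can be mapped back to the opening syntax of the group. The following is a minimal sketch under the reading of the flags given in this section; the function name `groupPrefix` is hypothetical, and the objects passed to it are hand-written rather than produced by ret.

```js
// Hypothetical helper: returns the opening characters a group token
// corresponds to, based on its `remember`, `followedBy` and `notFollowedBy` keys.
function groupPrefix(group) {
  if (group.followedBy) return '(?=';     // matches only if followed by the group
  if (group.notFollowedBy) return '(?!';  // matches only if NOT followed by the group
  return group.remember ? '(' : '(?:';    // remembered vs. not remembered
}

console.log(groupPrefix({ remember: true,  followedBy: false, notFollowedBy: false })); // (
console.log(groupPrefix({ remember: false, followedBy: false, notFollowedBy: false })); // (?:
console.log(groupPrefix({ remember: false, followedBy: true,  notFollowedBy: false })); // (?=
console.log(groupPrefix({ remember: false, followedBy: false, notFollowedBy: true  })); // (?!
```

The lookahead flags are checked first so that the result does not depend on what `remember` is set to for those groups.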

    
### POSITION

`\b`, `\B`, `^`, and `$` specify positions in the regexp.

```js
{
  "type": ret.types.POSITION,
  "value": "^",
}
```

    
### SET

Contains a key `set` specifying what tokens are allowed and a key `not` specifying if the set should be negated. A set can contain other sets, ranges, and characters.

```js
{
  "type": ret.types.SET,
  "set": [token1, token2...],
  "not": false,
}
```

    
### RANGE

Used in set tokens to specify a character range. `from` and `to` are character codes.

```js
{
  "type": ret.types.RANGE,
  "from": 97,
  "to": 122,
}
```
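
To show how set, range, and character tokens fit together, here is a small sketch that checks whether a character code is accepted by a set token. It is not part of ret: `matchesSet` is a hypothetical helper, the local `types` object is a stand-in for `ret.types` with placeholder numbers, and the example token is written by hand. Flags such as case-insensitivity are ignored.

```js
// Stand-in for ret.types; the numeric values are placeholders.
var types = { SET: 3, RANGE: 4, CHAR: 7 };

// Hypothetical helper: true if the character code is accepted by the set token.
function matchesSet(setToken, code) {
  var found = setToken.set.some(function (member) {
    switch (member.type) {
      case types.CHAR:  return member.value === code;
      case types.RANGE: return member.from <= code && code <= member.to;
      case types.SET:   return matchesSet(member, code); // sets may be nested
      default:          return false;
    }
  });
  // `not` inverts the result for negated sets.
  return setToken.not ? !found : found;
}

// Hand-written token for a set like [a-z_].
var lowerOrUnderscore = {
  type: types.SET,
  set: [
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.CHAR, value: 95 }
  ],
  not: false
};

console.log(matchesSet(lowerOrUnderscore, 'q'.charCodeAt(0))); // true
console.log(matchesSet(lowerOrUnderscore, 'Q'.charCodeAt(0))); // false
```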

    
### REPETITION

```js
{
  "type": ret.types.REPETITION,
  "min": 0,
  "max": Infinity,
  "value": token,
}
```
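
Reading the example, `min` and `max` appear to hold the bounds of the quantifier and `value` the token being repeated, so a `min` of 0 with a `max` of `Infinity` would correspond to `*`. Under that assumption, the sketch below maps the bounds back to quantifier syntax. The function name `quantifierFor` is hypothetical and the inputs are hand-written objects.

```js
// Hypothetical helper: renders the bounds of a repetition token as
// regular expression quantifier syntax.
function quantifierFor(token) {
  var min = token.min;
  var max = token.max;
  if (min === 0 && max === Infinity) return '*';
  if (min === 1 && max === Infinity) return '+';
  if (min === 0 && max === 1) return '?';
  if (max === Infinity) return '{' + min + ',}';   // at least min
  if (min === max) return '{' + min + '}';         // exactly min
  return '{' + min + ',' + max + '}';              // between min and max
}

console.log(quantifierFor({ min: 0, max: Infinity })); // *
console.log(quantifierFor({ min: 1, max: 3 }));        // {1,3}
```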

    
### REFERENCE

References a group token. `value` is 1-9.

```js
{
  "type": ret.types.REFERENCE,
  "value": 1,
}
```

    
### CHAR

Represents a single character token. `value` is the character code. Keeping every character as its own token may look more cluttered than concatenating characters into strings, but a repetition token repeats only the last token, whereas a pipe applies to the whole clause, so it is simpler to do it this way.

```js
{
  "type": ret.types.CHAR,
  "value": 123,
}
```

    
## Errors

ret.js will throw an error if given a string containing an invalid regular expression. The possible errors are:

* Invalid group. Thrown when the `?` at the start of a group is followed by an invalid character. It can only be followed by `!`, `=`, or `:`. Example: `/(?_abc)/`
* Nothing to repeat. Thrown when a repetition token is used as the first token in the current clause, that is, at the very beginning of the regexp or group, or right after a pipe. Examples: `/foo|?bar/`, `/{1,3}foo|bar/`, `/foo(+bar)/`
* Unmatched ). A group was closed but never opened. Example: `/hello)2u/`
* Unterminated group. A group was not closed. Example: `/(1(23)4/`
* Unterminated character class. A custom character set was not closed. Example: `/[abc/`
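
Since invalid input is reported by throwing, callers that handle untrusted patterns may want to wrap the call. The sketch below shows one way to do that. `tryTokenize` is a hypothetical helper, and `fakeTokenize` is a stand-in used only so the example runs without ret installed: it imitates the "Unterminated character class" case, and the `SyntaxError` class it throws is an assumption, not something stated here about ret.

```js
// Hypothetical helper: runs a tokenizer function and reports failure
// as a value instead of an exception.
function tryTokenize(tokenize, source) {
  try {
    return { ok: true, tokens: tokenize(source) };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// Stand-in tokenizer, NOT ret: it only imitates one error case.
function fakeTokenize(source) {
  if (source.indexOf('[') !== -1 && source.indexOf(']') === -1) {
    throw new SyntaxError('Unterminated character class');
  }
  return { stack: [] };
}

console.log(tryTokenize(fakeTokenize, '[abc').ok); // false
console.log(tryTokenize(fakeTokenize, 'abc').ok);  // true
```

With the package installed, passing `require('ret')` in place of `fakeTokenize` would be the equivalent call.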

    
# Install

    npm install ret

    
# Tests

Tests are written with [vows](http://vowsjs.org/).

```bash
npm test
```

    
# License

MIT