# Regenerate [![Build status](https://travis-ci.org/mathiasbynens/regenerate.svg?branch=master)](https://travis-ci.org/mathiasbynens/regenerate) [![Code coverage status](https://img.shields.io/codecov/c/github/mathiasbynens/regenerate.svg)](https://codecov.io/gh/mathiasbynens/regenerate) [![Dependency status](https://gemnasium.com/mathiasbynens/regenerate.svg)](https://gemnasium.com/mathiasbynens/regenerate)

_Regenerate_ is a Unicode-aware regex generator for JavaScript. It allows you to easily generate ES5-compatible regular expressions based on a given set of Unicode symbols or code points. (This is trickier than you might think, because of [how JavaScript deals with astral symbols](https://mathiasbynens.be/notes/javascript-unicode).)
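
To illustrate why astral symbols are tricky, here is a minimal sketch in plain JavaScript (it does not use Regenerate). It shows that an astral symbol occupies two UTF-16 code units, so an ES5 character class cannot match it as a single unit:

```js
// U+1D306 lies outside the Basic Multilingual Plane, so JavaScript
// stores it as two UTF-16 code units (a surrogate pair).
var symbol = '\uD834\uDF06'; // 𝌆

console.log(symbol.length);
// → 2

// Without the ES6 `u` flag, a character class treats the two halves as
// separate characters, so it cannot match the symbol as a whole.
console.log(/^[\uD834\uDF06]$/.test(symbol));
// → false

// Spelling out the surrogate pair as a sequence works in ES5 engines.
console.log(/^\uD834\uDF06$/.test(symbol));
// → true
```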
    
## Installation

Via [npm](https://npmjs.org/):

```bash
npm install regenerate
```

Via [Bower](http://bower.io/):

```bash
bower install regenerate
```

Via [Component](https://github.com/component/component):

```bash
component install mathiasbynens/regenerate
```

In a browser:

```html
<script src="regenerate.js"></script>
```
    
In [Node.js](https://nodejs.org/), [io.js](https://iojs.org/), and [RingoJS ≥ v0.8.0](http://ringojs.org/):

```js
var regenerate = require('regenerate');
```

In [Narwhal](http://narwhaljs.org/) and [RingoJS ≤ v0.7.0](http://ringojs.org/):

```js
var regenerate = require('regenerate').regenerate;
```

In [Rhino](http://www.mozilla.org/rhino/):

```js
load('regenerate.js');
```

Using an AMD loader like [RequireJS](http://requirejs.org/):

```js
require(
  {
    'paths': {
      'regenerate': 'path/to/regenerate'
    }
  },
  ['regenerate'],
  function(regenerate) {
    console.log(regenerate);
  }
);
```
    
## API

### `regenerate(value1, value2, value3, ...)`

The main Regenerate function. Calling this function creates a new set with a chainable API.

```js
var set = regenerate()
  .addRange(0x60, 0x69) // add U+0060 to U+0069
  .remove(0x62, 0x64) // remove U+0062 and U+0064
  .add(0x1D306); // add U+1D306
set.valueOf();
// → [0x60, 0x61, 0x63, 0x65, 0x66, 0x67, 0x68, 0x69, 0x1D306]
set.toString();
// → '[`ace-i]|\\uD834\\uDF06'
set.toRegExp();
// → /[`ace-i]|\uD834\uDF06/
```
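
The `\uD834\uDF06` part of that output is the UTF-16 surrogate pair for U+1D306. The following is a minimal sketch of the standard UTF-16 arithmetic behind it. `toSurrogatePair` is a hypothetical helper name used for illustration, not part of Regenerate's API:

```js
// Convert an astral code point (U+10000 to U+10FFFF) to its surrogate pair.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10); // top 10 bits of the offset
  var low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits of the offset
  return [high, low];
}

var pair = toSurrogatePair(0x1D306);
console.log(pair.map(function(unit) {
  return unit.toString(16).toUpperCase();
}));
// → ['D834', 'DF06']
```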
    
Any arguments passed to `regenerate()` will be added to the set right away. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

```js
regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

var items = [0x1D306, 'A', '©', 0x2603];
regenerate(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```
    
### `regenerate.prototype.add(value1, value2, value3, ...)`

Any arguments passed to `add()` are added to the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

```js
regenerate().add(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

var items = [0x1D306, 'A', '©', 0x2603];
regenerate().add(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```

It’s also possible to pass in a Regenerate instance. Doing so adds all code points in that instance to the current set.

```js
var set = regenerate(0x1D306, 'A');
regenerate().add('©', 0x2603).add(set).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```

Note that the initial call to `regenerate()` acts like `add()`. This allows you to create a new Regenerate instance and add some code points to it in one go:

```js
regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```
    
### `regenerate.prototype.remove(value1, value2, value3, ...)`

Any arguments passed to `remove()` are removed from the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

```js
regenerate(0x1D306, 'A', '©', 0x2603).remove('☃').toString();
// → '[A\\xA9]|\\uD834\\uDF06'
```

It’s also possible to pass in a Regenerate instance. Doing so removes all code points in that instance from the current set.

```js
var set = regenerate('☃');
regenerate(0x1D306, 'A', '©', 0x2603).remove(set).toString();
// → '[A\\xA9]|\\uD834\\uDF06'
```
    
### `regenerate.prototype.addRange(start, end)`

Adds a range of code points from `start` to `end` (inclusive) to the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

```js
regenerate(0x1D306).addRange(0x00, 0xFF).toString();
// → '[\\0-\\xFF]|\\uD834\\uDF06'

regenerate().addRange('A', 'z').toString();
// → '[A-z]'
```
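
Because ranges are defined by code point order, the range from `A` to `z` also covers the punctuation between U+005A and U+0061. A small sketch in plain JavaScript, with the `[A-z]` pattern copied from the example above rather than generated:

```js
// Pattern as produced by the `addRange('A', 'z')` example.
var pattern = '[A-z]';
var regex = new RegExp('^' + pattern + '$');

console.log(regex.test('Q'));
// → true
console.log(regex.test('_')); // U+005F sits between `Z` and `a`
// → true
console.log(regex.test('[')); // U+005B sits between `Z` and `a`
// → true
console.log(regex.test('0')); // U+0030 is below `A`
// → false
```

If only letters are wanted, adding the ranges `A` to `Z` and `a` to `z` separately avoids the extra symbols.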
    
### `regenerate.prototype.removeRange(start, end)`

Removes a range of code points from `start` to `end` (inclusive) from the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

```js
regenerate()
  .addRange(0x000000, 0x10FFFF) // add all Unicode code points
  .removeRange('A', 'z') // remove all symbols from `A` to `z`
  .toString();
// → '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

regenerate()
  .addRange(0x000000, 0x10FFFF) // add all Unicode code points
  .removeRange(0x0041, 0x007A) // remove all code points from U+0041 to U+007A
  .toString();
// → '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'
```
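
The long output above is easier to read when split into its four alternatives. The sketch below copies the pattern from the example (it is not generated here) and annotates each part as it appears to work; the anchoring with `^(?:…)$` is added only for this demonstration:

```js
var pattern =
  // BMP code points outside `A` to `z`, excluding surrogates
  '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]' +
  // astral symbols, written as high surrogate + low surrogate
  '|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]' +
  // lone high surrogates (not followed by a low surrogate)
  '|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])' +
  // lone low surrogates (not preceded by a high surrogate)
  '|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]';

var regex = new RegExp('^(?:' + pattern + ')$');

console.log(regex.test('@')); // U+0040, just below `A`
// → true
console.log(regex.test('B')); // inside the removed range
// → false
console.log(regex.test('\uD83D\uDCA9')); // U+1F4A9, an astral symbol
// → true
console.log(regex.test('\uD83D')); // a lone high surrogate
// → true
```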
    
### `regenerate.prototype.intersection(codePoints)`

Removes any code points from the set that are not present in both the set and the given `codePoints` array. `codePoints` must be an array of numeric code point values, i.e. numbers.

```js
regenerate()
  .addRange(0x00, 0xFF) // add extended ASCII code points
  .intersection([0x61, 0x69]) // remove all code points from the set except for these
  .toString();
// → '[ai]'
```

Instead of the `codePoints` array, it’s also possible to pass in a Regenerate instance.

```js
var whitelist = regenerate(0x61, 0x69);

regenerate()
  .addRange(0x00, 0xFF) // add extended ASCII code points
  .intersection(whitelist) // remove all code points from the set except for those in the `whitelist` set
  .toString();
// → '[ai]'
```
    
### `regenerate.prototype.contains(value)`

Returns `true` if the given value is part of the set, and `false` otherwise. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

```js
var set = regenerate().addRange(0x00, 0xFF);
set.contains('A');
// → true
set.contains(0x1D306);
// → false
```
    
### `regenerate.prototype.clone()`

Returns a clone of the current code point set. Any actions performed on the clone won’t mutate the original set.

```js
var setA = regenerate(0x1D306);
var setB = setA.clone().add(0x1F4A9);
setA.toArray();
// → [0x1D306]
setB.toArray();
// → [0x1D306, 0x1F4A9]
```
    
### `regenerate.prototype.toString(options)`

Returns a string representing (part of) a regular expression that matches all the symbols mapped to the code points within the set.

```js
regenerate(0x1D306, 0x1F4A9).toString();
// → '\\uD834\\uDF06|\\uD83D\\uDCA9'
```

If the `bmpOnly` property of the optional `options` object is set to `true`, the output matches surrogates individually, regardless of whether they’re lone surrogates or just part of a surrogate pair. This simplifies the output, but it is only safe when you’re certain that the strings it will be used on contain no astral symbols.

```js
var highSurrogates = regenerate().addRange(0xD800, 0xDBFF);
highSurrogates.toString();
// → '[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])'
highSurrogates.toString({ 'bmpOnly': true });
// → '[\\uD800-\\uDBFF]'

var lowSurrogates = regenerate().addRange(0xDC00, 0xDFFF);
lowSurrogates.toString();
// → '(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'
lowSurrogates.toString({ 'bmpOnly': true });
// → '[\\uDC00-\\uDFFF]'
```
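
To see why `bmpOnly` output is unsafe on strings that contain astral symbols, here is a small sketch in plain JavaScript. The two regular expressions are copied from the high surrogate outputs above, not generated:

```js
var astral = '\uD83D\uDCA9'; // U+1F4A9, stored as a surrogate pair

// Default output: only matches a high surrogate that is not part of a pair.
var strict = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])/;
// `bmpOnly` output: matches any high surrogate.
var bmpOnly = /[\uD800-\uDBFF]/;

console.log(strict.test(astral));
// → false
console.log(bmpOnly.test(astral)); // matches half of the astral symbol
// → true

// Both forms match a genuinely lone high surrogate.
console.log(strict.test('\uD83D'));
// → true
console.log(bmpOnly.test('\uD83D'));
// → true
```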
    
Note that lone low surrogates cannot be matched accurately using regular expressions in JavaScript. Regenerate’s output takes a best-effort approach, but [there can be false negatives in this regard](https://github.com/mathiasbynens/regenerate/issues/28#issuecomment-72224808).
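
One case where this shows up, sketched in plain JavaScript with the default low surrogate pattern copied from the example above: two adjacent lone low surrogates produce a single match, because the first one is consumed as the "preceding character" of the second. This is only one illustration and may not be the exact case discussed in the linked issue.

```js
var lowSurrogates = /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]/g;

// Two lone low surrogates in a row.
var input = '\uDC00\uDC00';
var matches = input.match(lowSurrogates);

console.log(matches.length); // two lone surrogates, but only one match
// → 1
console.log(matches[0].length); // the match spans both code units
// → 2
```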
    
If the `hasUnicodeFlag` property of the optional `options` object is set to `true`, the output makes use of Unicode code point escapes (`\u{…}`) where applicable. This simplifies the output at the cost of compatibility and portability, since it means the output can only be used as a pattern in a regular expression with [the ES6 `u` flag](https://mathiasbynens.be/notes/es6-unicode-regex) enabled.

```js
var set = regenerate().addRange(0x0, 0x10FFFF);

set.toString();
// → '[\\0-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

set.toString({ 'hasUnicodeFlag': true });
// → '[\\0-\\u{10FFFF}]'
```
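
The sketch below shows how the two outputs above behave when compiled, assuming an engine with ES6 `u` flag support. Both pattern strings are copied from the example rather than generated, and the `^…$` anchoring is added only for the demonstration:

```js
var es5Pattern = '[\\0-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]' +
  '|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]';
var es6Pattern = '[\\0-\\u{10FFFF}]';

var es5 = new RegExp('^(?:' + es5Pattern + ')$');
var es6 = new RegExp('^' + es6Pattern + '$', 'u');
// The ES6 pattern compiled without the `u` flag, to show why the flag matters.
var withoutFlag = new RegExp('^' + es6Pattern + '$');

var astral = '\uD83D\uDCA9'; // U+1F4A9

console.log(es5.test(astral));
// → true
console.log(es6.test(astral));
// → true
console.log(withoutFlag.test(astral)); // `\u{…}` is not a code point escape here
// → false
```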
    
### `regenerate.prototype.toRegExp(flags = '')`

Returns a regular expression that matches all the symbols mapped to the code points within the set. Optionally, you can pass [flags](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#Parameters) to be added to the regular expression.

```js
var regex = regenerate(0x1D306, 0x1F4A9).toRegExp();
// → /\uD834\uDF06|\uD83D\uDCA9/
regex.test('𝌆');
// → true
regex.test('A');
// → false

// With flags:
var regex = regenerate(0x1D306, 0x1F4A9).toRegExp('g');
// → /\uD834\uDF06|\uD83D\uDCA9/g
```

**Note:** This probably shouldn’t be used. Regenerate is intended as a tool that is used as part of a build process, not at runtime.
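
One way to follow that advice is to have a build script write the generated pattern into a source file, so that the shipped code contains only a plain regular expression literal. The following is a minimal sketch under stated assumptions: the pattern string is hard-coded here (copied from the `toString()` example) where a real script would call `toString()`, `createModuleSource` is a hypothetical helper, and the pattern is assumed to contain no unescaped `/`.

```js
// In a real build script this would be the result of
// `regenerate(0x1D306, 0x1F4A9).toString()`.
var pattern = '\\uD834\\uDF06|\\uD83D\\uDCA9';

// Hypothetical helper: wrap the pattern in a CommonJS module that exports
// a regular expression literal.
function createModuleSource(pattern) {
  return 'module.exports = /' + pattern + '/;\n';
}

var source = createModuleSource(pattern);
console.log(source);
// → module.exports = /\uD834\uDF06|\uD83D\uDCA9/;
```

A build step could then write `source` to disk, for example with Node's `fs.writeFileSync`, and the runtime code would load that file without depending on Regenerate.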
    
### `regenerate.prototype.valueOf()` or `regenerate.prototype.toArray()`

Returns a sorted array of unique code points in the set.

```js
regenerate(0x1D306)
  .addRange(0x60, 0x65)
  .add(0x59, 0x60) // note: 0x59 is added after 0x65, and 0x60 is a duplicate
  .valueOf();
// → [0x59, 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x1D306]
```

### `regenerate.version`

A string representing the semantic version number.
    
## Combine Regenerate with other libraries

Regenerate gets even better when combined with other libraries such as [Punycode.js](https://mths.be/punycode). Here’s an example where [Punycode.js](https://mths.be/punycode) is used to convert a string into an array of code points, which is then passed on to Regenerate:

```js
var regenerate = require('regenerate');
var punycode = require('punycode');

var string = 'Lorem ipsum dolor sit amet.';
// Get an array of all code points used in the string:
var codePoints = punycode.ucs2.decode(string);

// Generate a regular expression that matches any of the symbols used in the string:
regenerate(codePoints).toString();
// → '[ \\.Ladeilmopr-u]'
```

In ES6 you can do something similar with [`Array.from`](https://mths.be/array-from), which uses [the string’s iterator](https://mathiasbynens.be/notes/javascript-unicode#iterating-over-symbols) to split the given string into an array of strings that each contain a single symbol. [`regenerate()`](#regenerateprototypeaddvalue1-value2-value3-) accepts both strings and code points, remember?

```js
var regenerate = require('regenerate');

var string = 'Lorem ipsum dolor sit amet.';
// Get an array of all symbols used in the string:
var symbols = Array.from(string);

// Generate a regular expression that matches any of the symbols used in the string:
regenerate(symbols).toString();
// → '[ \\.Ladeilmopr-u]'
```
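
The reason to prefer `Array.from` over `String.prototype.split('')` becomes visible once the input contains an astral symbol. A small sketch in plain JavaScript (ES6 engine assumed, Regenerate not required):

```js
var string = 'a\uD834\uDF06b'; // 'a', U+1D306, 'b'

// `split('')` cuts between UTF-16 code units, breaking the surrogate pair.
var units = string.split('');
console.log(units.length);
// → 4

// `Array.from` uses the string iterator, which keeps the pair together.
var symbols = Array.from(string);
console.log(symbols.length);
// → 3

// The same iteration can also yield numeric code points directly.
var codePoints = symbols.map(function(symbol) {
  return symbol.codePointAt(0);
});
console.log(codePoints);
// → [0x61, 0x1D306, 0x62], printed in decimal as [97, 119558, 98]
```

Passing the `split('')` result along would hand over the two surrogate halves separately instead of the symbol itself.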
    
## Support

Regenerate supports at least Chrome 27+, Firefox 3+, Safari 4+, Opera 10+, IE 6+, Node.js v0.10.0+, io.js v1.0.0+, Narwhal 0.3.2+, RingoJS 0.8+, PhantomJS 1.9.0+, and Rhino 1.7RC4+.

## Unit tests & code coverage

After cloning this repository, run `npm install` to install the dependencies needed for Regenerate development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.

Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.

To generate the code coverage report, use `grunt cover`.

## Author

| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |

## License

Regenerate is available under the [MIT](https://mths.be/mit) license.