Forums before death by AOL, social media and spammers... "We can't have nice things"
|    alt.privacy    |    Discussing privacy, laws, tinfoil hats    |    112,125 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 110,538 of 112,125    |
|    D to All    |
|    Regular expressions (regex) (2/3)    |
|    10 Aug 24 21:09:14    |
   
   [continued from previous message]   
      
   >multiple times when the modifier /m is used, while "^" and "$" will
   >match at every internal line separator.
   >The "." metacharacter by default matches any character, but if You switch
   >Off the modifier /s, then "." won't match embedded line separators.
   >"^" is at the beginning of an input string, and, if modifier /m is On, also
   >immediately following any occurrence of \x0D\x0A or \x0A or \x0D and the
   >Unicode line separators \x2028 or \x2029 or \x0B or \x0C or \x85. Note that
   >there is no empty line within the sequence \x0D\x0A.
   >"$" is at the end of an input string, and, if modifier /m is On, also
   >immediately preceding any occurrence of \x0D\x0A or \x0A or \x0D and the
   >Unicode line separators \x2028 or \x2029 or \x0B or \x0C or \x85. Note that
   >there is no empty line within the sequence \x0D\x0A.
   >"." matches any character, but if You switch Off modifier /s then "."   
   >doesn't match \x0D\x0A and \x0A and \x0D and the Unicode line separators   
   >\x2028 and \x2029 and \x0B and \x0C and \x85.   
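   >Note that "^.*$" (an empty line pattern) does not match the empty string
   >within the sequence \x0D\x0A, because that pair counts as a single line
   >separator, but matches the empty string within the sequence \x0A\x0D,
   >which is read as two separate line separators.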
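A minimal sketch of the /m and /s behaviour described above, using Python's `re` module as a stand-in. This is an assumption: the quoted text describes a different engine, and Python's `re.MULTILINE` and `re.DOTALL` recognise only "\n" as a line separator, so the \x0D\x0A and Unicode separator rules above are not reproduced here. The sample string and variable names are illustrative only.

```python
import re

text = "first\nsecond\nthird"

# Without MULTILINE, ^ and $ anchor only at the ends of the whole string,
# so no single word spans the entire input.
whole_only = re.findall(r"^\w+$", text)

# With MULTILINE (the /m modifier), ^ and $ also anchor around each "\n".
per_line = re.findall(r"^\w+$", text, flags=re.MULTILINE)

# With DOTALL off (the /s modifier switched off), "." stops at a newline.
dot_default = re.search(r"first.second", text)

# With DOTALL on, "." matches the newline as well.
dot_all = re.search(r"first.second", text, flags=re.DOTALL)

print(whole_only, per_line, dot_default, dot_all.group())
```

In Python both modifiers are off by default, so "." excludes "\n" unless `re.DOTALL` is passed.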
   >Metacharacters - predefined classes   
   > \w an alphanumeric character (including "_")   
   > \W a nonalphanumeric   
   > \d a numeric character   
   > \D a non-numeric   
   > \s any space (same as [ \t\n\r\f])   
   > \S a non space   
   >You may use \w, \d and \s within custom character classes.   
   >Examples:   
   > foob\dr matches strings like 'foob1r', 'foob6r' and so on but not
   > 'foobar', 'foobbr' and so on
   > foob[\w\s]r matches strings like 'foobar', 'foob r', 'foobbr', 'foob1r'
   > and so on (\w includes digits) but not 'foob=r', 'foob-r' and so on
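The predefined classes can be checked with a short sketch, here in Python's `re` (an assumption; the quoted text documents another engine, though \w, \d and \s behave the same way on these ASCII samples). The helper `full` and the sample lists are hypothetical names for illustration. Note that in Python \w covers digits as well as letters and "_", so 'foob1r' is accepted by the second pattern.

```python
import re

def full(pattern, s):
    """Return True when the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# \d accepts a digit in the fifth position and nothing else.
digit_samples = ["foob1r", "foob6r", "foobar", "foobbr"]
digit_hits = [s for s in digit_samples if full(r"foob\dr", s)]

# A custom class may combine predefined classes: word character or space.
class_samples = ["foobar", "foob r", "foobbr", "foob1r", "foob=r"]
class_hits = [s for s in class_samples if full(r"foob[\w\s]r", s)]

print(digit_hits)
print(class_hits)
```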
   >Metacharacters - word boundaries   
   > \b Match a word boundary   
   > \B Match a non-(word boundary)   
   >A word boundary (\b) is a spot between two characters that has a \w on one   
   >side of it and a \W on the other side of it (in either order), counting the   
   >imaginary characters off the beginning and end of the string as matching a \W.   
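A small sketch of \b and \B, again assuming Python's `re`, which follows the same \w/\W definition of a word boundary. The sample text is made up for illustration.

```python
import re

text = "cat concat cat's catalog"

# \bcat\b: "cat" with a boundary on both sides. The start of the string
# counts as a \W, so the very first "cat" qualifies.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat\b: "cat" preceded by a word character and followed by a boundary,
# i.e. the tail of "concat".
word_tails = [m.start() for m in re.finditer(r"\Bcat\b", text)]

print(whole_words, word_tails)
```

The apostrophe in "cat's" is a \W, so the "cat" before it is still bounded on both sides, while the "cat" in "catalog" is followed by a \w and is skipped.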
   >Metacharacters - iterators   
   >Any item of a regular expression may be followed by another type of
   >metacharacter: an iterator. Using these metacharacters You can specify the
   >number of occurrences of the preceding character, metacharacter or
   >subexpression.
   > * zero or more ("greedy"), similar to {0,}   
   > + one or more ("greedy"), similar to {1,}   
   > ? zero or one ("greedy"), similar to {0,1}   
   > {n} exactly n times ("greedy")   
   > {n,} at least n times ("greedy")   
   > {n,m} at least n but not more than m times ("greedy")   
   > *? zero or more ("non-greedy"), similar to {0,}?   
   > +? one or more ("non-greedy"), similar to {1,}?   
   > ?? zero or one ("non-greedy"), similar to {0,1}?   
   > {n}? exactly n times ("non-greedy")   
   > {n,}? at least n times ("non-greedy")   
   > {n,m}? at least n but not more than m times ("non-greedy")   
   >So, digits in curly brackets of the form {n,m} specify the minimum number
   >of times to match the item (n) and the maximum (m). The form {n} is
   >equivalent to {n,n} and matches exactly n times. The form {n,} matches n
   >or more times.
   >There is no limit to the size of n or m, but large numbers will chew up   
   >more memory and slow down r.e. execution.   
   >If a curly bracket occurs in any other context, it is treated as a regular   
   >character.   
   >Examples:   
   > foob.*r matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'   
   > foob.+r matches strings like 'foobar', 'foobalkjdflkj9r' but not   
   > 'foobr'   
   > foob.?r matches strings like 'foobar', 'foobbr' and 'foobr' but not   
   > 'foobalkj9r'   
   > fooba{2}r matches the string 'foobaar'   
   > fooba{2,}r matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.   
   > fooba{2,3}r matches strings like 'foobaar', or 'foobaaar' but not   
   > 'foobaaaar'   
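The iterator examples above can be replayed as a quick sketch. Python's `re` is assumed here, and the helper `matching` and the candidate list are hypothetical; `re.fullmatch` is used so that each pattern has to cover the whole candidate string.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = [
    "foobr", "foobar", "foobbr", "foobaar",
    "foobaaar", "foobaaaar", "foobalkjdflkj9r",
]

star = matching(r"foob.*r", candidates)           # zero or more of any character
plus = matching(r"foob.+r", candidates)           # at least one character
optional = matching(r"foob.?r", candidates)       # at most one character
exactly_two = matching(r"fooba{2}r", candidates)
two_or_more = matching(r"fooba{2,}r", candidates)
two_to_three = matching(r"fooba{2,3}r", candidates)

for name, hits in [("*", star), ("+", plus), ("?", optional),
                   ("{2}", exactly_two), ("{2,}", two_or_more),
                   ("{2,3}", two_to_three)]:
    print(name, hits)
```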
   >A little explanation about "greediness". "Greedy" takes as many as possible,
   >"non-greedy" takes as few as possible. For example, matched at the first
   >'b' of the string 'abbbbc', 'b+' and 'b*' return 'bbbb', 'b+?' returns 'b',
   >'b*?' returns an empty string, 'b{2,3}?' returns 'bb', 'b{2,3}' returns
   >'bbb'.
   >You can switch all iterators into "non-greedy" mode (see the modifier /g).
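A sketch of the greedy and non-greedy results on 'abbbbc', assuming Python's `re`. Two caveats: each attempt is anchored at index 1 (the first 'b') with `Pattern.match(string, pos)`, because an unanchored search for 'b*' succeeds with an empty match at index 0, before the 'a'; and Python, as far as I know, has no modifier that flips all iterators to non-greedy, so the `?` suffix is written out. The helper `at_first_b` is a hypothetical name.

```python
import re

s = "abbbbc"

def at_first_b(pattern):
    """Try the pattern exactly at index 1, the first 'b' in s."""
    return re.compile(pattern).match(s, 1).group()

patterns = ["b+", "b*", "b+?", "b*?", "b{2,3}?", "b{2,3}"]
results = {p: at_first_b(p) for p in patterns}

# An unanchored search takes the leftmost match, which for 'b*' is the
# empty string in front of 'a'.
leftmost_star = re.search(r"b*", s).group()

for p in patterns:
    print(repr(p), "->", repr(results[p]))
```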
   >Metacharacters - alternatives   
   >You can specify a series of alternatives for a pattern using "|" to
   >separate them, so that fee|fie|foe will match any of "fee", "fie", or
   >"foe" in the target string (as would f(e|i|o)e). The first alternative
   >includes everything from the last pattern delimiter ("(", "[", or the
   >beginning of the pattern) up to the first "|", and the last alternative
   >contains everything from the last "|" to the next pattern delimiter. For
   >this reason, it's common practice to include alternatives in parentheses,
   >to minimize confusion about where they start and end.
   >Alternatives are tried from left to right, so the first alternative found
   >for which the entire expression matches, is the one that is chosen. This
   >means that alternatives are not necessarily greedy. For example: when
   >matching foo|foot against "barefoot", only the "foo" part will match, as
   >that is the first alternative tried, and it successfully matches the target
   >string. (This might not seem important, but it is important when you are
   >capturing matched text using parentheses.)
   >Also remember that "|" is interpreted as a literal within square brackets,
   >so if You write [fee|fie|foe] You're really only matching [feio|].
   >Examples:   
   > foo(bar|foo) matches strings 'foobar' or 'foofoo'.   
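The left-to-right rule and the square-bracket pitfall can be seen in a short sketch, assuming Python's `re`, which tries alternatives in the same order. The variable names and the sample string "f|xo" are illustrative.

```python
import re

# The first alternative that lets the match succeed wins, even if a later
# alternative would have matched more text.
first_wins = re.search(r"foo|foot", "barefoot").group()

# Putting the longer alternative first changes the result.
longer_first = re.search(r"foot|foo", "barefoot").group()

# Inside square brackets "|" is a literal, so this is a character class
# containing f, e, i, o and "|", not a choice between three words.
in_class = re.findall(r"[fee|fie|foe]", "f|xo")

# Parentheses make the scope of the alternatives explicit.
grouped = [s for s in ["foobar", "foofoo", "foobaz"]
           if re.fullmatch(r"foo(bar|foo)", s)]

print(first_wins, longer_first, in_class, grouped)
```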
   >Metacharacters - subexpressions   
   >The bracketing construct ( ... ) may also be used to define r.e.
   >subexpressions.
   >Subexpressions are numbered based on the left to right order of their   
   >opening parenthesis. The first subexpression has number '1'.   
   >Examples:   
   > (foobar){8,10} matches strings which contain 8, 9 or 10 consecutive
   > instances of 'foobar'
   > foob([0-9]|a+)r matches 'foob0r', 'foob1r', 'foobar', 'foobaar',
   > 'foobaaar' etc.
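A sketch of subexpression numbering and the two examples above, assuming Python's `re`, where groups are likewise numbered by the position of their opening parenthesis, starting at 1. The nested pattern and the test strings are made up for illustration.

```python
import re

# Groups are numbered by their opening parenthesis, left to right:
# 1 = ((foo)(bar)), 2 = (foo), 3 = (bar), 4 = (baz)
nested = re.fullmatch(r"((foo)(bar))(baz)", "foobarbaz")
numbered = [nested.group(i) for i in range(1, 5)]

# An iterator after a group repeats the whole subexpression.
nine_ok = re.fullmatch(r"(foobar){8,10}", "foobar" * 9) is not None
seven_ok = re.fullmatch(r"(foobar){8,10}", "foobar" * 7) is not None

# Alternatives inside a group: one digit, or one or more 'a'.
captured = re.fullmatch(r"foob([0-9]|a+)r", "foobaaar").group(1)

print(numbered, nine_ok, seven_ok, captured)
```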
   >Metacharacters - backreferences   
   >Metacharacters \1 through \9 are interpreted as backreferences. \
(c) 1994, bbs@darkrealms.ca