
Forums before death by AOL, social media and spammers... "We can't have nice things"

   alt.privacy      Discussing privacy, laws, tinfoil hats      112,125 messages   


   Message 110,538 of 112,125   
   D to All   
   Regular expressions (regex) (2/3)   
   10 Aug 24 21:09:14   
   
   [continued from previous message]   
      
   >multiple times when the modifier /m is used, while "^" and "$" will
   >match at every internal line separator.
   >The "." metacharacter by default matches any character, but if you switch
   >off the modifier /s, then "." won't match embedded line separators.
   >"^" matches at the beginning of an input string and, if modifier /m is on,
   >also immediately after any occurrence of \x0D\x0A, \x0A or \x0D, the
   >Unicode line separators \x{2028} and \x{2029}, and \x0B (vertical tab),
   >\x0C (form feed) or \x85 (next line). Note that \x0D\x0A counts as a
   >single separator, so there is no empty line between its \x0D and \x0A.
   >"$" matches at the end of an input string and, if modifier /m is on, also
   >immediately before any occurrence of the same separators. Again, there is
   >no empty line within the sequence \x0D\x0A.
   >"." matches any character, but if you switch off modifier /s then "."
   >doesn't match \x0D\x0A, \x0A, \x0D, \x{2028}, \x{2029}, \x0B, \x0C or
   >\x85.
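   >Note that "^.*$" does not match an empty string between the \x0D and the
   >\x0A of a \x0D\x0A pair, because that pair is a single separator, but it
   >does match the empty string between \x0A and \x0D in the sequence
   >\x0A\x0D, which is two separators with an empty line between them.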
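To see /m and /s in action, here is a small sketch using Python's re module as a stand-in (an assumption: Python is not the TRegExpr engine inside 40tude Dialog). re.MULTILINE and re.DOTALL play the roles of /m and /s, but Python only treats \n as a line break, so the \r, \x{2028} and related cases described above behave differently there; the cr_only line shows one such difference. Variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
starts_plain = re.findall(r"^\w+", text)                  # ['first']
# With re.MULTILINE (like /m), ^ also matches right after each "\n"
# and $ also matches right before each "\n".
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'second', 'third']
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)     # ['line', 'line', 'line']

# Difference from TRegExpr: Python's re treats only "\n" as a line break,
# so a bare "\r" does not start a new line for ^ even with re.MULTILINE.
cr_only = re.findall(r"^\w", "a\rb", re.MULTILINE)        # ['a']

# Without re.DOTALL, "." stops at "\n"; with re.DOTALL (like /s) it does not.
dot_plain = re.search(r"first.*line", text).group()            # 'first line'
dot_all = re.search(r"first.*line", text, re.DOTALL).group()   # the whole text

print(starts_plain, starts_multi, ends_multi, cr_only)
print(repr(dot_plain), repr(dot_all))
```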
   >Metacharacters - predefined classes   
   >  \w     an alphanumeric character (including "_")   
   >  \W     a non-alphanumeric character
   >  \d     a numeric character
   >  \D     a non-numeric character
   >  \s     any whitespace character (same as [ \t\n\r\f])
   >  \S     a non-whitespace character
   >You may use \w, \d and \s within custom character classes.   
   >Examples:   
   >  foob\dr      matches strings like 'foob1r', 'foob6r' and so on but not
   >               'foobar', 'foobbr' and so on
   >  foob[\w\s]r  matches strings like 'foobar', 'foob r', 'foobbr' and also
   >               'foob1r' (\w includes digits), but not 'foob=r' and so on
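The two class examples can be checked quickly in Python's re, which shares the \d, \w and \s syntax. This sketch assumes "matches" means the whole string (hence re.fullmatch); the helper matches() is hypothetical.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# foob\dr: exactly one digit between "foob" and "r".
digit_cases = {s: matches(r"foob\dr", s) for s in ["foob1r", "foob6r", "foobar", "foobbr"]}

# foob[\w\s]r: one word character or whitespace character; \w includes digits.
class_cases = {s: matches(r"foob[\w\s]r", s)
               for s in ["foobar", "foob r", "foobbr", "foob1r", "foob=r"]}

print(digit_cases)   # foob1r/foob6r True, foobar/foobbr False
print(class_cases)   # only foob=r is False
```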
   >Metacharacters - word boundaries   
   >  \b     Match a word boundary   
   >  \B     Match a non-(word boundary)   
   >A word boundary (\b) is a spot between two characters that has a \w on one   
   >side of it and a \W on the other side of it (in either order), counting the   
   >imaginary characters off the beginning and end of the string as matching a \W.   
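A sketch of \b and \B using Python's re, whose \b follows the same \w/\W rule including the imaginary \W at both ends of the string. It lists the start index of each "cat" found; the sample sentence and variable names are made up for illustration.

```python
import re

text = "cat concatenate cat-like bobcat"

# \bcat\b: "cat" as a whole word (a boundary on both sides).
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]   # [0, 16]
# \Bcat\B: "cat" with word characters on both sides ("concatenate").
inner = [m.start() for m in re.finditer(r"\Bcat\B", text)]         # [7]
# \Bcat\b: "cat" at the end of a longer word ("bobcat"); the end of the
# string counts as a \W, so \b matches there.
suffix = [m.start() for m in re.finditer(r"\Bcat\b", text)]        # [28]

print(whole_words, inner, suffix)
```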
   >Metacharacters - iterators   
   >Any item of a regular expression may be followed by another type of
   >metacharacter, the iterators. Using these metacharacters you can specify
   >the number of occurrences of the preceding character, metacharacter or
   >subexpression.
   >  *       zero or more ("greedy"), similar to {0,}   
   >  +       one or more ("greedy"), similar to {1,}   
   >  ?       zero or one ("greedy"), similar to {0,1}   
   >  {n}     exactly n times ("greedy")   
   >  {n,}    at least n times ("greedy")   
   >  {n,m}   at least n but not more than m times ("greedy")   
   >  *?      zero or more ("non-greedy"), similar to {0,}?   
   >  +?      one or more ("non-greedy"), similar to {1,}?   
   >  ??      zero or one ("non-greedy"), similar to {0,1}?   
   >  {n}?    exactly n times ("non-greedy")   
   >  {n,}?   at least n times ("non-greedy")   
   >  {n,m}?  at least n but not more than m times ("non-greedy")   
   >So, numbers in curly brackets of the form {n,m} specify the minimum number
   >of times n and the maximum number of times m to match the item. The form
   >{n} is equivalent to {n,n} and matches exactly n times. The form {n,}
   >matches n or more times.
   >There is no practical limit to the size of n or m, but large numbers will
   >use more memory and slow down regex execution.
   >If a curly bracket occurs in any other context, it is treated as a regular   
   >character.   
   >Examples:   
   >  foob.*r      matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'   
   >  foob.+r      matches strings like 'foobar', 'foobalkjdflkj9r' but not   
   >               'foobr'   
   >  foob.?r      matches strings like 'foobar', 'foobbr' and 'foobr' but not   
   >               'foobalkj9r'   
   >  fooba{2}r    matches the string 'foobaar'   
   >  fooba{2,}r   matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.   
   >  fooba{2,3}r  matches strings like 'foobaar', or 'foobaaar' but not   
   >               'foobaaaar'   
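The iterator examples above can be replayed in Python's re, which uses the same quantifier syntax; this sketch assumes whole-string matching via re.fullmatch. The last check illustrates the "curly bracket in any other context" rule as Python applies it: x{y} is not a valid quantifier, so the braces are literal.

```python
import re

examples = [
    (r"foob.*r", ["foobar", "foobalkjdflkj9r", "foobr"]),
    (r"foob.+r", ["foobar", "foobalkjdflkj9r", "foobr"]),
    (r"foob.?r", ["foobar", "foobbr", "foobr", "foobalkj9r"]),
    (r"fooba{2}r", ["foobaar"]),
    (r"fooba{2,}r", ["foobaar", "foobaaar", "foobaaaar"]),
    (r"fooba{2,3}r", ["foobaar", "foobaaar", "foobaaaar"]),
]

# For each pattern, record whether each sample string matches as a whole.
results = {
    pattern: {s: re.fullmatch(pattern, s) is not None for s in strings}
    for pattern, strings in examples
}

# "{y}" is not a valid {n,m} form, so "{" and "}" are matched literally.
literal_brace = re.fullmatch(r"x{y}", "x{y}") is not None

for pattern, row in results.items():
    print(f"{pattern:12} {row}")
print("literal braces:", literal_brace)
```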
   >A short explanation of "greediness": "greedy" takes as many as possible,
   >"non-greedy" takes as few as possible. For example, applied to the string
   >'abbbbc', 'b+' returns 'bbbb', 'b+?' returns 'b', 'b*?' returns the empty
   >string, 'b{2,3}?' returns 'bb' and 'b{2,3}' returns 'bbb'. Note that 'b*'
   >also returns the empty string here, not 'bbbb': the search succeeds at the
   >very first position, before the 'a', where zero b's already satisfy it.
   >You can switch all iterators into "non-greedy" mode (see the modifier /g).
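The greediness examples, replayed with Python's re.search as a stand-in (assuming it behaves like the leftmost-first search described here, which holds for these patterns). It records the matched text and its start index, which makes it visible why 'b*' and 'b*?' both come back empty at index 0.

```python
import re

s = "abbbbc"
patterns = ["b+", "b+?", "b*?", "b{2,3}", "b{2,3}?", "b*"]

# re.search returns the leftmost match; record (matched text, start index).
found = {}
for p in patterns:
    m = re.search(p, s)
    found[p] = (m.group(), m.start())

for p, (matched, start) in found.items():
    print(f"{p:8} -> {matched!r} at index {start}")
```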
   >Metacharacters - alternatives   
   >You can specify a series of alternatives for a pattern using "|" to
   >separate them, so that fee|fie|foe will match any of "fee", "fie", or
   >"foe" in the target string (as would f(e|i|o)e). The first alternative
   >includes everything from the last pattern delimiter ("(", "[", or the
   >beginning of the pattern) up to the first "|", and the last alternative
   >contains everything from the last "|" to the next pattern delimiter. For
   >this reason, it's common practice to include alternatives in parentheses,
   >to minimize confusion about where they start and end.
   >Alternatives are tried from left to right, so the first alternative found
   >for which the entire expression matches is the one that is chosen. This
   >means that alternatives are not necessarily greedy. For example, when
   >matching foo|foot against "barefoot", only the "foo" part will match, as
   >that is the first alternative tried, and it successfully matches the target
   >string. (This might not seem important, but it is important when you are
   >capturing matched text using parentheses.)
   >Also remember that "|" is interpreted as a literal within square brackets,
   >so if you write [fee|fie|foe] you're really only matching [feio|].
   >Examples:   
   >  foo(bar|foo)   matches strings 'foobar' or 'foofoo'.   
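A Python sketch of the three alternation rules just described: the leftmost alternative wins, "|" is literal inside [...], and parentheses limit where the alternatives start and end. Python's re is assumed to behave like the engine described here for these simple cases; the sample strings are made up.

```python
import re

# Alternatives are tried left to right; the first one that lets the whole
# pattern match wins, even if a later one would match more text.
first_wins = re.search(r"foo|foot", "barefoot").group()     # 'foo'
longer_first = re.search(r"foot|foo", "barefoot").group()   # 'foot'

# Inside [...], "|" is an ordinary member of the character class,
# so [fee|fie|foe] is the same set as [feio|].
in_class = re.findall(r"[fee|fie|foe]", "fox|fig")          # ['f', 'o', '|', 'f', 'i']
same_set = in_class == re.findall(r"[feio|]", "fox|fig")    # True

# Parentheses scope the alternation: foo(bar|foo).
grouped = [s for s in ["foobar", "foofoo", "foobaz"] if re.fullmatch(r"foo(bar|foo)", s)]

print(first_wins, longer_first, in_class, same_set, grouped)
```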
   >Metacharacters - subexpressions   
   >The bracketing construct ( ... ) may also be used to define regex
   >subexpressions.
   >Subexpressions are numbered based on the left to right order of their   
   >opening parenthesis. The first subexpression has number '1'.   
   >Examples:   
   >  (foobar){8,10}   matches 8, 9 or 10 consecutive instances of 'foobar'
   >  foob([0-9]|a+)r  matches 'foob0r', 'foob1r', 'foobar', 'foobaar',
   >                   'foobaaar' etc.
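Group numbering and the two examples, sketched in Python's re. The pattern ((foo)(bar))(baz) is a made-up example showing that numbers follow the opening parentheses. re.fullmatch is used, so 11 copies of 'foobar' are rejected; a plain search would still find 10 of them inside such a string.

```python
import re

# Groups are numbered by the position of their opening parenthesis, left to right.
m = re.fullmatch(r"((foo)(bar))(baz)", "foobarbaz")
groups = [m.group(i) for i in range(1, 5)]   # ['foobar', 'foo', 'bar', 'baz']

# (foobar){8,10}: 8 to 10 consecutive copies of "foobar", as a whole string.
rep = re.compile(r"(foobar){8,10}")
counts = {n: rep.fullmatch("foobar" * n) is not None for n in (7, 8, 10, 11)}

# foob([0-9]|a+)r: either one digit or a run of a's between "foob" and "r".
candidates = ["foob0r", "foob1r", "foobar", "foobaar", "foobaaar", "foobxr"]
alt = [s for s in candidates if re.fullmatch(r"foob([0-9]|a+)r", s)]

print(groups, counts, alt)
```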
   >Metacharacters - backreferences   
   >Metacharacters \1 through \9 are interpreted as backreferences. Each one
   >matches the text previously matched by the subexpression with that number
   >(e.g. \2 matches whatever subexpression #2 matched).
   >Examples:   
   >  (.)\1+          matches 'aaaa' and 'cc'.   
   >  (.+)\1+         also matches 'abab' and '123123'
   >  (['"]?)(\d+)\1  matches "13" (in double quotes), '4' (in single quotes)
   >                  or 77 (without quotes), etc.
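The backreference examples checked with Python's re.fullmatch, which uses the same \1 syntax. The helper full() is hypothetical, and the last string in each list is an extra non-matching case added for contrast (e.g. mismatched quotes around 13).

```python
import re

def full(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# (.)\1+: one character followed by one or more repeats of it.
run = [full(r"(.)\1+", s) for s in ["aaaa", "cc", "ab"]]
# (.+)\1+: a chunk followed by one or more repeats of the same chunk.
repeat = [full(r"(.+)\1+", s) for s in ["abab", "123123", "abc"]]
# (['"]?)(\d+)\1: digits wrapped in the same quote (or none) on both sides.
quoted = [full(r"""(['"]?)(\d+)\1""", s) for s in ['"13"', "'4'", "77", "\"13'"]]

print(run, repeat, quoted)
```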
   >Modifiers   
   >Modifiers change the behaviour of a regular expression search.
   >Any of these modifiers may be embedded within the regular expression itself
   >using the (?...) construct, e.g. to turn off case-insensitive pattern
   >matching use (?-i). If you want to turn it on again later in the expression,
   >use (?i).
   >The default modifier states in 40tude Dialog are "imsg" ("x" is not set).   
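Python's re handles inline modifiers a little differently from the text above, so this is only an approximation: Python has no bare (?-i) switch in mid-pattern. Instead, Python 3.6+ offers scoped groups (?i:...) and (?-i:...), and since Python 3.11 a global flag such as (?i) must appear at the very start of the pattern. The pattern and variable names below are made up for illustration.

```python
import re

# (?i:...) switches case-insensitivity on only inside the group.
insensitive_tail = re.compile(r"foo(?i:bar)")
# A global (?i) must come first in Python; (?-i:...) switches it off inside the group.
sensitive_tail = re.compile(r"(?i)foo(?-i:bar)")

a = insensitive_tail.fullmatch("fooBAR") is not None   # True: only "bar" ignores case
b = insensitive_tail.fullmatch("FOObar") is not None   # False: "foo" is case-sensitive
c = sensitive_tail.fullmatch("FOObar") is not None     # True: "foo" ignores case
d = sensitive_tail.fullmatch("FOOBAR") is not None     # False: "bar" is case-sensitive

print(a, b, c, d)
```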
   >i   
   >Do case-insensitive pattern matching (using the locale settings installed
   >on your system).
   >m   
   >Treat string as multiple lines. That is, change "^" and "$" from matching
   >at only the very start or end of the string to the start or end of any line
   >anywhere within the string; see also Line separators.
   >s   
   >Treat string as single line. That is, change "." to match any character
   >whatsoever, even line separators (see also Line separators), which it
   >normally would not match.
   >g   
   >Non-standard modifier. Switching it off switches all following operators
   >into non-greedy mode (by default this modifier is on). So, if
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca