Regular Expressions explained
Groups and alternation
One thing you might have noticed when we explained quantifiers is that they only worked on the character to their left. Since this limits our expressions considerably, I'll explain other uses for quantifiers. Quantifiers can also be applied to metacharacters. Applying them to assertions is pointless, since assertions are zero-width and matching one, two, three or more of them achieves nothing. The grouping and sequence metacharacters, however, are perfect for being quantified. Let's start with grouping.
You can form groups, or subexpressions as they are frequently called, by using the opening and closing parenthesis characters, ( and ). The ( starts the subexpression and the ) ends it. Subexpressions can also be nested inside other subexpressions. A subexpression matches if its contents match, so by mixing it with quantifiers and assertions you can write a single expression that matches a whole set of similar lines.
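To make the idea of quantifying a whole group concrete, here is a minimal sketch using Python's re module. The pattern ^(na)+$ and the helper name is_repeated_na are made up for illustration and are not taken from the article: ^ and $ are the assertions, (na) is the subexpression, and + applies to the entire group.

```python
import re

# Hypothetical pattern: the + quantifier repeats the whole group (na),
# and the ^ and $ assertions anchor the match to the full line.
REPEATED_NA = re.compile(r"^(na)+$")


def is_repeated_na(line: str) -> bool:
    """Return True if the line consists only of one or more 'na' units."""
    return REPEATED_NA.search(line) is not None


if __name__ == "__main__":
    for line in ["na", "nana", "nananana", "nan", "banana"]:
        print(line, is_repeated_na(line))
```

Without the parentheses, na+ would repeat only the a, so it would accept naaa rather than nana.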
Another use for subexpressions is to extract a portion of the match when it succeeds. This is often used in conjunction with sequences, which are discussed later.
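Extracting a portion of a match could look roughly like this in Python. The input string, the pattern width=(\d+) and the function name extract_width are assumptions chosen for the sketch; the point is only that group 0 is the whole match while group 1 is what the first subexpression captured.

```python
import re
from typing import Optional

# Hypothetical pattern: the parentheses capture only the digits,
# while the overall match also includes the literal "width=" prefix.
WIDTH = re.compile(r"width=(\d+)")


def extract_width(text: str) -> Optional[str]:
    """Return the digits captured by the subexpression, or None if no match."""
    match = WIDTH.search(text)
    if match is None:
        return None
    return match.group(1)


if __name__ == "__main__":
    sample = "img width=640 height=480"
    print(WIDTH.search(sample).group(0))  # the whole match
    print(extract_width(sample))          # only the captured portion
```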
You can also use the result of a subexpression in what is called a back reference. A back reference is written as a backslash followed by a single non-zero digit, which leaves you with nine back references, \1 through \9.
The back reference matches whatever the corresponding subexpression actually matched (except that \0 matches a null character). To find the number of a subexpression, count the opening parentheses from the left.
The uses for back references are somewhat limited, especially since you only have nine of them, but on rare occasions you might need one. Note that some regular expression implementations accept multi-digit numbers as long as they don't start with a 0.
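One of those rare occasions is spotting an accidentally doubled word. The sketch below uses Python's re module; the pattern and the names find_doubled_words and group_numbers are my own choices for illustration. It also shows the numbering rule: groups are numbered by counting opening parentheses from the left.

```python
import re
from typing import List, Tuple

# \1 must match exactly the same text that group 1, (\w+), matched.
DOUBLED = re.compile(r"\b(\w+) \1\b")


def find_doubled_words(text: str) -> List[str]:
    """Return each word that is immediately repeated in the text."""
    return [m.group(1) for m in DOUBLED.finditer(text)]


def group_numbers() -> Tuple[str, ...]:
    """Show numbering: in ((a)(b)) group 1 is the outer pair, 2 is (a), 3 is (b)."""
    return re.search(r"((a)(b))", "ab").groups()


if __name__ == "__main__":
    print(find_doubled_words("this is is a test of the the parser"))
    print(group_numbers())
```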
Next is alternation, which allows you to match one of many words. The alternation character is |, and a sample usage is:
Bill|Linus|Steve|Larry
which would match either Bill, Linus, Steve or Larry. Mixing this with subexpressions and quantifiers we can write:
cow(ard|age|boy|l)?
which matches any of the following words but no others:
cow
coward
cowage
cowboy
cowl
I mentioned earlier in the article that not all of the expression must match for the match to be successful. This can happen when you're using subexpressions together with alternation. For instance:
((Donald|Dolly) Duck)|(Scrooge McDuck)
As you can see, only the left or the right top-level subexpression will match, not both. This is sometimes handy when you want to try a complex pattern in one subexpression and, if it fails, try another one.
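The alternation examples can be tried out with a short Python sketch. The Duck pattern is the one shown above. The cow pattern is my assumption of an expression that accepts exactly the listed words, and the helper names are hypothetical. Note that Python reports a subexpression that took no part in the match as None.

```python
import re
from typing import Optional, Tuple

NAMES = re.compile(r"Bill|Linus|Steve|Larry")
# Assumed pattern: "cow" followed by at most one of the listed endings.
COW_WORDS = re.compile(r"cow(ard|age|boy|l)?")
DUCKS = re.compile(r"((Donald|Dolly) Duck)|(Scrooge McDuck)")


def is_cow_word(word: str) -> bool:
    """True only if the whole word is cow, coward, cowage, cowboy or cowl."""
    return COW_WORDS.fullmatch(word) is not None


def duck_groups(name: str) -> Optional[Tuple[Optional[str], ...]]:
    """Return the three subexpression results, or None if nothing matched."""
    match = DUCKS.fullmatch(name)
    return match.groups() if match else None


if __name__ == "__main__":
    print(NAMES.search("Thanks, Linus!").group(0))
    print([w for w in ["cow", "cowboy", "cows", "cowl"] if is_cow_word(w)])
    print(duck_groups("Dolly Duck"))      # left branch took part, right did not
    print(duck_groups("Scrooge McDuck"))  # right branch took part, left did not
```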
Comment List
Topic: another great regexp tool
Author: S Church
Time: 01.03.2005 16:16
There's a free-as-in-beer development environment for Windows called HTML-Kit that's just great for writing scripts and web code. The Find or Find / Replace functions have a check box for Regexps, with a "Find All" button to highlight every instance matched by a regexp. The only drawback is that it assumes /is (case insensitivity and multiline).
VisualREGEXP mentioned in the article says it has no required supporting files, that the standalone executable is all that's needed. However, most Windows machines don't have the TCL/TK component "wish," which the README file claims is necessary for operation. Wish might be available somewhere online as a precompiled binary without having to install all of TCL/TK, but I'm not motivated enough to google it at the moment.

Topic: Email match
Author: David Robarts
Time: 15.01.2005 22:45
Some valid email addresses will fail this expression (and some invalid addresses will pass):
[a-z0-9_-]+(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)+
The underscore character is not allowed in the domain part of an email address, and some additional characters are allowed in the username part.
This might be better:
[a-z0-9_-]+(\.[a-z0-9_+-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)+
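A rough way to try the suggested expression in Python is sketched below. It rests on a few assumptions of mine: the dots are meant as literal dots and so are written as \., the hyphen is placed last in the character class so that it is not read as a range, matching is case-insensitive, and the whole string must match. The name looks_like_email is hypothetical, and this is nowhere near a complete address validator.

```python
import re

# Assumed reading of the suggested pattern: escaped dots, hyphen last in
# each character class, case-insensitive matching.
EMAIL = re.compile(
    r"[a-z0-9_-]+(\.[a-z0-9_+-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)+",
    re.IGNORECASE,
)


def looks_like_email(candidate: str) -> bool:
    """True if the whole string fits the pattern; not a full validity check."""
    return EMAIL.fullmatch(candidate) is not None


if __name__ == "__main__":
    for candidate in [
        "john.doe+tag@example.com",
        "user@my_host.com",   # underscore in the domain part
        "user@localhost",     # domain part has no dot
    ]:
        print(candidate, looks_like_email(candidate))
```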

Topic: can't see the graphic
Author: x x
Time: 02.11.2001 01:59
I can't see the graphic towards the bottom to demonstrate the usage of < >