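Ruby Regular Expressions
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
A regular expression literal is a pattern written between slashes, or between arbitrary delimiters following %r, as follows:
Syntax: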
```ruby
/pattern/
/pattern/im    # option can be specified
%r!/usr/local! # general delimited regular expression
```
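The following is a minimal sketch showing that each literal form above produces a `Regexp` object that can be used for matching. The sample path is an illustrative assumption, `Regexp.new` (the standard constructor) is included only for comparison, and `match?` assumes Ruby 2.4 or later.

```ruby
# Three ways to write a pattern that matches the text "/usr/local".
slash_literal = /\/usr\/local/            # slashes inside /.../ must be escaped
percent_r     = %r!/usr/local!            # %r with "!" delimiters: no escaping needed
constructed   = Regexp.new("/usr/local")  # built at runtime from a string

path = "/usr/local/bin" # illustrative input

[slash_literal, percent_r, constructed].each do |re|
  puts "#{re.inspect} matches: #{re.match?(path)}"
end

# Options are written after the closing delimiter.
with_options = /usr/im
puts with_options.casefold? # => true, because of the i option
```

All three patterns match the same input; `%r` simply avoids the backslashes.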
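Example:
```ruby
#!/usr/bin/ruby

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

if line1 =~ /Cats(.*)/
  puts "Line1 starts with Cats"
end

# line2 does not contain "Cats", so this condition is false and nothing is printed
if line2 =~ /Cats(.*)/
  puts "Line2 starts with Dogs"
end
```
This will produce the following result: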
Line1 starts with Cats
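The example above only tests whether a match occurs. As a sketch of how to read the text captured by the group `(.*)`, the snippet below relies on standard Ruby behavior: `=~` returns the index where the match starts (or `nil`), a successful match sets `$1`, and `String#match` returns a `MatchData` object.

```ruby
line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

# =~ returns the index where the match starts, or nil if there is no match
p(line1 =~ /Cats(.*)/) # => 0
p(line2 =~ /Cats(.*)/) # => nil

# After a successful =~, $1 holds the text captured by the first group
if line1 =~ /Cats(.*)/
  puts "Captured: #{$1.inspect}" # => Captured: " are smarter than dogs"
end

# String#match returns a MatchData object (or nil) instead of an index
md = line1.match(/Cats(.*)/)
puts md[0] # whole match: Cats are smarter than dogs
puts md[1] # first group:  are smarter than dogs
```

Note that `/Cats(.*)/` has no anchor, so it finds "Cats" anywhere in the string, not only at the start.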
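Regular Expression Modifiers:
A regular expression literal can include an optional modifier to control various aspects of matching. The modifier is specified after the second slash character, as shown previously, and may be one of these characters:
Modifier | Description |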
---|---|
i | Ignore case when matching text. |
o | Perform #{} interpolations only once, the first time the regexp literal is evaluated. |
x | Ignores whitespace and allows comments in regular expressions. |
m | Matches multiple lines, recognizing newlines as normal characters (so . also matches a newline). |
u,e,s,n | Interpret the regexp as Unicode (UTF-8), EUC, SJIS, or ASCII. If none of these modifiers is specified, the regular expression is assumed to use the source encoding. |
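A short sketch of the three most commonly used modifiers, `i`, `m`, and `x`. The sample strings are illustrative assumptions, and `match?` assumes Ruby 2.4 or later.

```ruby
# i: case-insensitive matching
puts(/cats/i.match?("CATS ARE SMART")) # => true
puts(/cats/.match?("CATS ARE SMART"))  # => false

# m: "." also matches a newline
text = "first line\nsecond line"
puts(/line.second/.match?(text))  # => false ("." stops at "\n")
puts(/line.second/m.match?(text)) # => true

# x: whitespace in the pattern is ignored and "#" starts a comment
date = /
  (\d{4})   # year
  -
  (\d{2})   # month
/x
md = date.match("Released 2024-05")
puts md[1], md[2] # prints 2024 and then 05
```

The `x` form keeps the same meaning as `/(\d{4})-(\d{2})/` while letting each part carry a comment.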
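Just as string literals can be delimited with %Q, Ruby lets you begin a regular expression with %r followed by a delimiter of your choice. This is useful when the pattern contains forward-slash characters that you don't want to escape: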
```ruby
# Following matches a single slash character, no escape required
%r|/|

# Flag characters are allowed with this syntax, too
%r[</(.*)>]i
```
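To see those two `%r` patterns at work, here is a small sketch applying them to sample strings (the inputs are illustrative assumptions).

```ruby
# %r|/| matches a single slash without escaping it
p("/usr/local".scan(%r|/|).length) # => 2

# %r[...] followed by the i flag; the flag is accepted even though
# this particular pattern contains no letters for it to affect
closing = %r[</(.*)>]i
md = closing.match("<b>bold</B>")
p md[1] # => "B" (the name captured from the closing tag)
```

Because `(.*)` is greedy, on an input with several closing tags it would capture everything up to the last `>`.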
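Regular Expression Patterns:
Except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } |, and the backslash itself), all characters match themselves. You can make a metacharacter match literally by preceding it with a backslash.
The following table lists the regular expression syntax available in Ruby.
Pattern | Description |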
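A minimal sketch of escaping, using an illustrative string that happens to contain the metacharacters `+` and `?`. `Regexp.escape` is the standard helper that inserts the backslashes for you, which is handy when the text comes from user input.

```ruby
# Unescaped, "+" and "?" act as quantifiers, so the text is not matched literally
p("1+1=2?" =~ /1+1=2?/)   # => nil
# Preceding each metacharacter with a backslash makes it literal
p("1+1=2?" =~ /1\+1=2\?/) # => 0

# Regexp.escape adds the backslashes automatically
escaped = Regexp.escape("1+1=2?")
puts escaped                        # => 1\+1=2\?
p("1+1=2?" =~ Regexp.new(escaped))  # => 0
```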
---|---|
^ | Matches beginning of line. |
$ | Matches end of line. |
. | Matches any single character except newline. Using m option allows it to match newline as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets |
re* | Matches 0 or more occurrences of preceding expression. |
re+ | Matches 1 or more occurrences of preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re{ n} | Matches exactly n number of occurrences of preceding expression. |
re{ n,} | Matches n or more occurrences of preceding expression. |
re{ n, m} | Matches at least n and at most m occurrences of preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?imx) | Temporarily toggles on i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
(?-imx) | Temporarily toggles off i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
(?: re) | Groups regular expressions without remembering matched text. |
(?imx: re) | Temporarily toggles on i, m, or x options within parentheses. |
(?-imx: re) | Temporarily toggles off i, m, or x options within parentheses. |
(?#...) | Comment. |
(?= re) | Positive lookahead: specifies a position where re must match. Zero-width; it consumes no characters. |
(?! re) | Negative lookahead: specifies a position where re must not match. Zero-width; it consumes no characters. |
(?> re) | Matches independent pattern without backtracking. |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. Equivalent to [ \t\r\n\f\v]. |
\S | Matches nonwhitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches nondigits. |
\A | Matches beginning of string. |