Unix 正則表達式SED

正則表達式是一個字符串，它可以用來描述幾個字符序列。使用正則表達式是由幾個不同的Unix命令，包括 ed, sed, awk, grep，並且，在較為有限的程度上擴展 vi.

本教學將教你如何使用正則表達式使用 sed.

這裡流編輯器sed的代表是麵向流的編輯器，它是專門用於執行腳本創建。因此，所有的輸入送入通過到stdout，它不會改變輸入文件。

調用 sed:

在我們開始之前，讓我們確保你有一個本地副本 /etc/passwd 文件的文本文件，用sed。

正如前麵提到的，可以調用sed的發送數據通過管道如下：

$ cat /etc/passwd | sed
Usage: sed [OPTION]... {script-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
  -e script, --expression=script
...............................

cat命令轉儲 /etc/passwd文件的內容通過管道進入sed 模式空間sed 。是內部工作模式空間緩衝區，sed使用做其工作。

sed 一般語法：

以下是 sed 的一般語法

/pattern/action

在這裡，模式是一個正則表達式，動作是下表中給出的命令之一。如果省略模式，執行操作的每一行，正如我們上麵看到的。

斜線字符（/），環繞模式是必需的，因為它們被用來作為分隔符。

Range	描述
p	Prints the line
d	Deletes the line
s/pattern1/pattern2/	Substitutes the first occurrence of pattern1 with pattern2.

用sed刪除所有行：

再次調用sed ，但這個時候告訴sed使用編輯命令刪除行，由單字母d表示：

$ cat /etc/passwd | sed 'd'
$

調用sed 發送文件，通過管道，而是可以指示sed來讀取數據文件，在下麵的例子。

下麵的命令做完全一樣的東西，以前的嘗試，冇有 cat 命令：

$ sed -e 'd' /etc/passwd
$

sed 位址：

SED還了解到一種叫做地址。位址是特定的地點，在一個文件或一個特定的編輯命令應適用範圍。當sed遇到冇有地址，在該文件中的每一行上執行其操作。

以下命令將sed 命令你已經使用了一個基本的地址：

$ cat /etc/passwd | sed '1d' |more
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$

請注意，數字1之前添加刪除編輯命令。這告訴sed執行編輯命令的第一行上的文件。在這個例子中，sed將刪除第一行 /etc/password，並打印文件的其餘部分。

sed 地址範圍：

所以如果你想從文件中刪除多個行？用sed，您可以指定一個地址範圍如下：

$ cat /etc/passwd | sed '1, 5d' |more
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$

上麵的命令將開始從1至5的所有行。所以，刪除前五行。

試試下麵的地址範圍：

Range	描述
'4,10d'	Lines starting from 4th till 10th are deleted
'10,4d'	Only 10th line is deleted, because sed does not work in reverse direction.
'4,+5d'	This will match line 4 in the file, delete that line, continue to delete the next five lines, and then cease its deletion and print the rest
'2,5!d'	This will deleted everything except starting from 2nd till 5th line.
'1~3d'	This deletes the first line, steps over the next three lines, and then deletes the fourth line. Sed continues applying this pattern until the end of the file.
'2~2d'	This tells sed to delete the second line, step over the next line, delete the next line, and repeat until the end of the file is reached.
'4,10p'	Lines starting from 4th till 10th are printed
'4,d'	This would generate syntax error.
',10d'	This would also generate syntax error.

注：使用p動作時，你應該使用-n選項，以避免重複行式打印。檢查以下兩條命令之間的區彆：

$ cat /etc/passwd | sed -n '1,3p'

檢查上麵的命令冇有-n作為如下：

$ cat /etc/passwd | sed '1,3p'

替換命令：

替換命令，用s表示，將您指定的其他任何字符串中指定的任何字符串代替。

用一個字符串代替另一個，你需要有一些方式告訴sed，你的第一個字符串結束，並開始替換字符串。這是傳統上是由兩個字符串bookending斜線（/）字符。

首次出現一行字符串根字符串amrood與下麵的命令替代。

$ cat /etc/passwd | sed 's/root/amrood/'
amrood:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
..........................

這是非常重要的，需要注意的是替代sed的隻有第一次出現的行上。如果字符串根不止一次發生在一行的第一個匹配項將被替換。

告訴sed執行全局替換，添加字母g結束的命令如下：

$ cat /etc/passwd | sed 's/root/amrood/g'
amrood:x:0:0:amrood user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
...........................

替代標誌：

還有一些其他有用的g標誌除了可以傳遞的標誌，你可以一次指定多個。

標誌	描述
g	Replace all matches, not just the first match.
NUMBER	Replace only NUMBERth match.
p	If substitution was made, print pattern space.
w FILENAME	If substitution was made, write result to FILENAME.
I or i	Match in a case-insensitive manner.
M or m	In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline.

使用替代字符串分隔符：

您可能會發現自己不得不做一個替換在一個字符串，其中包含斜線字符。在這種情況下，您可以指定不同的分隔，提供指定的字符後的s。

$ cat /etc/passwd | sed 's:/root:/amrood:g'
amrood:x:0:0:amrood user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

在上麵的例子中，我們使用：作為分隔符，而不是斜線（/），因為我們試圖搜索/root ，而不是簡單的root。

替換空字符：

使用空替換字符串從 /etc/passwd 文件中完全刪除root字符串：

$ cat /etc/passwd | sed 's/root//g'
:x:0:0::/:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

地址替換：

如果你想用quiet 在第10行字符串替換字符串的sh，您可以指定如下：

$ cat /etc/passwd | sed '10s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/quiet

同樣，做一個地址範圍替換，你可以做類似以下內容：

$ cat /etc/passwd | sed '1,5s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/quiet
daemon:x:1:1:daemon:/usr/sbin:/bin/quiet
bin:x:2:2:bin:/bin:/bin/quiet
sys:x:3:3:sys:/dev:/bin/quiet
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

正如你可以看到從輸出前五行字符串的sh改變quiet，但其餘各行均保持不變。

匹配的命令：

你會使用-n選項一起使用p選項打印所有匹配的行，如下所示：

$ cat testing | sed -n '/root/p'
root:x:0:0:root user:/root:/bin/sh
[root@ip-72-167-112-17 amrood]# vi testing
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

使用正則表達式：

在匹配模式中，你可以使用正則表達式，它提供了更多的靈活性。

檢查下麵的例子匹配所有的行開始守護進程，然後刪除它們：

$ cat testing | sed '/^daemon/d'
root:x:0:0:root user:/root:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

下麵的例子將刪除所有的行以sh結束：

$ cat testing | sed '/sh$/d'
sync:x:4:65534:sync:/bin:/bin/sync

下表列出了四個特殊字符在正則表達式中是非常有用的。

字符	描述
^	Matches the beginning of lines.
$	Matches the end of lines.
.	Matches any single character.
*	Matches zero or more occurrences of the previous character
[chars]	Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters.

匹配字符：

看幾個表達式元字符演示使用。例如，下麵的模式：

表達式	描述
/a.c/	Matches lines that contain strings such as a+c, a-c, abc, match, and a3c, whereas the pattern
/a*c/	Matches the same strings along with strings such as ace, yacc, and arctic.
/[tT]he/	Matches the string The and the:
/^$/	Matches Blank lines
/^.*$/	Matches an entire line whatever it is.
/ */	Matches one or more spaces
/^$/	Matches Blank lines

下表列出了一些常用的字符集：

Set	描述
[a-z]	Matches a single lowercase letter
[A-Z]	Matches a single uppercase letter
[a-zA-Z]	Matches a single letter
[0-9]	Matches a single number
[a-zA-Z0-9]	Matches a single letter or number

字符類關鍵詞：

一些特殊的關鍵字是常用的正則表達式，特彆是GNU工具，采用正則表達式。這些sed的正則表達式是非常有用的，因為它們簡化了的東西，增強可讀性。

例如，字符a到z以及A到Z的字符構成的字符的其中一類，具有關鍵字 [[:alpha:]]

使用字母字符類的關鍵字，隻有那些行在 /etc/syslog.conf 文件，一個字母開始，這個命令打印：

$ cat /etc/syslog.conf | sed -n '/^[[:alpha:]]/p'
authpriv.*                         /var/log/secure
mail.*                             -/var/log/maillog
cron.*                             /var/log/cron
uucp,news.crit                     /var/log/spooler
local7.*                           /var/log/boot.log

下表是GNU sed的可用字符類中的關鍵字的完整列表。

Character Class	描述
[[:alnum:]]	Alphanumeric [a-z A-Z 0-9]
[[:alpha:]]	Alphabetic [a-z A-Z]
[[:blank:]]	Blank characters (spaces or tabs)
[[:cntrl:]]	Control characters
[[:digit:]]	Numbers [0-9]
[[:graph:]]	Any visible characters (excludes whitespace)
[[:lower:]]	Lowercase letters [a-z]
[[:print:]]	Printable characters (noncontrol characters)
[[:punct:]]	Punctuation characters
[[:space:]]	Whitespace
[[:upper:]]	Uppercase letters [A-Z]
[[:xdigit:]]	Hex digits [0-9 a-f A-F]

與符號引用：

sed 字元代表的模式相匹配的內容。例如，假設你有一個文件名為phone.txt的完整電話號碼，如下麵的：

你想更容易閱讀的括號包圍的區域碼（前三位）。要做到這一點，你可以使用符號替換字符，像這樣：

$ sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phone.txt
(555)5551212
(555)5551213
(555)5551214
(666)5551215
(666)5551216
(777)5551217

在模式匹配第3位，然後使用要更換這3個數字與周圍的括號。

使用多個sed命令：

您可以使用多個sed命令在一個單一的sed命令如下：

$ sed -e 'command1' -e 'command2' ... -e 'commandN' files

這裡命令通過commandN是前麵討論過的類型的sed命令。這些命令被施加到給定的文件的文件列表中的各行。

我們可以使用相同的機製，上麵寫的電話號碼的例子如下：

$ sed -e 's/^[[:digit:]]{3}/(&)/g'  
                      -e 's/)[[:digit:]]{3}/&-/g' phone.txt
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

注：在上麵的例子中，而不是重複字符類關鍵字 [[:digit:]]三次，取而代之的是{3}，這意味著匹配前麵的正則表達式三次。在這裡，我用斷行運行此命令之前你應該刪除。

返回參考：

符號元字符是有用的，但更為有用的是能夠定義特定的區域，在一個正則表達式，這樣你就可以替換字符串中引用它們。通過定義一個正則表達式的特定部分，你可以參考那些部分特彆提到字符。

要做返回引用，你必須首先定義一個區域，然後參考該區域。要定義一個區域，你插入反斜杠括號，圍繞感興趣區域。環繞反斜杠第一區域，然後引用 1， 2 第二區域，依此類推。

假設phone.txt有以下文字：

(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

現在嘗試下麵的命令：

$ cat phone.txt | sed 's/(.*))(.*-)(.*$)/Area 
                       code: 1 Second: 2 Third: 3/'
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (666) Second: 555- Third: 1216
Area code: (777) Second: 555- Third: 1217

注意：在上麵的例子中括號內的每個正則表達式將引用1 2，依此類推。在這裡，我用斷行運行此命令之前你應該刪除。