How to read a man page? (repost)

linux_Ultra · posted 2009-6-30 09:28:07

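Being able to read a man page is the most basic requirement for Linux
development, yet many beginners strongly dislike reading them. In our teaching
we have found that, although we stress from the very first programming lesson
that students must read the man pages (RTFM = read the f*cking manual), many
students still do everything they can to avoid it. A month later, more than
half of them have still never read a single man page carefully.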

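Take a book called 《Linux常用C函數（中文版）》 (Commonly Used Linux C Functions,
Chinese Edition): it is the students' favorite. We have never recommended it
and never handed out an electronic or printed copy, yet almost every student
has one. Its style is the opposite of a man page. The interface descriptions
are very terse and fall far short of covering the key points of the man page,
yet every function is followed by an example, even when the usage is as obvious
as a louse on a bald head. The examples are usually sloppy; for instance, they
never check error return values. If you ask me, this book is garbage: its
existence not only wastes space but does real harm. It is true that it suits
beginners who want a quick lookup, but people are lazy, and beginners tend to
become dependent on it. They no longer need to read the man page, and no longer
want to. Why read a man page? It rambles on at length, takes ages to read and
still makes no sense, and in the end there is not even an example, so you still
do not know how to call the function. How much easier to learn from this book:
you do not even have to read the words, just paste the example into your own
code.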

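This is how beginners get poisoned. First, as I said, the examples are sloppy
and full of bugs; they are garbage code, and whoever uses them ends up with
garbage code too. Second, the explanations are so terse that they invite
one-sided understanding and misunderstanding. Third, the book feeds beginners'
laziness: they can write plenty of programs with it, but their English, their
comprehension, and their technical skill all stagnate, which cannot be called
learning. Fourth, the book covers only a limited number of C functions. Real
work inevitably calls for functions that are not in the book. A look at the man
page would be enough to use them, but beginners who cannot do without the book
will cobble something together, substituting functions that are in the book for
those that are not. And so this masterpiece has trained a large crop of
qualified garbage-code producers.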

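Another book, 《Linux C函數庫詳解詞典》 (A Detailed Dictionary of the Linux C
Function Library), is a typical representative of the same genre and much the
same as the one above. To digress, I hold an even more extreme view: no
documentation written for programmers should be translated into Chinese.
Someone who cannot read English fluently is not a qualified programmer and
should learn English before learning to program; besides, translation always
introduces new errors and inaccuracies and lowers the quality of the
documentation. Only documentation for end users should be translated, because
you cannot demand a certain skill level of users before they may use the
software.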

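Dodging everything that is hard to understand or master, castrating an
inherently complex man page before teaching it to beginners, and letting them
believe that mastering a technology is this simple, that with one book in hand
there is nothing to worry about: that is not education at all. Real education
should not avoid complexity. It should teach by example: analyze one complex
problem thoroughly for the students, then prompt them to solve other complex
problems on their own. Below I dissect one man page in detail, use it to show
the general patterns in how man pages are written, and explain how a man page
should be understood, so that readers can generalize from it. I believe this
one article is more useful to beginners than the two rotten books above.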

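The page in question is the man page for the regular expression C functions
defined by POSIX. To use these functions, the reader first needs a very clear
idea of what regular expressions are: what they can and cannot be used for, how
to go about using them, and how to write them fluently. Every man page is
highly cohesive and will not teach you such off-topic material. In other words,
you must first be very clear about what you expect these functions to do for
you. If you do not know what you want to do, the man page cannot help you.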

linux_Ultra · posted 2009-6-30 09:28:36

REGEX(3)                   Linux Programmer’s Manual                  REGEX(3)

NAME
       regcomp, regexec, regerror, regfree - POSIX regex functions

SYNOPSIS
       #include <sys/types.h>
       #include <regex.h>

       int regcomp(regex_t *preg, const char *regex, int cflags);

       int regexec(const regex_t *preg, const char *string, size_t nmatch,
                   regmatch_t pmatch[], int eflags);

       size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                       size_t errbuf_size);

       void regfree(regex_t *preg);

linux_Ultra · posted 2009-6-30 09:30:39

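This man page describes four functions. All I wanted was to match a string
against a regular expression and get the match result. In other words, the
function I had in mind looks like this: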

C code

int my_expect_func(in: regular expression, in: target string, out: match result);
returns: error code

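Why are there four functions? Which one is closest to the function I want? What
are the others for? This is a good reading habit: guess actively instead of
passively taking in information. Understanding is the process of comparing your
guess with the text. If they agree, you have understood correctly; if not, form
a new guess and compare again. Passively absorbing information is not
understanding.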

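Input and output parameters are an important clue. The prototypes of Linux
library functions are very regular: a const pointer is always an input
parameter, and a non-const pointer always carries a value out (it may be a pure
output parameter or an input-output parameter). The prototype therefore already
tells you quite clearly how to call the function, and a code example is simply
unnecessary. Look at the first function: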

C code

   int regcomp(regex_t *preg, const char *regex, int cflags);

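preg is an output parameter: the memory for the object has to be allocated
beforehand and its address passed to regcomp. regex is an input parameter, and
cflags holds flag bits. I do not know yet what preg is, but regex must be the
regular expression, and it is a char * string, so that should be right. Without
reading the description below, one can guess that the function is called like
this: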

C code

regex_t regobj;
regcomp(&regobj, "regular expression", flag1|flag2|...);

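Let me stress again: to understand a piece of text you must draw fully on
experience and reasoning, guess actively, then read on to verify the guess,
rather than passively accept information. How do you reason? The function above
takes a regular expression and a few flags and passes out a value, so
presumably it converts the regular expression into another format. That is
reasoning. By contrast, suppose I ignored the fact that preg is an output
parameter and is not a string, and insisted on forcing it into the shape of
my_expect_func: since the regex parameter is the regular expression, the preg
parameter must be the target string. That is not reasoning or guessing; that is
groping blindly.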

linux_Ultra · posted 2009-6-30 09:32:22

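If you know something about how regular expressions work, that experience lets
you guess that this function probably converts the regular expression string
into a state machine so that target strings can be matched efficiently. If you
have used regular expression libraries in other programming languages, that
experience also tells you that a regular expression usually goes through a
preprocessing step before use. In addition, you need some sensitivity to
English abbreviations. The function is named regcomp; reg is regular
expression, but is comp compare or compile? If it were compare, there should be
two parameters of the same type to compare, as in strcmp. Here it is obviously
compile: converting the string form into a binary form, which confirms the
earlier guess from another angle. All of this comes from experience rather than
reasoning. Experience helps you understand faster and more accurately, but it
is not required, because the reasoning based on input and output parameters
already led to the right conclusion. Experienced readers are simply more
confident in their guesses.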

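Sensitivity to English abbreviations is the most basic skill needed for reading
man pages and code, but it takes long practice to develop a feel for it. To
learn how to use a function you may not need to know what the function name and
parameter names abbreviate; the two rotten books mentioned above will teach you
the usage. But if you keep avoiding man pages and never practice guessing
abbreviations, you will never be able to read other people's code. If you write
your own code without reading others', you will not even know how to name a
variable, and what you write will always be garbage. As for regcomp and its
parameter names: regex is regular expression, and regcomp is regular expression
compile. Then what is preg? reg is regular expression, but what does the p
stand for? Pointer? That would be Microsoft's infamous Hungarian notation,
which is surely not used on Linux; my guess is that p stands for precompiled.
What about the c in cflags? I do not know, but compare it with the next
function: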

C code

   int regexec(const regex_t *preg, const char *string, size_t nmatch,
               regmatch_t pmatch[], int eflags);

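This function has a parameter called eflags. So the c is the c of regcomp and
the e is the e of regexec: one is the compile-time flags, the other the
execution-time flags. The two kinds of flags must take different values, and
the text below must describe them separately. This is another kind of guess:
guessing the logic of the text that follows, which helps understanding just as
much. How the names of the remaining functions and parameters are abbreviated
is left as an exercise for the reader.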

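preg is an output parameter in regcomp but an input parameter in regexec. By
inference, preg is filled in by regcomp and then handed to regexec; that is,
the regular expression is passed to regexec in its converted binary form.
regexec also has a string input parameter, string, and two match parameters
that represent the match result: pmatch is an output parameter giving the start
address of a buffer, and nmatch gives the buffer length (by experience, this
resembles strncpy). This must be the my_expect_func I wanted at the start: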

C code

int my_expect_func(in: regular expression, in: target string, out: match result);
returns: error code

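preg corresponds to the regular expression, and pmatch and nmatch correspond to
the match result, so the input parameter string must be the target string.
pmatch is a pointer, but it is written as pmatch[], which says it points to a
group of regmatch_t objects rather than a single one. How many are in the
group? The nmatch parameter says. As with strncpy, we should allocate this
group of regmatch_t objects ourselves and pass it to the function. So the two
functions should be called like this: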

C code

regex_t regobj;
regcomp(&regobj, "regular expression", flag1|flag2|...);
regmatch_t matchbuf[10];
regexec(&regobj, "target string", 10, matchbuf, flag1|flag2|...);
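
The two guessed calls can be fleshed out into something that compiles. What
follows is only a minimal sketch under a few assumptions of mine: the helper
name match_once is made up, both flag arguments are simply 0 (no flags, so the
pattern is read with the default basic syntax), and the return values are
checked, which the guessed snippet does not do. It also calls regfree, one of
the two remaining functions in the SYNOPSIS.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the regex API.
 * Returns 1 if `pattern` matches somewhere in `text`, 0 if it does not,
 * and -1 if the pattern cannot be compiled or matching fails. */
int match_once(const char *pattern, const char *text)
{
    regex_t regobj;            /* we own this storage; regcomp fills it in */
    regmatch_t matchbuf[10];   /* caller-allocated buffer for match results */
    int rc;

    rc = regcomp(&regobj, pattern, 0);   /* 0: no compile-time flags */
    if (rc != 0)
        return -1;             /* regobj is not usable after a failed regcomp */

    rc = regexec(&regobj, text, 10, matchbuf, 0);   /* 0: no execution flags */
    regfree(&regobj);          /* release whatever regcomp allocated */

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}
```

For example, match_once("a[0-9]*b", "xxa123byy") reports a match, while an
unterminated bracket expression such as "[a" is rejected at compile time.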

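How does a regmatch_t object represent a match? If a regular expression pattern
occurs five times in a target string, how are those five occurrences
represented? One can guess that the regmatch_t structure must contain the
position of the match within the target string. Also, if I pass in 10
regmatch_t objects and there are only five matches, how do I know after the
function returns that the first five objects hold valid match information and
the rest are invalid? Is the number of matches reported through a parameter or
the return value? The function has no extra parameter, and a quick look at the
RETURN VALUE section of the man page shows that the return value is an error
code, not a match count. So the function must fill the unused regmatch_t
objects at the end with a special value. This is reasoning. The guess will be
confirmed or refuted by the text that follows; right or wrong, the answer will
be there.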

linux_Ultra · posted 2009-6-30 09:32:53

Two more functions follow:

C code

   size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                     size_t errbuf_size);

   void regfree(regex_t *preg);

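By past experience, regerror is like perror or strerror: it translates an error
code into a readable string. regfree is like free and releases preg. But isn't
preg an object we allocated ourselves beforehand? If it was not dynamically
allocated by this group of functions, why does it need to be freed by them?
This question leads to a new guess: the regex_t structure must have pointer
members, and regcomp must dynamically allocate a block of memory and make a
pointer member of preg point to it. That is why regfree is needed: it follows
the preg parameter to its pointer member and frees the memory allocated
earlier. Experience agrees: regular expressions differ in length and
complexity, so the number of states needed to represent them as a state machine
differs as well. The binary form of every regular expression cannot possibly
fit in sizeof(regex_t) bytes, so memory must be allocated dynamically. This
kind of reasoning and guessing not only solves the problem of how to use the
functions but also yields some insight into how they are implemented, an
ability that matters especially when reading code. Note that although the
parameter of a memory-releasing function is an input and passes nothing
meaningful out, it is not declared const in the prototype, because freeing
memory is also a modification.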
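
The ownership split guessed here (we own the regex_t object, regcomp owns
whatever hangs off it) suggests a lifecycle of one regcomp, any number of
regexec calls, then one regfree. Here is a small sketch of that lifecycle.
The helper name count_matching is my own invention. Passing nmatch as 0 with
a NULL pmatch relies on POSIX letting regexec ignore pmatch when nmatch is 0.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the regex API.
 * Counts how many of the n strings contain a match for `pattern`.
 * Returns -1 if the pattern cannot be compiled. */
int count_matching(const char *pattern, const char *const strings[], size_t n)
{
    regex_t regobj;    /* storage allocated by us, here on the stack */
    size_t i;
    int count = 0;

    if (regcomp(&regobj, pattern, 0) != 0)   /* compile once */
        return -1;

    for (i = 0; i < n; i++) {
        /* regexec takes a const regex_t *: it only reads the compiled form */
        if (regexec(&regobj, strings[i], 0, NULL, 0) == 0)
            count++;
    }

    regfree(&regobj);  /* frees what regcomp allocated, not regobj itself */
    return count;
}
```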

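We have only just finished the SYNOPSIS and have not read the description
below, yet we already more or less know how to use these functions. What did
that take? 1. reasoning, 2. experience, 3. sensitivity to English
abbreviations. Now let us read the description and verify the guesses above as
we go.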

C code

DESCRIPTION
POSIX Regex Compiling
   regcomp()  is  used to compile a regular expression into a form that is
   suitable for subsequent regexec() searches.

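Exactly: regcomp does convert a regular expression into a binary form suitable
for subsequent regexec() processing. The word subsequent tells you to call
regcomp first and regexec afterwards. When reading documentation, the words
that name concepts and the words that express relationships between concepts
matter most. In terse documents such as man pages, the words expressing
relationships are especially easy to overlook, because they are less
conspicuous than definitions and often amount to a single word. As an exercise,
watch for the words that express relationships between concepts in the text
that follows.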

linux_Ultra · posted 2009-6-30 09:33:36

C code

   regcomp() is supplied with preg, a pointer to a pattern buffer  storage
   area;  regex, a pointer to the null-terminated string and cflags, flags
   used to determine the type of compilation.

   All regular expression searching must be done via  a  compiled  pattern
   buffer,  thus  regexec()  must always be supplied with the address of a
   regcomp() initialized pattern buffer.

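"preg, a pointer to a pattern buffer storage area" tells us that the space for
the preg object must be allocated by us, after which we pass its address, preg,
to regcomp. A man page will not say outright "allocate the space yourself and
then pass it to me"; that would be far too dumb. You have to work out for
yourself what it is really trying to tell you.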

C code

   cflags may be the bitwise-or of one or more of the following:

   REG_EXTENDED
            Use POSIX Extended Regular Expression syntax  when  interpreting
            regex. If  not  set,  POSIX Basic Regular Expression syntax is
            used.

   REG_ICASE
            Do not differentiate case.  Subsequent regexec() searches  using
            this pattern buffer will be case insensitive.

   REG_NOSUB
            Support  for  substring  addressing  of matches is not required.
            The nmatch and pmatch parameters to regexec() are ignored if the
            pattern buffer supplied was compiled with this flag set.

   REG_NEWLINE
            Match-any-character operators don’t match a newline.

            A  non-matching list ([^...])  not containing a newline does not
            match a newline.

            Match-beginning-of-line operator (^) matches  the  empty  string
            immediately  after  a newline, regardless of whether eflags, the
            execution flags of regexec(), contains REG_NOTBOL.

            Match-end-of-line operator ($) matches the empty string  immedi‐
            ately  before  a  newline, regardless of whether eflags contains
            REG_NOTEOL.

POSIX Regex Matching
   regexec() is used to match a null-terminated string against the precom‐
   piled  pattern  buffer,  preg. nmatch  and pmatch are used to provide
   information regarding the location of any matches.  eflags may  be  the
   bitwise-or  of  one  or  both  of REG_NOTBOL and REG_NOTEOL which cause
   changes in matching behavior described below.

   REG_NOTBOL
            The match-beginning-of-line operator always fails to match  (but
            see  the  compilation  flag  REG_NEWLINE above) This flag may be
            used when different portions of a string are passed to regexec()
            and the beginning of the string should not be interpreted as the
            beginning of the line.

   REG_NOTEOL
            The match-end-of-line operator always fails to  match  (but  see
            the compilation flag REG_NEWLINE above)

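As guessed earlier, since cflags and eflags have different names, they must
take different sets of values, and those values are normally combined with
bitwise OR. This article is about how to read and understand a man page, not
about the technology itself, so I will not explain what each flag does. But let
us do a few more abbreviation exercises, which help not only with understanding
but also with remembering the flags; once you have memorized the common ones,
you no longer need to look them up every time. In REG_ICASE, ICASE means ignore
case, a very common abbreviation. In REG_NOSUB, SUB sometimes means substitute
and sometimes substring; here it means substring. With REG_NOTBOL it is not
obvious at first what BOL is, but look at its counterpart REG_NOTEOL: from
experience we know EOF is end of file, so EOL should be end of line, and BOL,
correspondingly, beginning of line.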
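
To see the two families of flags side by side, here is a small sketch. The
helper matches_with_flags is a name I made up. It takes the compile-time flags
and the execution-time flags as separate arguments and always adds REG_NOSUB,
on the assumption that the caller only wants a yes/no answer.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the regex API.
 * Compiles `pattern` with the given compile-time flags (cflags) and runs it
 * once with the given execution-time flags (eflags).
 * Returns 1 on a match, 0 on no match, -1 on any error. */
int matches_with_flags(const char *pattern, const char *text,
                       int cflags, int eflags)
{
    regex_t regobj;
    int rc;

    /* REG_NOSUB: no substring positions wanted, so regexec may ignore
     * nmatch and pmatch. Flags are combined with bitwise OR. */
    if (regcomp(&regobj, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regobj, text, 0, NULL, eflags);
    regfree(&regobj);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

With this helper, matches_with_flags("hello", "Say HELLO", REG_ICASE, 0)
matches while the same call with cflags 0 does not, and "^abc" stops matching
"abc" once REG_NOTBOL is passed as eflags.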

linux_Ultra · posted 2009-6-30 09:36:54

C code

BYTE OFFSETS
   Unless  REG_NOSUB was set for the compilation of the pattern buffer, it
   is possible to obtain substring match addressing  information. pmatch
   must be dimensioned to have at least nmatch elements.  These are filled
   in by regexec() with substring match addresses.  Any  unused  structure
   elements will contain the value -1.

   The  regmatch_t  structure  which  is  the type of pmatch is defined in
   <regex.h>.

         typedef struct {
            regoff_t rm_so;
            regoff_t rm_eo;
         } regmatch_t;

   Each rm_so element that is not -1 indicates the  start  offset  of  the
   next  largest  substring  match  within the string.  The relative rm_eo
   element indicates the end offset of the match.

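Exactly: we guessed earlier that a regmatch_t object holds the position of a
match and that, after regexec returns, the unused entries at the end of the
regmatch_t array must be marked with a special value. That value is -1. The
position information consists of a start and an end, and one more guess tells
us that rm_so means regmatch start offset and rm_eo means regmatch end offset.
You need this kind of sensitivity: rm_so and rm_eo are identical except for the
s and the e, and an s and e that denote opposite concepts mean start and end,
which is very common in program code. Another very common convention is to
prefix structure member names with an abbreviation of the structure name, like
rm_ for regmatch here.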
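
The -1 convention is easy to check with a short experiment. This is a sketch
under my own assumptions: the helper name print_groups and the buffer size of
5 are arbitrary, and the pattern is compiled with REG_EXTENDED so that plain
parentheses group. As far as I know from POSIX, slot 0 holds the whole match
and slot i holds the i-th parenthesized subexpression, which is worth keeping
in mind when reading the phrase "substring match" in the man page text above.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not part of the regex API.
 * Matches `pattern` against `text`, prints every filled pmatch slot and
 * returns how many slots were filled, or -1 on error or no match. */
int print_groups(const char *pattern, const char *text)
{
    regex_t regobj;
    regmatch_t m[5];
    size_t i;
    int filled = 0;

    if (regcomp(&regobj, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&regobj, text, 5, m, 0) != 0) {
        regfree(&regobj);
        return -1;
    }

    for (i = 0; i < 5; i++) {
        if (m[i].rm_so == -1)
            continue;          /* unused slot, marked with -1 */
        /* rm_so is the offset of the first matched byte, rm_eo the offset
         * just past the last one, so the length is rm_eo - rm_so */
        printf("slot %zu: offsets %ld..%ld \"%.*s\"\n", i,
               (long)m[i].rm_so, (long)m[i].rm_eo,
               (int)(m[i].rm_eo - m[i].rm_so), text + m[i].rm_so);
        filled++;
    }

    regfree(&regobj);
    return filled;
}
```

Calling print_groups("([a-z]+)@([a-z]+)", "mail bob@example now") fills slots
0, 1 and 2 and leaves the remaining two slots at -1.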

C code

Posix Error Reporting
   regerror() is used to turn the error codes that can be returned by both
   regcomp() and regexec() into error message strings.

   regerror() is passed the error code, errcode, the pattern buffer, preg,
   a pointer to a character string buffer, errbuf, and  the  size  of  the
   string buffer, errbuf_size.  It returns the size of the errbuf required
   to contain the null-terminated error message string. If  both  errbuf
   and  errbuf_size  are  nonzero,  errbuf  is  filled  in  with the first
   errbuf_size - 1 characters of the error message and a terminating null.

POSIX Pattern Buffer Freeing
   Supplying  regfree()  with a precompiled pattern buffer, preg will free
   the memory allocated to the pattern buffer by  the  compiling  process,
   regcomp().

This also agrees with the earlier guesses. regerror translates an error code
into a readable string, and regfree releases the memory allocated inside the
preg object.
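
The sentence "It returns the size of the errbuf required" hints at a two-step
idiom: ask regerror for the size first, then call it again with a buffer of
that size. Below is a sketch of the idiom. The helper name
compile_error_message is hypothetical, and the first call relies on POSIX
letting regerror ignore errbuf when errbuf_size is 0.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the regex API.
 * If `pattern` fails to compile, returns a malloc'd error message that the
 * caller must free. Returns NULL if the pattern compiles (or if malloc
 * fails). */
char *compile_error_message(const char *pattern, int cflags)
{
    regex_t regobj;
    size_t needed;
    char *msg;
    int rc = regcomp(&regobj, pattern, cflags);

    if (rc == 0) {
        regfree(&regobj);      /* compiled fine: nothing to report */
        return NULL;
    }

    /* First call: errbuf_size is 0, so only the required size (including
     * the terminating null) is returned. */
    needed = regerror(rc, &regobj, NULL, 0);

    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &regobj, msg, needed);   /* second call fills the buffer */
    return msg;
}
```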

linux_Ultra · posted 2009-6-30 09:37:16

C code

RETURN VALUE
   regcomp()  returns  zero  for a successful compilation or an error code
   for failure.

   regexec() returns zero for a successful match or REG_NOMATCH for  fail‐
   ure.

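To keep the layout uniform, man pages pull RETURN VALUE out into a section of
its own, which has always bothered me. If a man page describes several
functions, then after reading the description of each function you should jump
here to see its return value, rather than read all the other descriptions
first. In fact this man page is not even consistent: the return value of
regerror is given in the description above and not here. Clearly, listing
return values separately at the end goes against how people write and read. The
current state is unfortunate: some return values are listed at the end, others
are buried in the description, and you have to hunt all over the manual for
them. I consider this a major weakness of man pages. By contrast, what bothers
beginners, that man pages are too terse and have no code examples, is not a
weakness but a strength.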

C code

ERRORS
   The following errors can be returned by regcomp():

   REG_BADBR
            Invalid use of back reference operator.

   REG_BADPAT
            Invalid use of pattern operators such as group or list.

   REG_BADRPT
            Invalid  use  of  repetition  operators such as using ’*’ as the
            first character.

   REG_EBRACE
            Un-matched brace interval operators.

   REG_EBRACK
            Un-matched bracket list operators.

   REG_ECOLLATE
            Invalid collating element.

   REG_ECTYPE
            Unknown character class name.

   REG_EEND
            Non specific error.  This is not defined by POSIX.2.

   REG_EESCAPE
            Trailing backslash.

   REG_EPAREN
            Un-matched parenthesis group operators.

   REG_ERANGE
            Invalid use of the range operator, e.g., the ending point of the
            range occurs prior to the starting point.

   REG_ESIZE
            Compiled  regular  expression  requires  a pattern buffer larger
            than 64Kb.  This is not defined by POSIX.2.

   REG_ESPACE
            The regex routines ran out of memory.

   REG_ESUBREG
            Invalid back reference to a subexpression.

CONFORMING TO
   POSIX.1-2001.

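After reading this part, a student asked me: it says above that regexec returns
0 on success and REG_NOMATCH on failure, so what does the error code
REG_NOMATCH mean, and why is it not explained in the ERRORS section? This is a
typical case of incomplete understanding. The page says regcomp returns 0 on
success and an error code on failure without saying which codes; they are
listed in detail in the ERRORS section. regcomp can fail for many reasons, and
most of these codes describe syntax errors in the regular expression. regexec,
on the other hand, decides whether there is a match: it returns 0 if there is
and REG_NOMATCH if there is not. NOMATCH is simply no match, so the sentence
itself says what the error code means and it is not explained again in the
ERRORS section. This too reflects the terseness of man pages: not a single
wasted word.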

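Why did this student not fully understand? Again, for lack of sensitivity to
English. To him REG_NOMATCH was just a string of capital letters, a symbol, and
he did not see "no match" in it. He therefore thought the symbol needed a
detailed explanation later, and it did not occur to him that the symbol does
double duty here: it explains itself.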

C code

SEE ALSO
   grep(1), regex(7), GNU regex manual

COLOPHON
   This page is part of release 2.77 of the Linux  man-pages  project. A
   description  of  the project, and information about reporting bugs, can
   be found at http://www.kernel.org/doc/man-pages/.

GNU                            1998-05-08                         REGEX(3)

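The most valuable part of this final section is SEE ALSO. Since every man page
sticks to its own topic and does not digress, you sometimes need to read
several related man pages together to get an overview from a series of related
topics. Some man pages have a BUGS section, which is also very important. The
classic case is gets(3): after a long description of what the function is for,
the BUGS section at the end says, Never use gets(). If you miss that sentence,
everything you read before it was for nothing.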
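
For readers wondering what to write instead of gets(), which cannot be told
how large the buffer is and was removed from the language in C11, the usual
replacement is fgets. A minimal sketch follows; the helper name read_line is
my own, and stripping the trailing newline is just one possible convention.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf, storing at most
 * size - 1 characters, and strips the trailing newline if there is one.
 * Returns 1 on success, 0 on end of file or error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    /* Unlike gets(), fgets() is told the buffer size and never writes
     * past it; an over-long line is simply cut off. */
    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return 0;

    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if present */
    return 1;
}
```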

宇宙飛船 · posted 2009-6-30 09:43:36

I'll post some English reading material in a while too, also about the GNU
tools. These are the tools we electricians earn our living with.

qupeng2008 · posted 2009-6-30 09:51:10

Bumping this~ even though I can't understand it~ O(∩_∩)O~

linux_Ultra · posted 2009-6-30 09:52:54

The author of this post also has a Linux programming book released under the
GNU Free Documentation License. It has some commercial motive, but it is still
worth a look:
http://djkings.javaeye.com/blog/218542
