Character Classes in Emacs

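While fiddling with git today I lost three posts: one on the story of Yuan Xian being content in poverty (「原宪甘贫」) from the Zhuangzi, this one, and one about tinkering with my blog. The last one was short, so I won't redo it. This one I am rewriting from memory.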

Emacs regular expressions support character classes, which look like [:ascii:].

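The correct way to use a character class is [[:ascii:]]: the class name has to sit inside a bracket expression. If you leave out the outer [], you get [:ascii:], which is just an ordinary bracket expression matching any one of the characters :, a, s, c, i. The intended meaning is lost.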
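The difference is easy to check in the *scratch* buffer or with M-:. The following is only a small sketch using the built-in string-match-p; the test strings are made up for illustration.

```elisp
;; Correct form: the class sits inside a bracket expression.
(string-match-p "[[:ascii:]]" "x")    ; => 0   ("x" is an ASCII character)
(string-match-p "[[:ascii:]]" "中")   ; => nil (not an ASCII character)

;; Mistaken form: a plain set containing the characters : a s c i
(string-match-p "[:ascii:]" "x")      ; => nil ("x" is not in the set)
(string-match-p "[:ascii:]" "s")      ; => 0   ("s" is in the set)
(string-match-p "[:ascii:]" ":")      ; => 0   (":" is in the set)
```

The mistaken form raises no error, and it still happens to match a handful of characters, which is probably why the mistake is easy to miss.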

I had this wrong for a long time and only later noticed my mistake. I'm writing it down in the hope that others can avoid it.

2016-Oct-22 Update: I have also copied the full list of character classes here.

  • ‘[:ascii:]’
    This matches any ASCII character (codes 0–127).

  • ‘[:alnum:]’
    This matches any letter or digit. For multibyte characters, it
    matches characters whose Unicode ‘general-category’ property (*note
    Character Properties::) indicates they are alphabetic or decimal
    number characters.

  • ‘[:alpha:]’
    This matches any letter. For multibyte characters, it matches
    characters whose Unicode ‘general-category’ property (*note
    Character Properties::) indicates they are alphabetic characters.

  • ‘[:blank:]’
    This matches space and tab only.

  • ‘[:cntrl:]’
    This matches any ASCII control character.

  • ‘[:digit:]’
    This matches ‘0’ through ‘9’. Thus, ‘[-+[:digit:]]’ matches any
    digit, as well as ‘+’ and ‘-’.

  • ‘[:graph:]’
    This matches graphic characters—everything except whitespace, ASCII
    and non-ASCII control characters, surrogates, and codepoints
    unassigned by Unicode, as indicated by the Unicode
    ‘general-category’ property (*note Character Properties::).

  • ‘[:lower:]’
    This matches any lower-case letter, as determined by the current
    case table (*note Case Tables::). If ‘case-fold-search’ is
    non-‘nil’, this also matches any upper-case letter.

  • ‘[:multibyte:]’
    This matches any multibyte character (*note Text
    Representations::).

  • ‘[:nonascii:]’
    This matches any non-ASCII character.

  • ‘[:print:]’
    This matches any printing character—either whitespace, or a graphic
    character matched by ‘[:graph:]’.

  • ‘[:punct:]’
    This matches any punctuation character. (At present, for multibyte
    characters, it matches anything that has non-word syntax.)

  • ‘[:space:]’
    This matches any character that has whitespace syntax (*note Syntax
    Class Table::).

  • ‘[:unibyte:]’
    This matches any unibyte character (*note Text Representations::).

  • ‘[:upper:]’
    This matches any upper-case letter, as determined by the current
    case table (*note Case Tables::). If ‘case-fold-search’ is
    non-‘nil’, this also matches any lower-case letter.

  • ‘[:word:]’
    This matches any character that has word syntax (*note Syntax Class
    Table::).

  • ‘[:xdigit:]’
    This matches the hexadecimal digits: ‘0’ through ‘9’, ‘a’ through
    ‘f’ and ‘A’ through ‘F’.
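A few of the classes above can be tried out the same way. This is a rough sketch with made-up test strings, not an exhaustive check. The \\` and \\' in the patterns anchor the match to the start and end of the string.

```elisp
;; A class can be combined with other characters in one bracket expression.
(string-match-p "\\`[-+[:digit:]]+\\'" "+42")       ; => 0
(string-match-p "\\`[[:xdigit:]]+\\'" "DeadBeef")   ; => 0
(string-match-p "\\`[[:xdigit:]]+\\'" "xyz")        ; => nil

;; A class can be negated along with the rest of the set:
;; this finds the first non-ASCII character.
(string-match-p "[^[:ascii:]]" "abc中")              ; => 3

;; [:lower:] depends on `case-fold-search', as described above.
(let ((case-fold-search nil))
  (string-match-p "[[:lower:]]" "A"))               ; => nil
(let ((case-fold-search t))
  (string-match-p "[[:lower:]]" "A"))               ; => 0
```

Binding case-fold-search with let keeps each result independent of the buffer-local setting in whatever buffer the code is evaluated in.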