Matching Chinese with Regular Expressions in Emacs

2015-08-09 | Emacs Tips

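This is a very handy feature, and one that other regular expression flavors either lack or cannot express elegantly (some places use [\u4e00-\u9fa5] to match Chinese, for example; do you think you could remember that?).

Yet even this convenient feature keeps slipping my mind, and I can't make much sense of the manual either.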

The correct form:

In Emacs, the regular expression that matches Chinese characters is \cc

Here is what the manual says:

‘\cC’
matches any character that belongs to the category C. For example,
‘\cc’ matches Chinese characters, ‘\cg’ matches Greek characters,
etc. For the description of the known categories, type ‘M-x
describe-categories RET’.

‘\CC’
matches any character that does not belong to category C.

15.7 Backslash in Regular Expressions
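
To make the manual's wording concrete, here is a minimal Emacs Lisp sketch of \cc and \CC used from code. It assumes the default category table. The helper name my/chinese-runs is made up for illustration and is not part of Emacs. Note that inside a Lisp string the backslash has to be doubled, so the regexp \cc is written "\\cc".

```elisp
;; In a Lisp string the backslash must be doubled: "\\cc" is the regexp \cc.

;; Index of the first character in category c (Chinese), or nil if none.
(string-match-p "\\cc" "Emacs 中文")

;; \Cc is the complement: index of the first character NOT in category c.
(string-match-p "\\Cc" "中文 Emacs")

;; Hypothetical helper (not part of Emacs): collect every maximal run
;; of category-c characters in STRING.
(defun my/chinese-runs (string)
  "Return a list of the runs of Chinese characters found in STRING."
  (let ((start 0)
        (runs '()))
    ;; "\\cc+" always consumes at least one character, so START advances.
    (while (string-match "\\cc+" string start)
      (push (match-string 0 string) runs)
      (setq start (match-end 0)))
    (nreverse runs)))

(my/chinese-runs "Emacs 里用正则 regexp 匹配中文")
```

In interactive commands such as C-M-s (isearch-forward-regexp) the single-backslash form \cc is typed directly; the doubling is only needed in Lisp source.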

I may as well paste that output here too, since I can't remember this command either.

Legend of category mnemonics (see the tail for the longer description)
:space for indent   9:semivowel lower   R:Right-to-left …   k:Katakana
.:Base              <:Not at eol        Y:2-byte Cyrillic   l:Latin
0:consonant         >:Not at bol        ^:Combining         o:Lao
1:base vowel        A:2-byte alnum      a:ASCII             q:Tibetan
2:upper diacritic   C:2-byte han        b:Arabic            r:Roman
3:lower diacritic   G:2-byte Greek      c:Chinese           t:Thai
4:combining tone    H:2-byte Hiragana   e:Ethiopic          v:Viet
5:symbol            I:Indian Glyphs     g:Greek             w:Hebrew
6:digit             K:2-byte Katakana   h:Korean            y:Cyrillic
7:vowel diacritic   L:Left-to-right …   i:Indian            |:line breakable
8:vowel-signs       N:2-byte Korean     j:Japanese

M-x describe-categories

This is only the beginning of the output; the rest is too messy, and nobody would read it even if I posted it.
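
Instead of scanning the full describe-categories listing, it is also possible to ask Emacs about one character at a time. The sketch below uses the built-in functions char-category-set, category-set-mnemonics and category-docstring. The helper my/char-categories is a made-up name for illustration, and the results depend on the category table of the current buffer (assumed here to be the standard one).

```elisp
;; Hypothetical helper (not part of Emacs): the category mnemonics of CHAR.
(defun my/char-categories (char)
  "Return a string of the category mnemonics that CHAR belongs to."
  (category-set-mnemonics (char-category-set char)))

;; A string of mnemonics; for a Chinese character it should include "c".
(my/char-categories ?中)

;; The description registered for category c.
(category-docstring ?c)

;; A category set is a bool-vector indexed by the mnemonic character,
;; so membership can be tested directly with `aref'.
(aref (char-category-set ?中) ?c)
```

Evaluating these forms with C-x C-e in the *scratch* buffer is a quick way to check which \cX category a given character falls into before writing a regexp around it.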

Author: Chris

Published: 2015-08-09

Last updated: 2022-03-23

Original link: https://chriszheng.science/2015/08/09/Emacs-regular-express-match-Chinese/