Using UTF-8 Encoding on MS-Windows

On Windows, UTF-8 is a "second-class citizen": about the only support it gets is code page 65001, and even that causes problems for command-line tools.

But that cannot stop UTF-8 from being the international standard encoding, and it certainly cannot stop us from using it on Windows.

For Emacs, adding the following statement to the configuration file makes UTF-8 the default file encoding, with UNIX-style line endings:

(prefer-coding-system 'utf-8-unix)
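
To see what this call actually changes, the relevant defaults can be inspected after evaluating it, for example with M-:. This is only a sketch: the exact values depend on the Emacs version and the system locale, so no expected output is shown.

```elisp
;; Sketch: evaluate after (prefer-coding-system 'utf-8-unix).
;; Exact values depend on the Emacs version and the system locale.
(list
 ;; Default coding system for saving newly created files.
 (default-value 'buffer-file-coding-system)
 ;; Highest-priority coding system used when detecting a file's encoding.
 (car (coding-system-priority-list))
 ;; (DECODING . ENCODING) pair used for subprocess I/O by default.
 default-process-coding-system)
```

Note that the last element, the default for subprocesses, is also affected by `prefer-coding-system`, not just the file defaults.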

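But this causes a problem: the command-line arguments of other programs that Emacs runs (for example, programs started with M-&) also become UTF-8 encoded. The following setting corrects this: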

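;; Keep the legacy code page (GBK) for the programs that need it.
;; Each entry has the form (PROGRAM-REGEXP DECODING . ENCODING).
(set-default 'process-coding-system-alist
             '(("[pP][lL][iI][nN][kK]" gbk-dos . gbk-dos)
               ("[cC][mM][dD][pP][rR][oO][xX][yY]" gbk-dos . gbk-dos)
               ("[gG][sS]" gbk-dos . gbk-dos)))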
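
A quick way to check that the entries match is to ask Emacs which coding systems it would pick for a given program. The sketch below uses `find-operation-coding-system`. The program names are examples only, and it assumes the alist above has already been evaluated.

```elisp
;; Sketch: ask Emacs which (DECODING . ENCODING) pair it would use for a
;; given program.  The program names below are examples only.
(dolist (program '("plink.exe" "cmdproxy.exe" "grep.exe"))
  (message "%s -> %S"
           program
           ;; Returns nil when no entry in the alist matches PROGRAM;
           ;; Emacs then falls back to `default-process-coding-system'.
           (find-operation-coding-system 'call-process program)))
```

The cmdproxy entry matters for M-& because that command runs its argument through the shell, which on a native Windows build is normally cmdproxy.exe.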

Update:

On the subject of using UTF-8 under Windows, a recent reply by Emacs developer Eli Zaretskii is worth reading:

This works on the Windows XP shell (cmd.exe):

C:> chcp 65001
C:> grep λ *.txt
λfoo

(grep.exe is from MSYS2 and the .txt file encoding is utf-8).

IOW, this grep.exe is a Cygwin program (MSYS2 is a fork of Cygwin),
and thus supports UTF-8 locales, like Cygwin does.

However, M-x grep from Emacs displays no results (“Grep finished with no
matches found…”).

Yes, because the native Windows build of Emacs always encodes the
command-line arguments of programs it invokes using the current system
ANSI codepage. And the system codepage cannot be set to 65001,
because support for that codepage in Windows is half-hearted: for
example, it cannot be used as the codeset in the arguments to
'setlocale' functions. Which means any native Windows program will be
unable to correctly handle UTF-8 encoded characters, since the data
used by C runtime functions that compare, collate, and otherwise
process characters will not behave as you expect.

Is there a method for executing shell commands encoded in utf-8 from
Emacs on Windows?

Sadly, no. Not until Windows improves their support for the UTF-8
codepage, such that it becomes a first-class citizen (and then we will
have a not-so-easy job of adapting to that in Emacs). Until then, if
you want UTF-8 communications with subprocesses, use the Cygwin build
of Emacs.

This reply makes the following points:

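  1. Cygwin programs can use UTF-8 on the command line.

  2. Native Windows programs, including the native build of Emacs, always encode command-line arguments in the current system code page, and that code page cannot be set to 65001, i.e. UTF-8. So native Windows programs cannot use UTF-8 on the command line.

  3. Unless Microsoft adds proper UTF-8 support and ends UTF-8's status as a "second-class citizen", this limitation will remain.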
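
To see which code pages are in play on a particular machine, something like the following can be evaluated in a native Windows build of Emacs. It is a sketch that assumes the `w32-ansi-code-page` variable and the `w32-get-console-codepage` function are available in the build, so both are guarded.

```elisp
;; Sketch: report the code pages a native Windows Emacs is working with.
;; The w32-* names exist only in native MS-Windows builds, hence the guards.
(when (eq system-type 'windows-nt)
  (message "ANSI code page: %s, console input code page: %s"
           (if (boundp 'w32-ansi-code-page)
               (symbol-value 'w32-ansi-code-page)
             "unknown")
           (if (fboundp 'w32-get-console-codepage)
               (w32-get-console-codepage)
             "unknown")))
```

On a Simplified Chinese system the ANSI code page is typically 936, which corresponds to the gbk coding system used in the configuration above.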