1. Unicode code table for Chinese characters
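No.  Character set                              Count   Unicode range
1    Basic ideographs                           20902   4E00-9FA5
2    Basic ideographs supplement                74      9FA6-9FEF
3    Extension A                                6582    3400-4DB5
4    Extension B                                42711   20000-2A6D6
5    Extension C                                4149    2A700-2B734
6    Extension D                                222     2B740-2B81D
7    Extension E                                5762    2B820-2CEA1
8    Extension F                                7473    2CEB0-2EBE0
9    Kangxi radicals                            214     2F00-2FD5
10   Radicals supplement                        115     2E80-2EF3
11   Compatibility ideographs                   477     F900-FAD9
12   Compatibility ideographs supplement        542     2F800-2FA1D
13   PUA (GBK) components                       81      E815-E86F
14   Components extension                       452     E400-E5E8
15   PUA supplement                             207     E600-E6CF
16   CJK strokes                                36      31C0-31E3
17   Character structure (description chars)    12      2FF0-2FFB
18   Bopomofo                                   43      3105-312F
19   Bopomofo extended                          22      31A0-31BA
20   〇 (ideographic zero)                       1       3007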
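To place a character in the table, it helps to see its code point in the same hexadecimal form the table uses. The snippet below is a minimal sketch; `code_point_label` and `span_size` are hypothetical helper names introduced here, not part of the original post. It also shows that, for a row in which every code point of the span is counted, the count equals end - start + 1.

```python
def code_point_label(ch):
    """Format a single character's code point as U+XXXX (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


def span_size(start, end):
    """Number of code points in an inclusive range (hypothetical helper)."""
    return end - start + 1


print(code_point_label("中"))        # U+4E2D, inside row 1 (4E00-9FA5)
print(code_point_label("〇"))        # U+3007, row 20
print(span_size(0x4E00, 0x9FA5))     # 20902, matches the count in row 1
print(span_size(0x20000, 0x2A6D6))   # 42711, matches the count in row 4
```

Some rows list a count smaller than the span, which is consistent with a block that contains unassigned code points.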
2. Python implementation
# Return False as soon as one non-Chinese character is found.
# The if condition is a long chain; there is surely a simpler way to write it, to be learned later.
# Code points above U+FFFF use the 8-digit \U escape, because '\u20000' would be
# read as U+2000 followed by the character '0'.
def is_ch(word):
    for ch in word:
        if not ('\u4e00' <= ch <= '\u9fef') and not ('\u3400' <= ch <= '\u4db5') \
                and not ('\U00020000' <= ch <= '\U0002a6d6') and not ('\U0002a700' <= ch <= '\U0002b734') \
                and not ('\U0002b740' <= ch <= '\U0002b81d') and not ('\U0002b820' <= ch <= '\U0002cea1') \
                and not ('\U0002ceb0' <= ch <= '\U0002ebe0') and not ('\u2f00' <= ch <= '\u2fd5') \
                and not ('\u2e80' <= ch <= '\u2ef3') and not ('\uf900' <= ch <= '\ufad9') \
                and not ('\U0002f800' <= ch <= '\U0002fa1d') and not ('\ue815' <= ch <= '\ue86f') \
                and not ('\ue400' <= ch <= '\ue5e8') and not ('\ue600' <= ch <= '\ue6cf') \
                and not ('\u31c0' <= ch <= '\u31e3') and not ('\u2ff0' <= ch <= '\u2ffb') \
                and not ('\u3105' <= ch <= '\u312f') and not ('\u31a0' <= ch <= '\u31ba'):
            return False
    return True
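A note on escapes for the five-digit ranges in the table: in a Python string literal, `\u` takes exactly four hexadecimal digits, so anything above U+FFFF needs the eight-digit `\U` form or `chr()`. The short check below is only an illustration of that rule; the variable names are made up for this sketch.

```python
# '\u20000' is parsed as U+2000 followed by the character '0': two characters.
four_digit = '\u20000'
# '\U00020000' is the single code point U+20000, the first one in Extension B.
eight_digit = '\U00020000'

print(len(four_digit))         # 2
print(len(eight_digit))        # 1
print(hex(ord(eight_digit)))   # 0x20000

# chr() builds the same character from the integer code point.
print(chr(0x20000) == eight_digit)  # True
```

Because of this, a comparison such as `'\u20000' <= ch` compares `ch` against a two-character string and never tests the intended range.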
3. Possible extensions when time permits
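(1) For example: when every character is Chinese, return True together with the original string; when there are non-Chinese characters, return False together with the string of non-Chinese characters.
(2) The if statement carries a long chain of conditions. There must be a simpler way to write it; find a simpler or more elegant form.
(3) Is there a simpler implementation? Are these Unicode ranges contiguous? Look into it when time permits.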
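The three ideas above could be sketched roughly as follows. This is one possible approach, not the original author's code: the names `CJK_RANGES`, `is_ch_char`, `split_ch`, and `merge_ranges` are hypothetical, and the range list is copied from the table in section 1. It assumes row 20 (〇, U+3007) should count as Chinese, although the if chain in section 2 does not test it; remove that entry to match the original function.

```python
# Inclusive (start, end) code point ranges, one per table row (assumed list).
CJK_RANGES = [
    (0x4E00, 0x9FA5), (0x9FA6, 0x9FEF),        # rows 1-2
    (0x3400, 0x4DB5), (0x20000, 0x2A6D6),      # rows 3-4
    (0x2A700, 0x2B734), (0x2B740, 0x2B81D),    # rows 5-6
    (0x2B820, 0x2CEA1), (0x2CEB0, 0x2EBE0),    # rows 7-8
    (0x2F00, 0x2FD5), (0x2E80, 0x2EF3),        # rows 9-10
    (0xF900, 0xFAD9), (0x2F800, 0x2FA1D),      # rows 11-12
    (0xE815, 0xE86F), (0xE400, 0xE5E8),        # rows 13-14
    (0xE600, 0xE6CF), (0x31C0, 0x31E3),        # rows 15-16
    (0x2FF0, 0x2FFB), (0x3105, 0x312F),        # rows 17-18
    (0x31A0, 0x31BA), (0x3007, 0x3007),        # rows 19-20
]


def is_ch_char(ch):
    """Idea (2): test one character against the range list with any()."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in CJK_RANGES)


def split_ch(word):
    """Idea (1): (True, word) if all characters match, else (False, the others)."""
    others = "".join(ch for ch in word if not is_ch_char(ch))
    if others:
        return False, others
    return True, word


def merge_ranges(ranges):
    """Idea (3): merge ranges that overlap or touch, to see which are contiguous."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(split_ch("中文"))               # (True, '中文')
print(split_ch("中a文b"))             # (False, 'ab')
print(len(merge_ranges(CJK_RANGES)))  # 19
```

Merging the 20 rows leaves 19 ranges: only rows 1 and 2 touch (9FA5 is followed directly by 9FA6), so the table is not one contiguous block and a list of ranges is still needed.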
Copyright belongs to the original author, yxlint. In case of infringement, please contact us for removal.