Detecting in Python whether a string consists entirely of Chinese characters (including rare ones)

1. Unicode code point table for Chinese characters

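No.  Character set             Count  Unicode range
1    Basic Han characters      20902  4E00-9FA5
2    Basic Han supplement      74     9FA6-9FEF
3    Extension A               6582   3400-4DB5
4    Extension B               42711  20000-2A6D6
5    Extension C               4149   2A700-2B734
6    Extension D               222    2B740-2B81D
7    Extension E               5762   2B820-2CEA1
8    Extension F               7473   2CEB0-2EBE0
9    Kangxi radicals           214    2F00-2FD5
10   Radicals extension        115    2E80-2EF3
11   Compatibility ideographs  477    F900-FAD9
12   Compatibility extension   542    2F800-2FA1D
13   PUA (GBK) components      81     E815-E86F
14   Components extension      452    E400-E5E8
15   PUA supplement            207    E600-E6CF
16   Han strokes               36     31C0-31E3
17   Han structure characters  12     2FF0-2FFB
18   Bopomofo                  43     3105-312F
19   Bopomofo extension        22     31A0-31BA
20   〇                         1      3007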
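
Several ranges in the table lie above U+FFFF, which affects how they can be written as Python string literals. The short sketch below shows how a table entry maps to a one-character string; the variable names are illustrative only and do not come from the original post.

```python
# chr() and ord() convert between code points and one-character strings.
basic_start = chr(0x4E00)    # first code point of the basic Han block
ext_b_start = chr(0x20000)   # first code point of Extension B

# \uXXXX takes exactly four hex digits; above U+FFFF use \UXXXXXXXX.
same_char = '\U00020000'
print(same_char == ext_b_start)   # True

# A five-digit \u escape is read as a four-digit escape plus a literal digit.
wrong = '\u20000'                 # the two characters '\u2000' and '0'
print(len(wrong))                 # 2
print(hex(ord(ext_b_start)))      # 0x20000
```

In other words, rows 4 to 8 and row 12 of the table need the eight-digit `\U` form (or `chr()`) when used in Python source.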

2. Python implementation

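# Return False as soon as a single non-Chinese character is found.
# The if condition is long; there is surely a simpler way to write it, to be learned later.
# Note: \uXXXX takes exactly four hex digits, so code points above U+FFFF must be
# written as \UXXXXXXXX. A literal like '\u20000' is the two characters '\u2000' and '0'.
# The single character 〇 (U+3007, row 20 of the table) is not checked here.
def is_ch(word):
    for ch in word:
        if not ('\u4e00' <= ch <= '\u9fef') and not ('\u3400' <= ch <= '\u4db5') \
                and not ('\U00020000' <= ch <= '\U0002a6d6') and not ('\U0002a700' <= ch <= '\U0002b734') \
                and not ('\U0002b740' <= ch <= '\U0002b81d') and not ('\U0002b820' <= ch <= '\U0002cea1') \
                and not ('\U0002ceb0' <= ch <= '\U0002ebe0') and not ('\u2f00' <= ch <= '\u2fd5') \
                and not ('\u2e80' <= ch <= '\u2ef3') and not ('\uf900' <= ch <= '\ufad9') \
                and not ('\U0002f800' <= ch <= '\U0002fa1d') and not ('\ue815' <= ch <= '\ue86f') \
                and not ('\ue400' <= ch <= '\ue5e8') and not ('\ue600' <= ch <= '\ue6cf') \
                and not ('\u31c0' <= ch <= '\u31e3') and not ('\u2ff0' <= ch <= '\u2ffb') \
                and not ('\u3105' <= ch <= '\u312f') and not ('\u31a0' <= ch <= '\u31ba'):
            return False
    return True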
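
The comment in the code above asks for a simpler way to write the condition. One possible sketch, not the original author's method, keeps the ranges in a list of integer code points and tests each character with `ord()`. The names `HAN_RANGES`, `is_han` and `is_all_han` are hypothetical, and the list covers rows 1 to 19 of the table, the same ranges the function above checks.

```python
# Inclusive (start, end) code point ranges, rows 1-19 of the table in section 1.
# HAN_RANGES, is_han and is_all_han are illustrative names, not from the original post.
HAN_RANGES = [
    (0x4E00, 0x9FA5), (0x9FA6, 0x9FEF), (0x3400, 0x4DB5),
    (0x20000, 0x2A6D6), (0x2A700, 0x2B734), (0x2B740, 0x2B81D),
    (0x2B820, 0x2CEA1), (0x2CEB0, 0x2EBE0), (0x2F00, 0x2FD5),
    (0x2E80, 0x2EF3), (0xF900, 0xFAD9), (0x2F800, 0x2FA1D),
    (0xE815, 0xE86F), (0xE400, 0xE5E8), (0xE600, 0xE6CF),
    (0x31C0, 0x31E3), (0x2FF0, 0x2FFB), (0x3105, 0x312F),
    (0x31A0, 0x31BA),
]


def is_han(ch):
    """Return True if the single character ch falls in one of HAN_RANGES."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in HAN_RANGES)


def is_all_han(word):
    """Return True if every character of word is in HAN_RANGES.

    Like the loop version, this returns True for an empty string.
    """
    return all(is_han(ch) for ch in word)


print(is_all_han('汉字'))      # True
print(is_all_han('汉字abc'))   # False
```

Working with integers avoids the escape-length pitfall for code points above U+FFFF, and adding row 20 of the table would only require appending `(0x3007, 0x3007)` to the list.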

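3. Possible extensions when time permits

(1) For example: return True and the original string when every character is a Chinese character, and return False and the string of non-Chinese characters otherwise.

(2) The if statement chains a long list of conditions; there is surely a simpler or more elegant way to write it.

(3) Is there a simpler implementation? Are these Unicode ranges contiguous? Worth investigating when time permits.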
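
Items (1) and (3) can be sketched as follows. This is only one possible approach under stated assumptions: the names `HAN_RANGES`, `is_han`, `check_han` and `merge_ranges` are hypothetical, and the ranges are rows 1 to 19 of the table in section 1.

```python
# Inclusive (start, end) code point ranges, rows 1-19 of the table in section 1.
# All names in this block are illustrative and not part of the original post.
HAN_RANGES = [
    (0x4E00, 0x9FA5), (0x9FA6, 0x9FEF), (0x3400, 0x4DB5),
    (0x20000, 0x2A6D6), (0x2A700, 0x2B734), (0x2B740, 0x2B81D),
    (0x2B820, 0x2CEA1), (0x2CEB0, 0x2EBE0), (0x2F00, 0x2FD5),
    (0x2E80, 0x2EF3), (0xF900, 0xFAD9), (0x2F800, 0x2FA1D),
    (0xE815, 0xE86F), (0xE400, 0xE5E8), (0xE600, 0xE6CF),
    (0x31C0, 0x31E3), (0x2FF0, 0x2FFB), (0x3105, 0x312F),
    (0x31A0, 0x31BA),
]


def is_han(ch):
    """Return True if the single character ch falls in one of HAN_RANGES."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in HAN_RANGES)


def check_han(word):
    """Item (1): return (True, word) if all characters are in HAN_RANGES,
    otherwise (False, string of the characters that are not)."""
    others = ''.join(ch for ch in word if not is_han(ch))
    if others:
        return False, others
    return True, word


def merge_ranges(ranges):
    """Item (3): sort the ranges and merge those that touch or overlap."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent to or overlapping the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(check_han('汉字'))      # (True, '汉字')
print(check_han('汉a字b'))    # (False, 'ab')
for start, end in merge_ranges(HAN_RANGES):
    print(f'{start:X}-{end:X}')
```

With the ranges as listed, only 4E00-9FA5 and 9FA6-9FEF are adjacent and merge into 4E00-9FEF. The remaining ranges are separated by gaps, so the table cannot be collapsed into a single contiguous interval.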


This article is reposted from: https://blog.csdn.net/yxlint/article/details/122271586
Copyright belongs to the original author, yxlint. In case of infringement, please contact us for removal.
