lingua-py语种检测入门指南

lingua-py语种检测入门指南
AllenTT安装
1 | pip install lingua-language-detector # 2.1.1 |
简单使用
1 | from lingua import Language, LanguageDetectorBuilder |
各语种检测示例
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15detector.detect_language_of('语言的魅力')
detector.detect_language_of("languages are awesome")
detector.detect_language_of('kontol maaf tai ga nanya') # Language.INDONESIAN
detector.detect_language_of('sahu randi aao na') # Language.MALAY
detector.detect_language_of('Magandang umaga') # Language.TAGALOG
detector.detect_language_of('कयों') # Language.HINDI
detector.detect_language_of('ยินดีที่ได้รู้จัก') # Language.THAI
detector.detect_language_of('Ông từ đâu đến?') # Language.VIETNAMESE
detector.detect_language_of('سّيد') # Language.ARABIC
detector.detect_language_of('İyi akşamlar.') # Language.TURKISH
detector.detect_language_of('😭🧖🧖..[];') # None检测错误的badcases
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15detector.detect_language_of('cwek sangean pap') # 检测结果是Language.ENGLISH,实际应该是Language.INDONESIAN
# 输出概率分布
confidence_values = detector.compute_language_confidence_values('cwek sangean pap')
for confidence in confidence_values[:3]:
print(f"{confidence.language.name}: {confidence.value:.2f}")
# ENGLISH: 0.31
# INDONESIAN: 0.29
# TAGALOG: 0.23
# 仅返回一种特定语言的置信度值
detector.compute_language_confidence("cwek sangean pap", Language.INDONESIAN) # 0.290841954706363




