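Named-Entity Recognition

Introduction

Named-entity recognition (NER), also known as token classification or text tagging, is the task of taking a sentence and classifying every word (or "token") into a category, such as a person's name, a location, or a part of speech.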

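For example, given the sentence:

Does Chicago have any Pakistani restaurants?

A named-entity recognition algorithm may identify:

"Chicago" as a location
"Pakistani" as an ethnicity

and so on.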

Using gradio (specifically the HighlightedText component), you can easily build a web demo of your NER model and share it with the rest of your team.

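This tutorial shows how to take a pretrained NER model and deploy it with a Gradio interface. We will show two different ways to use the HighlightedText component; depending on the output format of your NER model, one of the two may be easier to use.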

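Prerequisites

Make sure you have the gradio Python package installed. You will also need a pretrained named-entity recognition model. You can use your own; in this tutorial, we will use one from the transformers library.

Approach 1: List of Entity Dictionaries

Many named-entity recognition models output a list of dictionaries. Each dictionary contains an *entity* label, a "start" index, and an "end" index, where the two indices are character offsets into the input text. This is, for example, how NER models in the transformers library operate: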

from transformers import pipeline
ner_pipeline = pipeline("ner")
ner_pipeline("Does Chicago have any Pakistani restaurants")

Output:

[{'entity': 'I-LOC',
  'score': 0.9988978,
  'index': 2,
  'word': 'Chicago',
  'start': 5,
  'end': 12},
 {'entity': 'I-MISC',
  'score': 0.9958592,
  'index': 5,
  'word': 'Pakistani',
  'start': 22,
  'end': 31}]

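If you have such a model, it is very easy to hook it up to Gradio's HighlightedText component. All you need to do is pass this **list of entities**, along with the **original text**, to the component as a dictionary with the keys "entities" and "text" respectively.

Here is a complete example: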

from transformers import pipeline

import gradio as gr

ner_pipeline = pipeline("ner")

examples = [
    "Does Chicago have any stores and does Joe live here?",
]

def ner(text):
    output = ner_pipeline(text)
    return {"text": text, "entities": output}

demo = gr.Interface(ner,
             gr.Textbox(placeholder="Enter sentence here..."),
             gr.HighlightedText(),
             examples=examples)

demo.launch()
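
Each entity dictionary also carries a "score" field, so one optional refinement is to drop low-confidence entities before returning them from `ner`. The sketch below is only an illustration: the helper `filter_entities`, the 0.9 threshold, and the third sample entity are all assumptions invented here, not model output or part of any library.

```python
def filter_entities(entities, min_score=0.9):
    """Keep only entities whose score is at least min_score.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    return [e for e in entities if e["score"] >= min_score]


sample = [
    {"entity": "I-LOC", "score": 0.9988978, "start": 5, "end": 12},
    {"entity": "I-MISC", "score": 0.9958592, "start": 22, "end": 31},
    # Made-up low-confidence entry, included only to show the filter working.
    {"entity": "I-ORG", "score": 0.41, "start": 32, "end": 43},
]

confident = filter_entities(sample)
print([e["entity"] for e in confident])
```

Inside the demo, this would amount to returning `{"text": text, "entities": filter_entities(output)}` from `ner`.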

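Approach 2: List of Tuples

An alternative way to pass data into the HighlightedText component is a list of tuples. The first element of each tuple should be the word or words that are being classified as a particular entity. The second element should be the entity label (or None if the text should be unlabeled). The HighlightedText component automatically strings the words and labels together to display the entities.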
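
To show how the two formats relate, here is a minimal sketch that converts a list of entity dictionaries into a list of tuples. The function name `entities_to_tuples` is an assumption made for illustration, and the sketch assumes that the entities do not overlap.

```python
def entities_to_tuples(text, entities):
    """Convert {"start", "end", "entity"} dicts into (text, label) tuples.

    Text that falls outside every entity is emitted with a label of None.
    Hypothetical helper; assumes the entities do not overlap.
    """
    tuples = []
    cursor = 0  # position in text up to which output has been emitted
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] > cursor:
            # Unlabeled text between the previous entity and this one.
            tuples.append((text[cursor:ent["start"]], None))
        tuples.append((text[ent["start"]:ent["end"]], ent["entity"]))
        cursor = ent["end"]
    if cursor < len(text):
        # Unlabeled text after the last entity.
        tuples.append((text[cursor:], None))
    return tuples


text = "Does Chicago have any Pakistani restaurants"
entities = [
    {"entity": "I-LOC", "start": 5, "end": 12},
    {"entity": "I-MISC", "start": 22, "end": 31},
]
pairs = entities_to_tuples(text, entities)
print(pairs)
```

Joining the first elements of the tuples reproduces the original sentence, which is a quick way to check that no text was lost in the conversion.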

In some cases, this can be easier than the first approach. Here is a demo showing this approach using spaCy's part-of-speech tagger:

import gradio as gr
import os
os.system('python -m spacy download en_core_web_sm')
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

def text_analysis(text):
    doc = nlp(text)
    html = displacy.render(doc, style="dep", page=True)
    html = (
        "<div style='max-width:100%; max-height:360px; overflow:auto'>"
        + html
        + "</div>"
    )
    # Summary statistics shown in the JSON output.
    pos_count = {
        "char_count": len(text),
        "token_count": len(doc),  # number of spaCy tokens in the input
    }
    pos_tokens = []

    for token in doc:
        pos_tokens.extend([(token.text, token.pos_), (" ", None)])

    return pos_tokens, pos_count, html

demo = gr.Interface(
    text_analysis,
    gr.Textbox(placeholder="Enter sentence here..."),
    ["highlight", "json", "html"],
    examples=[
        ["What a beautiful morning for a walk!"],
        ["It was the best of times, it was the worst of times."],
    ],
)

demo.launch()

And you're done! That's all you need to know to build a web-based GUI for your NER model.

Tip: you can share your NER demo instantly with others by simply setting share=True in launch().