Named Entity Recognition and Classification on English Medical Texts

The assignment is to identify and classify all the entities in a collection of English documents. These entities are relevant terms, consisting of one or more words, that represent semantically essential elements of a sentence in the medical domain. As shown in the figure below, a trained model was used to detect and classify the relevant entities in one of the examples from the test set.

NOTE: certain entities, such as “Pathogenic huntingtin deposits,” may contain more than one word. Additionally, as depicted in the previous image, a single span can be annotated multiple times with different categories. Furthermore, spans may overlap either fully (sharing the same start and end offsets) or partially.

Each example comprises a single sentence stored in a .txt file. The corresponding annotations are provided in .ann files in BRAT format. The documents are named textXXX, where XXX is a number identifying the example; for every number, both a .txt and a .ann file are available. For each example, you will receive the BRAT annotation document containing the detected and classified annotations.
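For illustration, here is a minimal Python sketch for loading one example. The function name `load_example` and the three-digit zero-padding are assumptions, not part of the official tooling; adjust them to the actual corpus layout.

```python
from pathlib import Path

def load_example(data_dir: str, number: int):
    """Load the sentence and raw BRAT annotations for one example.

    Assumes files named textXXX.txt / textXXX.ann in data_dir, with
    XXX zero-padded to three digits (an assumption; adjust the
    padding to match the actual corpus naming).
    """
    base = Path(data_dir) / f"text{number:03d}"
    sentence = base.with_suffix(".txt").read_text(encoding="utf-8")
    annotations = base.with_suffix(".ann").read_text(encoding="utf-8")
    return sentence, annotations
```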

The objective is to create a model that automatically performs the described task.

Corpora and Taxonomy

For the eHealth-KD 2024 edition, systems will be evaluated on a corpus created for the T2KNOW research project (INNEST/2022/24). This corpus contains 1701 texts, which were later split into x sentences. The texts were selected using specific Web of Science (WoS) filters to retrieve information on the following topics:

Regarding the taxonomy, the list of Semantic Types included in UMLS was chosen as the base. This list was refined by removing 34 of its 127 categories, as they were outside the scope of interest. The remaining 93 categories were then grouped so that the final list contains just 40 types.

The final categories are:

Input and output format

Input files in the eHealth-KD 2024 are plain text files in UTF-8 format with one sentence per line. Sentences have not been preprocessed in any way.

The output is always a single file in BRAT standoff format, where each line represents either an entity or a relation. Python scripts to read and write this format are provided in the repository of the eHealth-KD 2024 Challenge; more details are available in the Resources section.
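As a rough illustration of the standoff layout (not the official scripts), the sketch below parses entity lines, assuming the standard BRAT tab-separated fields (identifier, then category with start/end offsets, then the surface text) and contiguous spans only:

```python
import re

# BRAT entity line: "T1<TAB>Category start end<TAB>surface text"
ENTITY_RE = re.compile(r"^(T\d+)\t(\S+) (\d+) (\d+)\t(.*)$")

def parse_entities(ann_text: str):
    """Collect (id, category, start, end, text) tuples from a .ann file.

    Relation lines and discontinuous spans are ignored here; use the
    official scripts from the challenge repository for full coverage.
    """
    entities = []
    for line in ann_text.splitlines():
        match = ENTITY_RE.match(line)
        if match:
            ident, category, start, end, text = match.groups()
            entities.append((ident, category, int(start), int(end), text))
    return entities
```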

For example, the following sentence:

Pathogenic huntingtin deposits have been found in the olfactory bulb of individuals with HD, although no studies have correlated this with the grade of olfactory impairment.

should produce the following output:

T1 CellOrMolecularDysfunction 0 30 Pathogenic huntingtin deposits
T2 BiologicallyActiveSubstance 11 21 huntingtin
T3 AminoAcidPeptideOrProtein 11 30 huntingtin deposits
T4 BodyPartOrganOrOrganComponent 54 68 olfactory bulb
T5 PatientOrDisabledGroup 72 91 individuals with HD
T6 ResearchActivity 105 112 studies
T7 PathologicFunction 152 172 olfactory impairment

The order in which the entities appear in the output file is irrelevant. Feel free to simply use auto-incremental identifiers, or use the provided Python scripts. The only requirement is that the output file must be in BRAT standoff format.
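Conversely, writing the output can be as simple as the following sketch (`write_ann` is an illustrative name, not one of the provided scripts). Recovering the surface text from the sentence itself guarantees that offsets and text always agree:

```python
def write_ann(sentence: str, spans, path: str):
    """Write (category, start, end) spans in BRAT standoff format,
    using auto-incremental identifiers T1, T2, ..."""
    with open(path, "w", encoding="utf-8") as out:
        for i, (category, start, end) in enumerate(spans, start=1):
            out.write(f"T{i}\t{category} {start} {end}\t{sentence[start:end]}\n")

# For the sentence above:
# write_ann(sentence, [("CellOrMolecularDysfunction", 0, 30), ...], "text001.ann")
```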

Challenge Scenario

The eHealth-KD 2024 Challenge proposes one evaluation scenario:

Main Evaluation

This scenario evaluates the detected and classified entities of a given text. The input consists of plain text, and the expected output is a set of annotations in BRAT format. Before introducing the formulas, exact definitions of Correct (C), Missing (M), Spurious (S), Partial (P), and Incorrect (I) are given as follows:

The measures are precision, recall, F1, and accuracy, computed as follows:

\[Recall = \frac{C + \frac{1}{2} P}{C + I + P + M}\]

\[Precision = \frac{C + \frac{1}{2} P}{C + I + P + S}\]

\[F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}\]

\[Accuracy = \frac{C + \frac{1}{2} P}{C + I + P + M + S}\]

For each of the 40 categories, the counts of correct, incorrect, partial, missing, and spurious matches are calculated, and the metrics above are then computed per category. Finally, an overall score is derived from the 40 per-category metrics.
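A direct translation of the formulas into Python might look like the sketch below; the zero-division guards are an assumption, since the formulas leave those cases undefined:

```python
def category_metrics(c: int, i: int, p: int, m: int, s: int):
    """Precision, recall, F1 and accuracy from the per-category counts
    of Correct, Incorrect, Partial, Missing and Spurious matches."""
    partial_credit = c + 0.5 * p
    recall = partial_credit / (c + i + p + m) if (c + i + p + m) else 0.0
    precision = partial_credit / (c + i + p + s) if (c + i + p + s) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = partial_credit / (c + i + p + m + s) if (c + i + p + m + s) else 0.0
    return precision, recall, f1, accuracy
```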

F1 will determine the ranking of the eHealth-KD challenge.

If in doubt, please contact the organizers at ehealth-kd@googlegroups.com.