IberLEF eHealth Knowledge Discovery Challenge 2024

🗨️ Follow @EHealthKD on Twitter for up-to-date information.


Natural Language Processing (NLP) methods are increasingly being used to mine knowledge from unstructured health texts. Recent advances in health text processing techniques encourage researchers and health domain experts to go beyond just reading the information in published texts (e.g., academic manuscripts, clinical reports, etc.) and structured questionnaires to discover new knowledge by mining health content. This has allowed other perspectives that were not previously available to surface.

Over the years, many eHealth challenges, such as the SemEval and CLEF campaigns, have attempted to identify, classify, extract, and link knowledge.

The 2024 eHealthKD edition focuses on a single task: identifying and classifying elements in English health documents. This Named Entity Recognition and Classification (NERC) problem allows annotations to be multiword, multiclass, and overlapping with other annotations (Overlapping Named Entity NERC task). The objective is to produce an NLP model that performs this task, automatically generating annotations (entities) for a given medical text.
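The three properties named above (multiword, multiclass, overlapping) can be illustrated with a minimal sketch. It assumes annotations are stored as character-offset spans; the `Entity` class, the `overlaps` helper, and the labels `Disease` and `BodyPart` are hypothetical and are not taken from the official corpus format or the 40-type taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One annotation: a character span plus a type label (hypothetical schema)."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # illustrative label, not an official taxonomy type


def overlaps(a: Entity, b: Entity) -> bool:
    """Return True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


text = "chronic kidney disease"

# A multiword entity and a second entity nested inside it.
entities = [
    Entity(0, 22, "Disease"),   # "chronic kidney disease"
    Entity(8, 14, "BodyPart"),  # "kidney"
]

# Recover the surface form of each annotation from its offsets.
surface = [text[e.start:e.end] for e in entities]
```

A flat BIO tagging scheme assigns one label per token, so it cannot represent both of these entities at once; a span-based representation like this one can.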

The corpus consists of abstracts from a selection of papers obtained from Web of Science. The annotations were produced automatically and then verified manually. The taxonomy was created by merging and refining the 127 original Semantic Types of UMLS into a final list of 40 types. We present the complete list of these 40 types below.

Additional details about the corpus, the taxonomy, and the annotation process are provided at the end of this document. Detection and classification will be evaluated jointly, with both steps measured by the same formulas. An example of the semantic structure is provided in the following figure.
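As a rough illustration of scoring detection and classification together, the sketch below counts a predicted entity as correct only when both its span and its label match a gold annotation exactly. This is an assumption for illustration: the official formulas are not given in this section and may differ (for example, by giving partial credit for partial span matches). The function name `prf1` and the labels are hypothetical.

```python
from typing import Set, Tuple

Annotation = Tuple[int, int, str]  # (start, end, label), hypothetical format


def prf1(gold: Set[Annotation], pred: Set[Annotation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over (start, end, label) triples."""
    correct = len(gold & pred)  # span AND label must both match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {(0, 22, "Disease"), (8, 14, "BodyPart")}
# The second prediction has the right span but the wrong label,
# so it counts as an error under joint evaluation.
pred = {(0, 22, "Disease"), (8, 14, "Disease")}

p, r, f = prf1(gold, pred)
```

Because the label is part of the matched triple, a correctly detected span with the wrong type lowers both precision and recall, which is what distinguishes joint scoring from evaluating detection alone.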

This challenge could be of great interest to experts in the field of natural language processing, particularly those who are working on automatic knowledge extraction and discovery. Furthermore, researchers in the eHealth domain could also benefit from this challenge by evaluating their technologies that rely on health domain knowledge.

The overall IberLEF workshop program can be found at the following link.

Description of the Task

The eHealth-KD 2024 challenge presents the following NERC task:

  1. Named Entity Recognition and Classification

Submissions and evaluation

There are three evaluation scenarios:

  1. A main scenario covering both tasks
  2. An optional scenario evaluating subtask A
  3. An optional scenario evaluating subtask B

📝 Details about the submission format will be provided shortly.

Resources

All the data will be made available to participants in due time. This includes training, development and test data, as well as evaluation scripts and sample submissions. More details are provided here.

All the currently available resources can be found in the eHealth-KD corpora repository.

Submission

🏆 Go to the Official Server

The challenge will be graded on kaggle.com. Check out the submission instructions for more details. An ongoing training competition is also available, where you can try your system on the training and development datasets to get acquainted with the submission workflow before using the official server.

Schedule

Date | Event | Link
01 Feb 2024 | 🏋️ Training and Development data released | 💾 Training set, 🔧 Utility scripts
01 Mar 2024 | ⚗️ Evaluation period begins – test data released |
15 Mar 2024 | 🤯 Evaluation and Registration periods end – due by 23:59 GMT-12 (AoE) |
22 Mar 2024 | 🏆 Results posted |
14 Apr 2024 | 🗞️ System descriptions due – closes by 23:59 GMT-12 (AoE) |
03 May 2024 | 📝 Paper reviews due |
10 May 2024 | 💌 Author notifications |
14 Jun 2024 | 📸 Camera-ready submissions due – closes by 23:59 GMT-12 (AoE) |

Publication instructions

📝 Official instructions and templates for the description paper will be provided shortly.

The Organization Committee of eHealth-KD encourages participants to submit a description paper of their systems. Submitted papers will be peer-reviewed by a Program Committee composed of all the participants and the Organization Committee, and only accepted papers will be published at CEUR. The proceedings of eHealth-KD will be published jointly with the proceedings of all tasks of IberLEF 2024.

Depending on the final number of participants and the time allocated for the workshop, all or a selected group of papers will be presented and discussed in the Workshop session.

Organization committee

Name | Email | Institution
Yoan Gutiérrez Vázquez | ygutierrez@dlsi.ua.es | University of Alicante, Spain
Andrés Montoyo Guijarro | montoyo@dlsi.ua.es | University of Alicante, Spain
Rafael Muñoz Guillena | rafael@dlsi.ua.es | University of Alicante, Spain
Estela Saquete Boro | stela@dlsi.ua.es | University of Alicante, Spain
Eduardo Grande Ruiz | eduardo.grande@ua.es | University of Alicante, Spain
Fabio Yañez Romero | fabio.yanez@ua.es | University of Alicante, Spain
Ernesto Luis Estevanell Valladares | ernesto.estevanell@matcom.uh.cu | University of Havana, Cuba
Suilan Estévez Velarde | sestevez@matcom.uh.cu | University of Havana, Cuba
Alejandro Piad Morffis | apiad@matcom.uh.cu | University of Havana, Cuba
Yudivián Almeida Cruz | yudy@matcom.uh.cu | University of Havana, Cuba

Discussion group

A Google Group will be set up for this “Health Shared Task”, where announcements will be made. Feel free to send your questions and feedback to ehealth-kd2024@googlegroups.com. General issues and feedback should be posted on our Issues Page on GitHub.

Please use the discussion group mentioned above. If you need to reach someone individually, please contact Eduardo Grande.

Follow @eHealthKD on Twitter for up-to-date news, comments and tips about the competition.

Funding

This research has been supported by the University of Alicante, Generalitat Valenciana, the Spanish Government, Ministerio de Educación, Cultura y Deporte and “ERDF A way of making Europe”, by the European Union, or by the European Union NextGenerationEU/PRTR, through the projects Coolang (PID2021-122263OB-C22), T2KNOW (Innest/2022/24) and EATITALL (Innest/2023/10).