Hitzal is the name resulting from the combination of the Basque words hitz ("word") and itzal ("shadow").

This site is dedicated to Vicomtech's efforts to develop technology for the automatic anonymisation of textual data. Currently, you will find content related to our participation in the 2019 edition of the MEDDOCAN (Medical Document Anonymization) shared task and our medical anonymisation demo, HitzalMed. Upon registration, you can use the demo, download the scripts and models that our team used in the challenge, and consult the documentation on how to use them yourself.

For information on how the models were trained and the results obtained in the task, please read our papers [1, 2]. If you use any of the provided materials in a scientific publication, please cite us appropriately:

    title = "Vicomtech at MEDDOCAN: Medical Document Anonymization",
    author = "Perez, Naiara and
      Garc\'ia-Sardi\~na, Laura and
      Serras, Manex and
      Del Pozo, Arantza",
    booktitle = "Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)",
    month = sep,
    year = "2019",
    address = "Bilbao, Spain",
    publisher = "CEUR Workshop Proceedings (CEUR-WS.org)",
    url = "http://ceur-ws.org/Vol-2421/MEDDOCAN_paper_8.pdf",
    volume  = "2421",
    pages = "696--703"

    title = "Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT",
    author = "Garc\'ia-Pablos, Aitor and
      Perez, Naiara and
      Cuadros, Montse",
    booktitle = "Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020)",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association (ELRA)",
    url = "TBA",
    pages = "TBA"