2nd International Workshop on PROfiling LINGuistic KNOWledgE gRaphs

co-located with LDK 2023


Over the last decades, the number of Knowledge Graphs (KGs) published on the Web has increased substantially. The focus of this workshop is to present novel approaches, methodologies and frameworks for profiling Linguistic Linked Data (LLD) (corpora, lexicons, ontologies, etc.), as well as to highlight tools and user interfaces that can effectively support different use cases for profiling such data. In addition, the workshop seeks methodologies that enable effective profiling when building real-world Linked Data applications that leverage linguistic data, as well as use cases that reveal success stories or aspects that have been neglected so far. Addressing Linguistic Linked Data profiling issues will not only help in understanding and exploring such data, but will also provide the means to increase Linguistic Linked Data consumption and to keep track of the evolution of the relevant datasets.

Despite the large number of datasets published as LLD, they remain underused because they lack comprehensive metadata. Data consumers need concise information about a dataset to decide whether it is useful for their use case. Data profiling techniques offer an efficient solution to this problem: they generate a semantic profile, i.e. metadata and statistics that describe the content of the dataset. Semantic profiles are important for many use cases, such as: (1) provision of a general overview of the data, (2) ontology / dataset integration, (3) identification of quality issues, (4) query optimization, (5) data visualization, (6) data analytics tasks, (7) schema discovery, and (8) entity summarization.
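As a rough illustration of the kind of statistics a semantic profile may contain, the Python sketch below summarises a handful of triples. It assumes the data is already available as (subject, predicate, object) string tuples using prefixed names; the `build_profile` function and the `SemanticProfile` class are hypothetical, and real profiling tools typically compute much richer statistics.

```python
from collections import Counter
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"  # prefixed names are used for brevity (assumption)


@dataclass
class SemanticProfile:
    """Hypothetical container for a few basic profile statistics."""
    triples: int = 0
    subjects: int = 0
    predicates: int = 0
    objects: int = 0
    class_counts: Counter = field(default_factory=Counter)     # instances per class
    property_counts: Counter = field(default_factory=Counter)  # triples per property


def build_profile(triples):
    """Summarise an iterable of (subject, predicate, object) tuples."""
    graph = set(triples)  # an RDF graph is a set, so duplicate triples are dropped
    profile = SemanticProfile(triples=len(graph))
    profile.subjects = len({s for s, _, _ in graph})
    profile.predicates = len({p for _, p, _ in graph})
    profile.objects = len({o for _, _, o in graph})
    for _, p, o in graph:
        profile.property_counts[p] += 1
        if p == RDF_TYPE:
            profile.class_counts[o] += 1
    return profile


# Toy OntoLex-Lemon style data (illustrative only).
data = [
    ("ex:cat", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:cat", "ontolex:canonicalForm", "ex:cat_form"),
    ("ex:cat_form", "rdf:type", "ontolex:Form"),
    ("ex:cat_form", "ontolex:writtenRep", '"cat"@en'),
    ("ex:dog", "rdf:type", "ontolex:LexicalEntry"),
]

profile = build_profile(data)
print(profile.triples, profile.subjects, profile.class_counts.most_common())
```

Even these few numbers (how many lexical entries, which properties dominate) already give a consumer a first overview of the data, which is use case (1) above.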

Besides academia, the workshop targets developers and other knowledge workers. We envision the workshop as a forum for researchers and practitioners to come together, discuss common challenges and identify synergies for joint initiatives. We welcome contributions describing technical approaches, as well as contributions describing real use cases of semantic profiles.

To ensure the high quality of accepted papers, all submissions will undergo peer review. Each submission will be reviewed by at least 2 members of the PC. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop.


The workshop seeks application-oriented papers as well as more theoretical papers and position papers. It proposes a multidisciplinary discussion on the following themes, with a focus on RDF data. Topics of interest include, but are not limited to:


09:00 - 09:10   Opening 

09:10 - 10:00   Keynote by Marieke van Erp on Contextual Profiling of Linguistic Datasets

10:00 - 10:30   RDF Shapes ecosystem: tooling and uses: Daniel Fernandez 

10:30 - 11:00   Profiling Linguistic Knowledge Graphs: Blerina Spahiu, Renzo Alva Principe and Andrea Maurino

11:00 - 11:30   Coffee Break

11:30 - 12:00   Pruning and re-ranking the frequent patterns in knowledge graph profiling using machine learning: Gollam Rabby, Farhana Keya, Vojtěch Svátek and Blerina Spahiu

12:00 - 13:00  Discussion & Closing

13:00 - 14:00  Lunch Break


We welcome the following types of contributions:

All submission lengths include references. Accepted submissions will be published in an open-access conference proceedings volume, free of charge for authors. The ACL templates must therefore be used for all submissions.

Papers must be submitted through EasyChair:




The keynote speaker for ProLingKNOWER 2023 is Marieke van Erp!

We're thrilled to announce that Marieke van Erp will be our keynote speaker for the ProLingKNOWER workshop! As an esteemed expert in her field, she will bring invaluable insights into cutting-edge research and industry trends in language and Semantic Web technologies.

Title of Marieke's talk: Contextual Profiling of Linguistic Datasets 

Improving the metadata of datasets has received more attention in recent years, with initiatives such as Datasheets for Datasets and DCAT. These initiatives mostly focus on the form and creation process of the data and, to a certain extent, its topics and themes. In this talk, I will make a case for contextual profiling of datasets, as the context in which a dataset was conceived and/or used can have far-reaching implications for its interpretation. Through examples from the humanities domain, I will show how the meaning of terms is affected by situational factors and how we can describe such contexts to prevent misinterpretations when the dataset is used outside its original frame of reference.

Our guest speaker for ProLingKNOWER 2023 is Daniel Fernandez!

We're thrilled to announce that Daniel Fernandez will be our guest speaker for the ProLingKNOWER workshop! He holds a PhD in Computer Science and is an Associate Professor at the University of Oviedo, Spain. He specializes in RDF shapes, specifically the automatic extraction of RDF shapes from knowledge graphs and natural language.

Title of Daniel's talk: RDF Shapes ecosystem: tooling and uses

In this talk, we will describe the purpose and potential uses of RDF shapes (SHACL and ShEx). We will start by briefly introducing the concept of shape and discussing some differences between shapes and other technologies used to validate or describe RDF data. Then, we will give an overview of tools that allow users to perform common tasks with RDF shapes, such as editing and validation. Finally, as hand-crafting shapes is costly, we will describe techniques and tools for automatically extracting or inferring shapes from existing RDF content.
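To give a flavour of what automatically inferring shapes from existing RDF content can mean, the Python sketch below builds a naive shape per class: for each property used by the instances of a class, it records the minimum and maximum number of values observed per instance. This is a simplified illustration under stated assumptions (triples given as prefixed-name string tuples; hypothetical function names `infer_shapes` and `to_shex_like`), not the technique or tooling presented in the talk, and its output only resembles ShExC syntax.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def infer_shapes(triples):
    """Infer a naive shape per class: for every property used by the class's
    instances, keep the (min, max) number of values seen per instance."""
    graph = set(triples)
    instances = defaultdict(set)                    # class -> instance nodes
    values = defaultdict(lambda: defaultdict(int))  # node -> property -> count
    for s, p, o in graph:
        if p == RDF_TYPE:
            instances[o].add(s)
        else:
            values[s][p] += 1
    shapes = {}
    for cls, nodes in instances.items():
        props = sorted({p for n in nodes for p in values[n]})
        shapes[cls] = {
            # instances lacking the property count as 0, i.e. it is optional
            p: (min(values[n].get(p, 0) for n in nodes),
                max(values[n].get(p, 0) for n in nodes))
            for p in props
        }
    return shapes


def to_shex_like(cls, constraints):
    """Render the constraints as ShExC-like text (illustrative, not validated)."""
    lines = [f"<{cls}Shape> {{"]
    for p, (lo, hi) in constraints.items():
        lines.append(f"  {p} . {{{lo},{hi}}} ;")
    lines.append("}")
    return "\n".join(lines)


data = [
    ("ex:cat", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:cat", "ontolex:canonicalForm", "ex:cat_form"),
    ("ex:cat", "ontolex:sense", "ex:cat_sense1"),
    ("ex:cat", "ontolex:sense", "ex:cat_sense2"),
    ("ex:dog", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:dog", "ontolex:canonicalForm", "ex:dog_form"),
]

shapes = infer_shapes(data)
print(to_shex_like("ontolex:LexicalEntry", shapes["ontolex:LexicalEntry"]))
```

On this toy data every lexical entry has exactly one canonical form (`{1,1}`), while senses range from zero to two (`{0,2}`). Observed counts like these are only a starting point; a person would usually generalise them (for instance to an unbounded maximum) when refining the shape by hand.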

Acknowledgements for the abstract:

The research work presented in this talk was partially funded by the Spanish Ministry of Economy and Industry, project ID MCI-21-PID2020-117912RB-C21.



Blerina Spahiu

University of Milano-Bicocca



Vojtěch Svátek

Prague University of Economics and Business 

Czech Republic


Maribel Acosta

Ruhr-Universität Bochum



Penny Labropoulou

Institute for Language and Speech Processing/R.C. “Athena”



Milan Dojchinovski

Czech Technical University in Prague, Czech Republic

Research Associate at InfAI/Leipzig University, Germany




The first edition of the ProLingKNOWER workshop was held on 23rd May 2022 in Jerusalem, Israel.

This workshop is supported by NexusLinguarum COST Action CA18209.