# Project Group Theoretical Computer Science

### Course Description

In this course, (E)MCL students conduct a small scientific project that is close to the current research interests of members of the Chair of Automata Theory. Students work under the supervision of a tutor, who gives pointers to the relevant literature, discusses the recommended articles with the student, and assists the student in developing an approach to the research problem. The outcome of the project is a report and a talk.

### Organisation

The initial meeting takes place on **April 18th** at **11:10** (3. DS) in Room **APB/3027**, where the available topics will be proposed to the students. Each student then chooses one of the offered topics to work on. Attending the initial meeting is highly recommended for anyone starting a project this semester. Students who want to participate but have serious reasons not to attend the initial meeting should contact Anni-Yasmin Turhan before May 1 to discuss possible options.

Each student is assigned a tutor, depending on the chosen topic. During the semester, the student and their tutor meet regularly.

The results of the project will be presented in a talk given by the student at the end of the semester.

### Position in curriculum

Module MCL-P (-/-/4), 12 credits (LP).

### Prerequisites

Knowledge of the basic lecture related to the chosen topic is strongly advised.

### Participants' Duties

The participants are expected to read the relevant research papers picked by the tutor and to discuss them with their tutor in order to become acquainted with the chosen topic. Any required implementation work should be carried out in a structured way and documented appropriately. If a topic is shared by two or more participants, acquiring team-working skills is an additional goal of the project. The results of the project have to be described in a project paper (~15 pages) and presented in a 30-minute talk at the end of the semester.

It is also the participants' duty to reserve enough time for conducting the project. The *sharp deadline* for finishing the project is the beginning of the following semester, i.e., the allowed time for the project is one semester plus the following semester break. It is the participant's obligation to start the project in time and to arrange regular meetings with the tutor during the semester.

### Topics

When choosing a topic, please take into account the knowledge you have already acquired. For example, if you'd like to do a project concerning knowledge representation, you are expected to have successfully attended the lecture "Logic-based knowledge representation" before starting the project.

Below you can find a list of possible topics for your project. Usually, these are closely related to our current research interests. We always welcome your own ideas, so please contact us if you have a concrete idea in mind.

It is well known that regular languages are recognised by finite automata and are closed under the Boolean operations union, intersection, and complement, as well as under language operations such as concatenation and quotient. However, some operations, like complementation or the intersection of linearly many automata, can require exponentially larger automata.

Tree-like automata were considered in [1] in order to represent languages of the form \(K\cup L \Sigma^*\) for finite languages \(K,L\), and they exhibit better closure properties. The aim of this project is to develop the theory of tree-like automata: after formally introducing them, prove the closure properties of the languages they accept by providing suitable constructions, and determine the complexity of these constructions. Furthermore, the study can consider extensions and limitations of tree-like automata.

This project requires basic knowledge of automata theory.
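As background, membership in a language of the form \(K\cup L\Sigma^*\) can be decided directly from the finite languages \(K\) and \(L\). The following minimal sketch (words as Python strings, languages as sets; all names illustrative) shows the shape of these languages; the project itself concerns automata-based representations of them:

```python
def accepts(word, K, L):
    """Membership test for the language K ∪ L·Σ* with finite K, L.

    A word w belongs to K ∪ L·Σ* iff w itself is in K, or some
    prefix of w lies in L (the remaining suffix is absorbed by Σ*).
    """
    if word in K:
        return True
    return any(word[:i] in L for i in range(len(word) + 1))

# Example with K = {"ab"} and L = {"ba"}
K, L = {"ab"}, {"ba"}
print(accepts("ab", K, L))     # True: "ab" is in K
print(accepts("baxyz", K, L))  # True: prefix "ba" is in L
print(accepts("aab", K, L))    # False
```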

*Literature:*

- Franz Baader, Ralf Küsters, Alex Borgida, Deborah L. McGuinness. Matching in description logics. Journal of Logic and Computation, 9(3), pp. 411–447, 1999. https://doi.org/10.1093/logcom/9.3.411

*Tutor:* Pavlos Marantidis

For lightweight description logics such as \(\mathcal{EL}\), which is often used to formulate medical ontologies, subsumption algorithms additionally compute a so-called canonical model, which satisfies exactly those subsumption relationships that follow from the ontology. Such a model can be viewed as a graph, in which nodes and edges are labeled by concepts and relations. The goal of this project is to visualize properties of this model. The student should 1) transform the internal representation of the canonical model from \(\mathcal{EL}\)-reasoners (e.g., ELK https://www.cs.ox.ac.uk/isg/tools/ELK/) into a form suitable for visualization; 2) display the canonical model, even for large ontologies with more than 300,000 concepts; 3) provide functions to visualize properties of the model (e.g., color all elements that belong to concept \(C\)).
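As a toy illustration of the intended data flow (all names hypothetical), a node- and edge-labelled graph can be serialised to Graphviz DOT text, which many visualization tools consume. A real exporter would of course read the reasoner's internal data structures instead:

```python
def to_dot(nodes, edges):
    """Render a node- and edge-labelled graph in Graphviz DOT syntax.

    nodes: dict mapping node id -> set of concept-name labels
    edges: list of (source, role, target) triples
    (Toy stand-in for a canonical model.)
    """
    lines = ["digraph canonical_model {"]
    for n, labels in sorted(nodes.items()):
        label = ", ".join(sorted(labels))
        lines.append(f'  {n} [label="{n}: {label}"];')
    for s, role, t in edges:
        lines.append(f'  {s} -> {t} [label="{role}"];')
    lines.append("}")
    return "\n".join(lines)

# Example: two domain elements connected by a role edge
print(to_dot({"dC": {"Disease"}, "dD": {"Virus"}},
             [("dC", "causedBy", "dD")]))
```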

This project will be co-supervised by the Chair of Multimedia-Technology.

*Tutor*: Stefan Borgwardt

An important operation for ontologies is to compute the subsumption hierarchy, i.e., all subconcept-superconcept relationships between the concepts defined by the ontology. The goal of this project is to develop suitable visualisation approaches for this hierarchy. The student should 1) transform the internal representation of the subsumption hierarchy (http://owlcs.github.io/owlapi/) into a form suitable for visualization; 2) display the subsumption graph, even for large ontologies with more than 300,000 concepts; 3) provide functions to zoom into the neighborhood of a concept and to mark all sub- or superconcepts.
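For display purposes one usually draws only the direct subsumptions, i.e., the Hasse-style diagram of the hierarchy. A minimal sketch, assuming the hierarchy is given as a transitively closed, acyclic set of (sub, super) pairs with equivalent concepts already merged:

```python
def transitive_reduction(pairs):
    """Keep only the 'direct' subsumptions from a transitively closed,
    acyclic set of (sub, super) pairs -- the edges one would actually
    draw in a diagram of the hierarchy.

    An edge (a, c) is redundant if some b exists with (a, b) and (b, c).
    """
    pairs = set(pairs)
    direct = set()
    for a, c in pairs:
        if not any((a, b) in pairs and (b, c) in pairs
                   for b in {p[1] for p in pairs} if b not in (a, c)):
            direct.add((a, c))
    return direct

# Example: the edge Cat ⊑ Animal is implied by the other two
hierarchy = {("Cat", "Mammal"), ("Mammal", "Animal"), ("Cat", "Animal")}
print(transitive_reduction(hierarchy))
```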

This project will be co-supervised by the Chair of Multimedia-Technology.

*Tutor*: Stefan Borgwardt

Ontologies are often used for integrated querying of heterogeneous data sources, such as classical databases or RDF triple stores. Query rewriting is a key technique for efficient query answering. The aim of the project is to adapt an existing query rewriting algorithm for Datalog+/- ontologies [1] to the case of so-called multi-linear Datalog+/- rules, prove its correctness, and consider the impact of standard optimization techniques.

The project requires basic knowledge of first-order logic and proof techniques.
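To give the flavour of query rewriting (this is not the algorithm of [1]), the following sketch performs a single backward-resolution step of an atomic query with linear, existential-free rules. Atoms are (predicate, arguments) pairs, capitalised terms are variables, and query and rules are assumed to use disjoint variable names:

```python
def unify(a, b):
    """Most general unifier of two atoms (pred, args); terms are
    constants or variables (capitalised names). Returns a dict or None."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return None
    subst = {}
    def walk(t):
        while t[0].isupper() and t in subst:
            t = subst[t]
        return t
    for s, t in zip(a[1], b[1]):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if s[0].isupper():
            subst[s] = t
        elif t[0].isupper():
            subst[t] = s
        else:
            return None  # clash of distinct constants
    return subst

def rewrite_step(query_atom, rules):
    """One rewriting step: resolve an atomic query with every linear
    (single-body-atom, existential-free) rule whose head unifies.
    rules: list of (head, body) atom pairs, read as body -> head."""
    out = []
    for head, body in rules:
        s = unify(query_atom, head)
        if s is not None:
            def walk(t):
                while t[0].isupper() and t in s:
                    t = s[t]
                return t
            out.append((body[0], tuple(walk(t) for t in body[1])))
    return out

# Example: rule Professor(X) -> Teacher(X); query Teacher(anna)
rules = [(("Teacher", ("X",)), ("Professor", ("X",)))]
print(rewrite_step(("Teacher", ("anna",)), rules))
# [('Professor', ('anna',))]
```

Full Datalog+/- rewriting additionally has to handle existential variables in rule heads, which is where the unification step becomes subtle.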

*Literature*:

- Georg Gottlob, Giorgio Orsi, and Andreas Pieris. Query rewriting and optimization for ontological databases. ACM Transactions on Database Systems, 39(3), pp. 25:1–25:46, 2014.

*Tutor*: Stefan Borgwardt

Ontologies formulated in description logics are used to define terminological knowledge in a range of areas such as medicine, biology, the semantic web, and others. Modern ontologies often cover large vocabularies, such as the SNOMED CT ontology used in many national healthcare services, which defines over 300,000 concepts.

Maintaining and understanding these ontologies is hindered by their complexity, especially since a lot of information is implicit. A technique to help with this is uniform interpolation [2]. Given a vocabulary of concepts and relations, uniform interpolation computes a restricted view of the ontology that uses only terms from this vocabulary. As such, it allows one to gain insight into the relations between concepts of interest, which helps in understanding and debugging complex ontologies.

While there are implemented systems for uniform interpolation, they are currently not integrated into common ontology editors. The goal of this project is to close this gap by developing a plugin for the popular ontology editor Protégé, using the uniform interpolation library LETHE [1]. The plugin will allow ontology users and developers to select a vocabulary and then display a view of the uniform interpolant, which has to be represented in a convenient way.

The project requires experience in Java programming as well as some foundational knowledge on description logics.
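The idea of forgetting behind uniform interpolation is easiest to see in propositional logic: forgetting an atom from a clause set amounts to collecting all resolvents on that atom. A minimal sketch (not LETHE's saturation calculus, which handles description logics):

```python
def forget(clauses, p):
    """Forget atom p from a set of propositional clauses.

    Clauses are frozensets of literals; a literal is (atom, polarity).
    The result over the remaining atoms consists of all p-free clauses
    plus all resolvents on p (tautologies dropped).
    """
    pos = [c for c in clauses if (p, True) in c]
    neg = [c for c in clauses if (p, False) in c]
    rest = {c for c in clauses
            if (p, True) not in c and (p, False) not in c}
    for c1 in pos:
        for c2 in neg:
            r = (c1 - {(p, True)}) | (c2 - {(p, False)})
            # drop tautological resolvents containing a and ¬a
            if not any((a, True) in r and (a, False) in r for a, _ in r):
                rest.add(frozenset(r))
    return rest

# Example: from {¬A ∨ B, ¬B ∨ C}, forgetting B yields {¬A ∨ C}
cs = {frozenset({("A", False), ("B", True)}),
      frozenset({("B", False), ("C", True)})}
print(forget(cs, "B"))
```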

*Literature*:

- Patrick Koopmann, Renate A. Schmidt. LETHE: Saturation-based reasoning for non-standard reasoning tasks. In Proceedings of the 4th OWL Reasoner Evaluation Workshop (ORE), pp. 23–30, 2015.
- Patrick Koopmann, Renate A. Schmidt: Forgetting concept and role symbols in ALCH-ontologies. In Proceedings of the 19th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR), pp. 552–567, 2013.

*Tutor*: Patrick Koopmann

The description logic \(\mathcal{EL}\) is used to express biomedical knowledge in many important Semantic Web ontologies, such as SNOMED CT or the Gene Ontology. Standard and non-standard reasoning tasks can help with building and maintaining these ontologies. For example, computing the difference between two concepts is useful in many ways: it can explain the concept hierarchy of the ontology ("What makes A a subconcept of B?") or refine answers to relaxed queries ("Under which assumptions do the instances of B become instances of A?"). However, existing definitions of concept difference operators [1] do not adequately deal with the structure of \(\mathcal{EL}\) concepts.

The goal of this project is to investigate a new definition for computing the concept difference, and determine its properties, such as existence, uniqueness, and computational complexity.

This project requires basic knowledge of logic-based knowledge representation.
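For concepts without a TBox, subsumption in \(\mathcal{EL}\) can be decided structurally via tree homomorphisms, and a naive syntactic difference operator can be built on top of it. The following sketch (concepts encoded as pairs of concept names and existential restrictions) illustrates the kind of operator whose shortcomings motivate this project; it is not the new definition sought here:

```python
def subsumed(C, D):
    """Structural subsumption C ⊑ D for EL concepts without a TBox.

    A concept is a pair (names, successors): a frozenset of concept
    names and a tuple of (role, concept) existential restrictions.
    C ⊑ D iff D's description tree maps homomorphically into C's.
    """
    names_c, succ_c = C
    names_d, succ_d = D
    if not names_d <= names_c:
        return False
    return all(any(r2 == r and subsumed(C2, D2) for r2, C2 in succ_c)
               for r, D2 in succ_d)

def naive_difference(C, D):
    """A naive 'difference' of C and D: the top-level parts of C that
    are not already entailed by D (assuming C ⊑ D)."""
    names_c, succ_c = C
    names = frozenset(a for a in names_c
                      if not subsumed(D, (frozenset({a}), ())))
    succ = tuple((r, C2) for r, C2 in succ_c
                 if not subsumed(D, (frozenset(), ((r, C2),))))
    return (names, succ)

# Example: C = Human ⊓ ∃child.Human, D = Human
H = frozenset({"Human"})
C = (H, (("child", (H, ())),))
D = (H, ())
print(subsumed(C, D))          # True
print(naive_difference(C, D))  # keeps only the ∃child.Human part
```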

*Literature*:

- Gunnar Teege. Making the difference: A subtraction operation for description logics. In Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning (KR), pp. 540–550, 1994.

*Tutor*: Oliver Fernández Gil

The light-weight description logic \(\mathcal{EL}\) has attracted considerable attention for knowledge representation and reasoning, mainly due to its tractable reasoning problems and its easy readability. For instance, the Web Ontology Language (OWL) includes the profile "EL", which is based on \(\mathcal{EL}\). In this logic, the common reasoning problems, e.g., subsumption between two concept descriptions with respect to a TBox, still have polynomial time complexity.

In the past, several non-standard reasoning tasks in \(\mathcal{EL}\) have been defined and their complexities have been investigated. The goal of this project is to consider a new non-standard reasoning task, namely that of the neighborhood of \(\mathcal{EL}\) concept descriptions. In particular, we say that an \(\mathcal{EL}\) concept description \(C\) is a lower neighbor of an \(\mathcal{EL}\) concept description \(D\) with respect to an \(\mathcal{EL}\) TBox \(\mathcal{T}\) if \(C\) is strictly subsumed by \(D\) w.r.t. \(\mathcal{T}\) and furthermore "there is no \(\mathcal{EL}\) concept description in between", i.e., each \(\mathcal{EL}\) concept description \(E\) with \(\mathcal{T}\models C\sqsubseteq E\sqsubseteq D\) is either equivalent to \(C\) or equivalent to \(D\). This topic can be investigated in various directions:

- The case of an empty TBox \(\mathcal{T}\).
- The complexity of deciding neighborhood.
- Design and implementation of an efficient procedure for deciding neighborhood.
- Design and implementation of an efficient procedure for computing all upper/lower neighbors.
- Extensions to more expressive description logics, e.g., \(\mathcal{ALC}\), or \(\mathcal{EL}\) with greatest fixed-point semantics.
- etc.

This project requires basic knowledge about description logics.
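In the simplest setting hinted at above, the role-free fragment of \(\mathcal{EL}\) over a finite signature with an empty TBox, concepts are just conjunctions of concept names, subsumption is the superset relation, and neighbors are easy to enumerate. A sketch (the full problem with roles and TBoxes, studied in this project, is much subtler):

```python
def lower_neighbors(C, signature):
    """Lower neighbors of a concept in the role-free fragment of EL
    with empty TBox.

    Here a concept is a frozenset of concept names (its conjuncts),
    and C ⊑ D holds iff D ⊆ C, so the lower neighbors of C are
    exactly the concepts obtained by adding one missing name.
    """
    return [frozenset(C | {A}) for A in sorted(signature - C)]

# Example over the signature {A, B, C}
print(lower_neighbors(frozenset({"A"}), {"A", "B", "C"}))
# two neighbors: one adds B, the other adds C
```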

*Literature*:

- Franz Baader, Ian Horrocks, Carsten Lutz, Ulrike Sattler. An Introduction to Description Logic. Cambridge University Press, 2017.
- Francesco M. Donini, Simona Colucci, Tommaso Di Noia, Eugenio Di Sciascio. A Tableaux-Based Method for Computing Least Common Subsumers for Expressive Description Logics. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), pp. 739–745, 2009.

*Tutor*: Francesco Kriegel

Formal Concept Analysis (FCA) was developed as an attempt to formalize the philosophical notion of a “concept”. In particular, it has a strong correspondence to propositional logic, and provides methods for handling implicational knowledge. As a famous example, the canonical base of a given formal context \(\mathbb{K}\) is a minimal implication set which is sound and complete for the implicational knowledge of \(\mathbb{K}\). For the entailment relation between implication sets, both a semantic and several equivalent syntactic characterizations exist.

Recently in [2], a probabilistic extension of FCA was investigated, and a technique for the axiomatization of implications over probabilistic attributes was introduced. However, the implicational entailment was only described semantically, and no sound and complete syntactic deduction method exists. The goal of this project is to find such a syntactic characterization, and to provide an appropriate implementation. Since the logic used in [2] can be embedded in the logic from [1], it may be helpful to consider [1] for this task.
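For the classical (non-probabilistic) setting, the semantic entailment between implication sets reduces to a simple closure computation, which is the standard forward-chaining operator of FCA. A minimal sketch for background:

```python
def closure(attrs, implications):
    """Closure of an attribute set under a set of implications.

    implications: list of (premise, conclusion) pairs of sets.
    Repeatedly fires every implication whose premise is contained
    in the current set, until a fixed point is reached.
    """
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed

def entails(implications, premise, conclusion):
    """L entails (premise -> conclusion) iff conclusion ⊆ closure(premise)."""
    return set(conclusion) <= closure(premise, implications)

# Example: {a -> b, b -> c} entails a -> c
imps = [({"a"}, {"b"}), ({"b"}, {"c"})]
print(sorted(closure({"a"}, imps)))   # ['a', 'b', 'c']
print(entails(imps, {"a"}, {"c"}))    # True
```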

*Literature*:

- Ronald Fagin, Joseph Y. Halpern, Nimrod Megiddo. A logic for reasoning about probabilities. Information and Computation 87(1/2), pp. 78–128, 1990.
- Francesco Kriegel. Implications over probabilistic attributes. In Proceedings of the 14th International Conference on Formal Concept Analysis (ICFCA), pp. 168–183, 2017.

*Tutor*: Francesco Kriegel

The light-weight description logic \(\mathcal{EL}\) has attracted considerable attention for knowledge representation and reasoning, mainly due to its tractable reasoning problems and its easy readability. For instance, the Web Ontology Language (OWL) includes the profile "EL", which is based on \(\mathcal{EL}\). In this logic, the common reasoning problems, e.g., subsumption between two concept descriptions with respect to a TBox, still have polynomial time complexity.

Knowledge bases can be built not only manually; there are also means for automatic construction. In particular, [1] describes how the theory of concept inclusions valid in a given interpretation can be automatically axiomatized in a sound and complete manner. A weakness of the approach is that no existing knowledge in the form of a TBox can be incorporated in the construction. Later, [2] provided an appropriate extension of the axiomatization procedure that takes both an interpretation and a TBox as input. This construction makes heavy use of so-called model-based most specific concept descriptions relative to a TBox. So far, it has not been investigated how to check their existence and how to compute them effectively; both questions shall be addressed in this project. Ideas from [3] are helpful for answering these questions.

This project requires basic knowledge about description logics.
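Without a TBox, a model-based most specific concept up to a bounded role depth is obtained simply by unravelling the interpretation at the given element; a sketch of this baseline (the point of the project is precisely that, relative to a general TBox, existence and effective computation are open):

```python
def mmsc(interp, d, k):
    """Model-based most specific concept of element d in a finite
    interpretation, up to role depth k, w.r.t. the empty TBox.

    interp: (labels, roles), where labels[d] is the set of concept
    names holding at d and roles is a list of (role, source, target)
    triples. Returns an EL concept as (names, successors), computed
    by unravelling d to depth k.
    """
    labels, roles = interp
    names = frozenset(labels.get(d, ()))
    if k == 0:
        return (names, ())
    succ = tuple((r, mmsc(interp, e, k - 1))
                 for r, s, e in roles if s == d)
    return (names, succ)

# Example: d1 is a Person with a Person child
labels = {"d1": {"Person"}, "d2": {"Person"}}
roles = [("child", "d1", "d2")]
print(mmsc((labels, roles), "d1", 1))
# one 'child' successor, unravelled to depth 0
```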

*Literature*:

- Pages 60–68 in: Felix Distel. Learning description logic knowledge bases from data using methods from formal concept analysis. PhD thesis, Technische Universität Dresden, 2011.
- Francesco Kriegel. Incremental learning of TBoxes from interpretation sequences with methods of Formal Concept Analysis. In Proceedings of the 28th International Workshop on Description Logics (DL), pp. 452–464, 2015.
- Benjamin Zarrieß, Anni-Yasmin Turhan. Most specific generalizations w.r.t. general \(\mathcal{EL}\)-TBoxes. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pp. 1191–1197, 2013.

*Tutor*: Francesco Kriegel

Formal Concept Analysis (FCA) was developed as an attempt to formalize the philosophical notion of a “concept”. In particular, it has a strong correspondence to propositional logic, and provides methods for handling implicational knowledge. As a famous example, the canonical base of a given formal context \(\mathbb{K}\) is a minimal implication set which is sound and complete for the implicational knowledge of \(\mathbb{K}\). If the context describing the domain of interest is not complete, but experts for this domain are available, then a technique called “Attribute Exploration” [1], can be used to construct, in interaction with the experts, a minimal set of implications that describes the considered domain in a sound and complete manner.

A disadvantage of the existing variants of Attribute Exploration is that in each step only one question can be posed to the experts, and the algorithm has to wait for the answer before it can continue; this renders the method impractical for larger domains. In [2], a variation was introduced that allows for parallel interaction with the experts. The goal of this project is to design and implement a platform for collaborative knowledge acquisition, which utilizes the parallel Attribute Exploration at its core. The implementation shall have a client-server architecture, and an appropriate use case shall be sought and tested.
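At the core of Attribute Exploration lies Ganter's Next Closure algorithm [1], which enumerates all closed sets of a closure operator in lectic order; during exploration, each newly produced closed set gives rise to the next question for the experts. A compact sketch for an arbitrary closure operator:

```python
def next_closure(current, attributes, close):
    """Ganter's Next Closure: given the current closed set, the
    linearly ordered attribute list, and a closure operator, return
    the lectically next closed set (or None after the last one)."""
    for i in range(len(attributes) - 1, -1, -1):
        m = attributes[i]
        if m in current:
            current = current - {m}
        else:
            candidate = close(current | {m})
            # lectic test: no attribute smaller than m may be new
            if all(a in current for a in candidate
                   if attributes.index(a) < i):
                return candidate
    return None

# Enumerate all closed sets of the identity operator on {a, b}
attrs = ["a", "b"]
close = lambda s: frozenset(s)  # identity: every set is closed
sets, cur = [], close(frozenset())
while cur is not None:
    sets.append(cur)
    cur = next_closure(cur, attrs, close)
print(sets)  # all four subsets of {a, b}, in lectic order
```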

*Literature*:

- Bernhard Ganter. Two basic algorithms in concept analysis. In Proceedings of the 8th International Conference on Formal Concept Analysis (ICFCA), pp. 312–340, 2010.
- Francesco Kriegel. Parallel attribute exploration. In Proceedings of the 22nd International Conference on Conceptual Structures (ICCS), pp. 91–106, 2016.

*Tutor*: Francesco Kriegel

For classical Description Logics (DLs) it is often insufficient to reason with monotone semantics, especially when aspects of human cognition such as belief revision are to be modelled. A recent approach to extend DLs with nonmonotonic semantics uses so-called typicality models [1], which extend the classical canonical models for the description logic \(\mathcal{EL}_{\bot}\). The goal of this project is to extend the completion procedure that is used to construct classical canonical models in \(\mathcal{EL}_{\bot}\) [2] in order to allow for defeasible knowledge bases and to construct minimal and maximal typicality models.

This project requires basic knowledge about description logics.
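For orientation, the classical completion procedure of [2] for subsumption w.r.t. a normalized \(\mathcal{EL}\) TBox can be sketched as follows (the axiom encoding is illustrative, and ⊤ and ⊥ are omitted for brevity):

```python
def classify(names, tbox):
    """Completion-style classification for a normalized EL TBox
    (after Baader, Brandt, Lutz). Axioms are tagged tuples:
      ("sub",  A, B)         A ⊑ B
      ("conj", A1, A2, B)    A1 ⊓ A2 ⊑ B
      ("exr",  A, r, B)      A ⊑ ∃r.B
      ("exl",  r, A, B)      ∃r.A ⊑ B
    Returns S with B ∈ S[A] iff the TBox entails A ⊑ B."""
    S = {A: {A} for A in names}
    R = set()  # derived role edges (r, A, B), meaning A ⊑ ∃r.B
    changed = True
    while changed:
        changed = False
        for A in names:
            for ax in tbox:
                if ax[0] == "sub" and ax[1] in S[A] and ax[2] not in S[A]:
                    S[A].add(ax[2]); changed = True
                elif (ax[0] == "conj" and ax[1] in S[A] and ax[2] in S[A]
                      and ax[3] not in S[A]):
                    S[A].add(ax[3]); changed = True
                elif (ax[0] == "exr" and ax[1] in S[A]
                      and (ax[2], A, ax[3]) not in R):
                    R.add((ax[2], A, ax[3])); changed = True
        for ax in tbox:
            if ax[0] == "exl":
                for (r, A, B) in list(R):
                    if r == ax[1] and ax[2] in S[B] and ax[3] not in S[A]:
                        S[A].add(ax[3]); changed = True
    return S

# Example: Parent ⊑ ∃child.Person, ∃child.Person ⊑ Caretaker
tbox = [("exr", "Parent", "child", "Person"),
        ("exl", "child", "Person", "Caretaker")]
S = classify({"Parent", "Person", "Caretaker"}, tbox)
print("Caretaker" in S["Parent"])  # True: Parent ⊑ Caretaker is entailed
```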

*Literature*:

- Maximilian Pensel, Anni-Yasmin Turhan. Including quantification in defeasible reasoning for the description logic \(\mathcal{EL}_{\bot}\). In Proceedings of the 14th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR), pp. 78–84, 2017.
- Franz Baader, Sebastian Brandt, Carsten Lutz. Pushing the \(\mathcal{EL}\) envelope. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pp. 364–369, 2005.

*Tutor*: Maximilian Pensel

In cases where data is noisy and can contain errors, knowledge bases can easily become inconsistent. In such settings, inconsistency-tolerant semantics can be employed for reasoning. If data is collected over a time span, it is often time-stamped, which admits queries over a temporal knowledge base that refer to different points in time.

In this project, temporal instance queries (i.e., queries that retrieve the instances of a concept) over a temporal knowledge base shall be investigated in combination with inconsistency-tolerant semantics. More precisely, instance queries combined with LTL (linear temporal logic) operators shall be investigated with regard to computational complexity, and/or an algorithm for one of the simpler settings shall be devised and shown to be correct (and perhaps even implemented).

This project requires basic knowledge about description logics.
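To fix intuitions, an instance query with LTL operators can be evaluated over a finite sequence of ABoxes by plain model checking. The following sketch ignores both the ontology and inconsistency-tolerance, which are the actual subject of the project; the consequence-based entailment test would replace the plain membership check used here:

```python
def holds(query, sequence, t=0):
    """Evaluate a temporal instance query over a finite sequence of
    ABoxes (each a set of (concept, individual) assertions), starting
    at time point t. Supported operators (a small LTL fragment):
      ("inst", C, a)     a is an instance of C at the current point
      ("next", q)        q holds at the next time point
      ("eventually", q)  q holds now or at some later point
      ("always", q)      q holds at every remaining point
    """
    op = query[0]
    if op == "inst":
        return (query[1], query[2]) in sequence[t]
    if op == "next":
        return t + 1 < len(sequence) and holds(query[1], sequence, t + 1)
    if op == "eventually":
        return any(holds(query[1], sequence, u)
                   for u in range(t, len(sequence)))
    if op == "always":
        return all(holds(query[1], sequence, u)
                   for u in range(t, len(sequence)))
    raise ValueError(f"unknown operator {op!r}")

# Example: bob is healthy at time 0 and sick at time 1
seq = [{("Healthy", "bob")}, {("Sick", "bob")}]
print(holds(("eventually", ("inst", "Sick", "bob")), seq))  # True
print(holds(("always", ("inst", "Healthy", "bob")), seq))   # False
```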

*Literature*:

- Rafael Peñaloza. Inconsistency-Tolerant Instance Checking in Tractable Description Logics. In Proceedings of the International Joint Conference on Rules and Reasoning (RuleML+RR), 2017.
- Camille Bourgaux, Anni-Yasmin Turhan. Temporal Query Answering in DL-Lite over Inconsistent Data. In Proceedings of the 16th International Semantic Web Conference (ISWC), 2017.

*Tutor*: Anni-Yasmin Turhan