DokBau
Conserving Knowledge from Heterogeneous Project Documentation
Conserving Knowledge from Text Documents
The long-term use of construction experience, and the documentation of knowledge that this requires, is still an unresolved problem. The knowledge remains dispersed across a multitude of fragmented pieces of information, only a limited part of which is stored in well-structured documents and notes. It can be assumed that in the near future this diverse information will be available in electronic form, but most of it will still not be stored in standardized documents, e.g. SGML documents conforming to a DTD. This is the starting point of our research approach.
The project combines methods from semantic text analysis, Bayesian networks and the application of knowledge represented in product models to structure project information. The goal is to obtain probabilistic descriptions of the implicit knowledge structures embedded in project corpora and to generate knowledge maps of the documents and their content.
Document Analysis
The analysis of documents can be differentiated by the degree to which the documents are structured. We distinguish between standardized documents, e.g. those based on SGML, and documents that are not explicitly structured.
Our research approach focuses on the evaluation of the latter, free-form documents. It was motivated by work by Dong and Agogino (1997) on learning ontologies from documentation. The approach has two phases. In the first phase, the individual terms of the documents are indexed and their semantic relations are analysed. In the second phase, the documents are clustered based on the results of this semantic analysis. The resulting clusters allow further evaluation of the relations and interdependencies between the documents as well as the formulation of hypotheses about their content. Clustering the documents produces a hierarchical structure that can be interpreted as a product/process model extracted (learned) from the documentation. This learned model makes it possible to link the information in the free-form documentation with highly structured information kept in product databases. The approach can thus be seen as an attempt to mediate between the analysis of purely unstructured and highly standardized documents.
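The two phases can be illustrated with a minimal Python sketch. It assumes TF-IDF term weighting, document-level term co-occurrence as a simple stand-in for "semantic relations", and average-linkage agglomerative clustering on cosine similarity. These are common textbook choices, not necessarily the techniques used by Dong and Agogino or in this project; the stop-word list and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real corpus needs a domain-specific one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "with"}


def index_terms(text):
    """Phase 1: split a free-form document into lowercase index terms."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def term_cooccurrence(docs):
    """Phase 1: count how often two terms occur in the same document,
    used here as a crude proxy for a semantic relation between them."""
    pairs = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(index_terms(d))), 2):
            pairs[(a, b)] += 1
    return pairs


def tfidf_vectors(docs):
    """Weight each term by term frequency times inverse document frequency."""
    term_lists = [index_terms(d) for d in docs]
    n = len(docs)
    df = Counter(t for terms in term_lists for t in set(terms))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(terms).items()}
            for terms in term_lists]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(docs):
    """Phase 2: average-linkage agglomerative clustering of a non-empty
    corpus. Returns a nested tuple of document indices (a binary hierarchy)."""
    vecs = tfidf_vectors(docs)
    clusters = [(i, [i]) for i in range(len(docs))]  # (subtree, members)
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            sim = sum(cosine(vecs[a], vecs[b]) for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or sim > best[0]:
                best = (sim, i, j)
        _, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((ti, tj), mi + mj))
    return clusters[0][0]


DOCS = [
    "Reinforced concrete slab formwork removed after curing.",
    "Concrete slab curing time extended due to frost.",
    "Steel beam connection bolts delivered late.",
    "Bolts for steel beam connection replaced on site.",
]
print(cluster(DOCS))                               # → ((2, 3), (0, 1))
print(term_cooccurrence(DOCS)[("beam", "steel")])  # → 2
```

The nested tuple is the learned hierarchy: the two concrete-slab notes and the two steel-connection notes form separate branches. The naive pairwise loop is quadratic per merge and only meant for small corpora.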
Context Knowledge Captured in Product and Product Data Models
Following the text analysis, we propose to refine the "learned product model." Depending on the analysed corpora, the learned product model will remain fuzzy to a greater or lesser degree. Our approach is to apply further context knowledge to strengthen the relations between the documents and thereby improve the quality of the generated hierarchical information structure.
Good sources of background or context knowledge are the project-invariant knowledge formalized in product data models and the project-specific knowledge held in the instantiated objects and relations of actual product models. Unfortunately, while product and product data models (here simply referred to as product models) for the design phase are close to being accepted in practice, no product models will be available for the construction phases in the foreseeable future. This research will therefore concentrate on the use of the available design product models.
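How context knowledge from product models could strengthen the relations between documents can be sketched in Python as follows. The sketch assumes that a product data model can be reduced to a set of permitted (class, relation, class) triples, that a product model is a set of instantiated objects with typed relations, and that documents mention objects by identifier. The scoring rule, the weights `same_weight`, `related_weight` and `alpha`, and all class, object and relation names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProductDataModel:
    """Project-invariant knowledge: permitted (class, relation, class) triples."""
    relation_types: set


@dataclass
class ProductModel:
    """Project-specific knowledge: instantiated objects and their relations."""
    schema: ProductDataModel
    objects: dict = field(default_factory=dict)  # object id -> class name
    relations: set = field(default_factory=set)  # (id, relation, id) triples

    def add_object(self, oid, cls):
        self.objects[oid] = cls

    def relate(self, a, rel, b):
        # Reject relations that the product data model does not permit.
        triple = (self.objects[a], rel, self.objects[b])
        if triple not in self.schema.relation_types:
            raise ValueError(f"relation not permitted: {triple}")
        self.relations.add((a, rel, b))

    def neighbours(self, oid):
        """Objects directly related to oid, in either direction."""
        return ({b for a, _, b in self.relations if a == oid}
                | {a for a, _, b in self.relations if b == oid})


def referenced_objects(text, model):
    """Object ids mentioned verbatim in a document (naive token match)."""
    tokens = set(text.replace(",", " ").replace(".", " ").split())
    return {oid for oid in model.objects if oid in tokens}


def context_similarity(doc_a, doc_b, model, same_weight=1.0, related_weight=0.5):
    """Shared objects count fully, directly related objects count partially.
    The result lies in [0, 1]."""
    ra, rb = referenced_objects(doc_a, model), referenced_objects(doc_b, model)
    if not ra or not rb:
        return 0.0
    shared = len(ra & rb)
    related = sum(1 for x in ra for y in rb if x != y and y in model.neighbours(x))
    return (same_weight * shared + related_weight * related) / (len(ra) * len(rb))


def combined_similarity(text_sim, ctx_sim, alpha=0.5):
    """Blend text-based and context-based evidence with weight alpha."""
    return (1 - alpha) * text_sim + alpha * ctx_sim


# Usage with a hypothetical design product model.
schema = ProductDataModel({("Slab", "supported_by", "Beam"),
                           ("Beam", "supported_by", "Column")})
pm = ProductModel(schema)
pm.add_object("S1", "Slab")
pm.add_object("B2", "Beam")
pm.add_object("C3", "Column")
pm.relate("S1", "supported_by", "B2")

d1 = "Cracks observed in slab S1 after formwork removal."
d2 = "Deflection of beam B2 exceeds tolerance."
print(context_similarity(d1, d2, pm))  # → 0.5 (S1 is supported by B2)
```

Two documents with no shared vocabulary can thus still be drawn together because the product model relates the objects they discuss, which is the kind of strengthening of document relations described above.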
Because these models lack several aspects concerning production and the time dimension of construction, the diffuseness of the derived knowledge model can only be reduced; it will not be possible to obtain a consistent information structure. The goal is therefore to model the remaining fuzziness in detail, so that it can be presented to the user and used explicitly in further analysis.
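One way to keep the remaining fuzziness explicit, rather than forcing each document into a single cluster or product-model object, is to store a probability distribution per document and measure its spread. The Python sketch below assumes that affinity scores between documents and candidate objects are already available; Shannon entropy as the uncertainty measure and the 0.8 reporting threshold are illustrative choices, and the document and object names are hypothetical.

```python
import math


def assignment_distribution(scores):
    """Normalise non-negative affinity scores of one document to candidate
    product-model objects into a probability distribution."""
    if not scores:
        return {}
    total = sum(scores.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {k: 1 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}


def entropy(dist):
    """Shannon entropy in bits; 0 means the assignment is certain."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def normalized_uncertainty(dist):
    """Entropy divided by its maximum log2(n), giving a value in [0, 1]."""
    n = len(dist)
    return entropy(dist) / math.log2(n) if n > 1 else 0.0


def flag_ambiguous(doc_scores, threshold=0.8):
    """Documents whose assignment is too uncertain to resolve automatically,
    most uncertain first, so that they can be presented to the user."""
    report = []
    for doc, scores in doc_scores.items():
        u = normalized_uncertainty(assignment_distribution(scores))
        if u >= threshold:
            report.append((doc, round(u, 3)))
    return sorted(report, key=lambda item: -item[1])


doc_scores = {
    "site_report_12": {"S1": 0.9, "B2": 0.1},
    "memo_7": {"S1": 0.5, "B2": 0.5},
}
print(flag_ambiguous(doc_scores))  # → [('memo_7', 1.0)]
```

The per-document distributions remain available for further analysis, while the flagged list is what would be shown to the user.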
Modelling Uncertainties
Apart from classical probability theory and fuzzy set theory in mathematics, the research field of Artificial Intelligence offers several approaches to handling uncertainty, such as certainty factors, Dempster-Shafer theory and belief networks.
A promising approach for this research is the theory of belief networks (Oliver and Smith, 1990; Russell and Norvig, 1995). A belief network is a data structure that represents dependencies between variables. Each node in the directed acyclic graph defining the network is assigned a table of conditional probabilities. This table quantifies the influence of the node's parent (predecessor) nodes on the node itself. Belief networks thus allow explicit values to be assigned to the semantic connections between terms and documents, and they support further analysis of the represented knowledge structure.
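A belief network of this kind, and its use for answering a query, can be sketched in a few lines of Python. The sketch assumes Boolean variables, that parents are added before their children (so insertion order is a topological order), and exact inference by enumeration over all hidden variables, which is exponential in their number and only suitable for small networks. The network itself, a hypothetical concept node "slab formwork" with two term nodes, and all probability values are invented for illustration.

```python
from itertools import product


class BeliefNetwork:
    """Boolean belief network: a DAG in which every node stores a table
    P(node = True | values of its parent nodes)."""

    def __init__(self):
        self.parents = {}
        self.cpt = {}
        self.order = []  # topological order: parents are always added first

    def add_node(self, name, parents, table):
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise ValueError(f"add parents {missing} before {name}")
        self.parents[name] = tuple(parents)
        self.cpt[name] = table  # key: tuple of parent values
        self.order.append(name)

    def joint(self, assignment):
        """Probability of a full assignment: product of the CPT entries."""
        prob = 1.0
        for node in self.order:
            key = tuple(assignment[p] for p in self.parents[node])
            p_true = self.cpt[node][key]
            prob *= p_true if assignment[node] else 1.0 - p_true
        return prob

    def query(self, var, evidence):
        """P(var = True | evidence) by summing the joint over hidden variables."""
        hidden = [n for n in self.order if n != var and n not in evidence]
        totals = {True: 0.0, False: 0.0}
        for value in (True, False):
            for combo in product((True, False), repeat=len(hidden)):
                assignment = {**evidence, var: value, **dict(zip(hidden, combo))}
                totals[value] += self.joint(assignment)
        return totals[True] / (totals[True] + totals[False])


# Hypothetical network: a concept node and two term nodes depending on it.
net = BeliefNetwork()
net.add_node("slab_formwork", [], {(): 0.3})
net.add_node("term_formwork", ["slab_formwork"], {(True,): 0.8, (False,): 0.1})
net.add_node("term_curing", ["slab_formwork"], {(True,): 0.4, (False,): 0.2})

# A document contains "formwork" but not "curing": how likely is the concept?
print(net.query("slab_formwork", {"term_formwork": True, "term_curing": False}))
# ≈ 0.72, i.e. 0.144 / (0.144 + 0.056)
```

The posterior value is exactly the kind of explicit, quantified connection between terms and a document concept that the belief network is meant to provide.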
References
Dong A., Agogino A. M. (1997): Text Analysis for Constructing Design Representations. In: Artificial Intelligence in Engineering, Vol. 11 (2)
Oliver R. M., Smith J. Q. (Eds.) (1990): Influence Diagrams, Belief Nets and Decision Analysis. Wiley
Russell S., Norvig P. (1995): Artificial Intelligence: A Modern Approach. Prentice Hall, New Jersey