Past and Future of ISO 21127:2006 or CIDOC CRM

The CIDOC Conceptual Reference Model (CRM) is a formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. Its goal is to provide the semantic definitions and clarifications needed to transform disparate, localised information sources into a coherent global resource, be it within an institution, an intranet or on the Internet. In order to achieve this, the CRM adopts a supra-institutional perspective, abstracted from any particular local context. This abstraction is derived from the underlying semantics of the database schemata and document structures found in museum and cultural heritage documentation. The CRM is descriptive rather than prescriptive: it explains the logic of what cultural heritage institutions do in fact document rather than telling them what they should document. We believe that this is the key to enabling semantic interoperability.

Work on the CRM began in 1996 when ICS-FORTH invited the CIDOC Documentation Standards Working Group, co-chaired by Pat Reed and Nick Crofts, to a workshop in Heraklion, Crete to discuss how a common data standard could achieve interoperability of museum data while still taking into account its extraordinary diversity and specialization. This workshop was the beginning of ten years of development work by an interdisciplinary team of experts, coming from fields such as computer science, archaeology, museum curation, art history, natural history, library science, physics and philosophy. The approach was “bottom-up”, aimed at reverse-engineering and integrating the common meaning of more and more database schemata and documentation structures from all museum disciplines, archives and, more recently, libraries.

The very first schema analysed was the CIDOC Relational Data Model. This complex model, which contains more than 400 tables (Reed 1995), was successfully condensed to an ontology of about 50 classes and 60 properties, with far wider applicability than the original schema. Over the following years, other schemata were analysed in order to cover the documentation practice of all museum disciplines. The CRM now contains 80 classes and 130 properties, and covers the semantic field of hundreds of schemata.

In contrast to commonly held beliefs, we found a surprisingly stable set of core concepts shared across many disciplines and institutional categories. This common ground is concealed beneath a confusing diversity of expert terminology. Habit, lack of time and patience and the paucity of interdisciplinary collaboration seem to be at the root of the belief that there are wide differences of conceptualisation, rather than any real absence of common concepts.

The concepts and relationships of which the CRM ontology is composed demonstrate a surprising level of stability: the more schemata we analysed, the fewer changes or additions were needed to describe new cases and applications. This experience convinced the working group, at a meeting in London in 1999, to start the standardisation process in collaboration with the International Organization for Standardization (ISO). Nick Crofts became the convenor of the ISO working group ISO/TC46/SC4/WG9. The CIDOC CRM became an international standard, ISO 21127:2006, in September 2006.

In Ottawa in 2000, CIDOC founded the CIDOC CRM Special Interest Group (CRM-SIG), chaired by Martin Doerr. The purpose of this group is to cope with the specific demands of the standardisation process and to foster wide support for the model beyond ICOM. From the perspective of ISO, the CRM-SIG acts as a group of domain experts and will continue to maintain ISO 21127:2006 together with ISO/TC46/SC4/WG9.

On first reading, the CRM looks very similar to an object-oriented database schema. However, as a formal ontology, it represents a higher level of abstraction: a simplified representation of how experts and laymen perceive reality, specifically the reality of cultural heritage, in terms of categories (classes) and relationships (properties). It is a common ground of understanding rather than an arbitrary convention and, as such, it is extensible and unlimited. This gives it its particular power to integrate information from disparate fields.

Using the CRM requires a little work. Existing schemata need to be re-expressed, or “mapped”, in terms of the classes and properties contained in the ontology. New schemata may also be created from scratch, using the relevant elements. Once this is done, the semantic obstacles to interoperability that arise from divergent forms of representation are removed and it becomes possible, using suitable tools, to merge, combine and query contents from disparate and otherwise incompatible sources.
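The mapping step described above can be illustrated with a minimal sketch. It is not part of the standard or of any CRM tool. The two records, their field names, the mapping tables and the helper functions are all hypothetical. The class and property labels (E22 Man-Made Object, E12 Production, P108 has produced, P14 carried out by, P4 has time-span) are quoted as they were commonly cited around 2006 and should be checked against the published CRM definition. Statements are held as plain (subject, property, object) tuples rather than in a real triple store.

```python
# Hypothetical records from two local schemata with incompatible field names.
museum_a = {"id": "A-17", "artist": "Example Painter", "date_made": "1650"}
museum_b = {"inv_no": "B-03", "creator": "Example Painter", "year": "1652"}

# Hypothetical per-schema mappings: role in the shared pattern -> local field.
MAPPING_A = {"object": "id", "actor": "artist", "time": "date_made"}
MAPPING_B = {"object": "inv_no", "actor": "creator", "time": "year"}


def to_crm(record, mapping):
    """Re-express one local record as CRM-style statements.

    The object is linked to its maker and date through an explicit
    production event, instead of through schema-specific fields.
    """
    obj = record[mapping["object"]]
    production = f"production-of-{obj}"
    return {
        (obj, "type", "E22 Man-Made Object"),
        (production, "type", "E12 Production"),
        (production, "P108 has produced", obj),
        (production, "P14 carried out by", record[mapping["actor"]]),
        (production, "P4 has time-span", record[mapping["time"]]),
    }


def objects_by_actor(triples, actor):
    """Return the objects produced in events carried out by `actor`."""
    productions = {
        s for s, p, o in triples if p == "P14 carried out by" and o == actor
    }
    return sorted(
        o for s, p, o in triples
        if p == "P108 has produced" and s in productions
    )


# Once both sources are mapped, they can be merged and queried together.
merged = to_crm(museum_a, MAPPING_A) | to_crm(museum_b, MAPPING_B)
print(objects_by_actor(merged, "Example Painter"))  # ['A-17', 'B-03']
```

The single query no longer needs to know that one source calls the maker "artist" and the other "creator". That difference is absorbed by the two mapping tables, which is the only schema-specific part of the sketch.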

This may seem like a lot of hard work. However, other approaches to interoperability cannot provide anywhere near the depth and expressive power required in the field of cultural heritage; in our view there really is no viable alternative to a common ontology. Experts in other institutions and R&D projects seem to agree: take-up of the CRM is rapidly increasing. The CRM-SIG is now working on two fronts: fostering application know-how and collaboration with tool providers on the one hand, while widening the scope of the ontology on the other.

We believe that museum information can best be seen in the wider context of memory institutions in general. Therefore we have been working on harmonisation and integration of the CRM with relevant models from related disciplines. In 2001, a harmonisation study of the ABC Harmony model was carried out in collaboration with Carl Lagoze and Jane Hunter. ABC Harmony was developed independently by the Digital Libraries and Multimedia communities and was seen as potentially in competition with the CRM. The harmonisation study enriched the CRM with interesting abstractions of material and immaterial things and demonstrated the existence of a common underlying conceptualisation.

In 2003, we started a very fruitful collaboration with the Functional Requirements for Bibliographic Records (FRBR) Review Group, chaired by Patrick Le Boeuf, aimed at harmonising the FRBR model with the CRM. FRBR and FRAD, which are maintained by the International Federation of Library Associations (IFLA), are important models of library concepts. A reformulation of FRBR and FRAD, as an ontology named “FRBRoo”, should be completed in 2007. This work demonstrated that, with some minor but useful changes, the CRM is powerful enough to incorporate FRBR concepts as a specialisation of the CRM. FRBRoo presents a conceptualisation of intellectual production, including performing arts, which is equally interesting to museums and libraries. FRBRoo is an official work item of IFLA and we expect its acceptance by IFLA in due course.

Our collaboration with IFLA could be a good model for future development of the CRM. Together, the CRM and FRBRoo form a coherent ontology. Nevertheless, two distinct communities remain responsible for maintenance, each for its own part. These communities ensure that the social and intellectual contents of these models conform to their conceptualisation and take into account the long-term needs of the domain experts. Both communities are committed to resolving possible inconsistencies between their models on an open and equal basis.

It is our hope that this fruitful pattern of interdisciplinary negotiation can be repeated. In order for discussion to be possible, we have to convince the experts that a) there is indeed a need to share information across fields and disciplines and b) sharing is possible since, despite appearances, our fundamental concepts are in fact the same.

In this vein, a similar collaboration has just begun with the Text Encoding Initiative (TEI) community. It is our hope that initial contacts with representatives of the archival community will finally lead to an intellectual model capable of integrating Archives, Libraries and Museums. We regard the prospect of a common, rich and expressive ontology for the integration of metadata across all fields of cultural and scientific heritage as both attractive and feasible.

Finally, we wish to thank the many unnamed contributors to the CRM for their enthusiasm and hard work and we hope to continue receiving their support in the future.

Martin Doerr, 2007-03-22