Understanding Data Models
In the world of data, one of the key challenges is the variety of ways in which data can be described. This makes data hard to interpret and reuse, particularly across domains, since the chosen structure hinges on domain needs. Accessing data requires knowing its structure; if the structure differs for every dataset, it has to be learned anew each time.
The AI4PublicPolicy project is elaborating a solution: a catalogue of Data Models that will support search and semantic analysis across the Semantic Concepts of different models.
A data model is a visual representation of an entire information system, or of part of one, expressed using standardized schemas. It illustrates the types of data involved, the relationships between them, how the data is structured and formatted, and what attributes and properties it has. In essence, a data model aims to make data meaningful and data communication possible, based on three concepts:
- A set of data structures outlining operations and constraints. They encompass data types, properties, and descriptions.
- A set of operators and inference rules describing the methods and types that the data model employs.
- A set of constraints covering syntax, dependencies, and other rules that ensure the model’s accuracy, validity, and compatibility.
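The three concepts above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `Reading` entity, its fields, the allowed units, and the helper names `violations` and `mean_value` are hypothetical and do not come from the AI4PublicPolicy catalogue.

```python
from dataclasses import dataclass


# Data structure: the types and properties of one entity.
# (Hypothetical example entity, chosen only for illustration.)
@dataclass(frozen=True)
class Reading:
    sensor_id: str
    pollutant: str
    value: float
    unit: str


# Constraints: rules an instance must satisfy to be valid under the model.
ALLOWED_UNITS = {"ug/m3", "ppm"}  # assumed set of units


def violations(reading: Reading) -> list[str]:
    """Return a list of human-readable constraint violations (empty if valid)."""
    problems = []
    if not reading.sensor_id:
        problems.append("sensor_id must not be empty")
    if reading.value < 0:
        problems.append("value must be non-negative")
    if reading.unit not in ALLOWED_UNITS:
        problems.append("unit must be one of the allowed units")
    return problems


# Operator: an operation the model defines over its data.
def mean_value(readings: list[Reading]) -> float:
    """Average the values of a non-empty list of readings."""
    if not readings:
        raise ValueError("at least one reading is required")
    return sum(r.value for r in readings) / len(readings)


good = Reading("s-1", "NO2", 40.0, "ug/m3")
bad = Reading("", "NO2", -1.0, "mg")

print(violations(good))  # no violations for a valid instance
print(violations(bad))   # one message per broken rule
```

Keeping the structure, the operator, and the constraints as separate pieces mirrors the separation described in the list: the structure says what the data looks like, the operator says what may be done with it, and the constraints say when it is valid.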
Furthermore, a data model consists of three levels, each serving a unique purpose:
- A conceptual level for communicating ideas to users about the domain and defining entities and relationships.
- A logical level, also referred to as a schema, for communicating ideas to designers about domain applications.
- A physical level, which refers to the actual physical storage and implementation in a database management system.
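As a rough sketch, the same small domain can be written down once per level. The entities (`Sensor`, `Reading`), their attributes, and the choice of SQLite as the database management system are assumptions made purely for illustration.

```python
import sqlite3

# Conceptual level: entities and relationships, with no types or storage details.
conceptual = {
    "entities": ["Sensor", "Reading"],
    "relationships": [("Sensor", "produces", "Reading")],
}

# Logical level (schema): attributes, types, and keys, independent of any DBMS.
logical = {
    "Sensor": {"id": "integer, primary key", "location": "text"},
    "Reading": {
        "id": "integer, primary key",
        "sensor_id": "integer, foreign key to Sensor.id",
        "value": "real",
    },
}

# Physical level: the concrete implementation in one DBMS (SQLite here),
# including storage-oriented choices such as an index.
ddl = """
CREATE TABLE sensor (
    id INTEGER PRIMARY KEY,
    location TEXT NOT NULL
);
CREATE TABLE reading (
    id INTEGER PRIMARY KEY,
    sensor_id INTEGER NOT NULL REFERENCES sensor(id),
    value REAL NOT NULL
);
CREATE INDEX idx_reading_sensor ON reading(sensor_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
conn.execute("INSERT INTO sensor (id, location) VALUES (1, 'city centre')")
conn.execute("INSERT INTO reading (sensor_id, value) VALUES (1, 40.0)")

# Query the physical tables through the relationship named at the conceptual level.
rows = conn.execute(
    "SELECT s.location, r.value "
    "FROM reading r JOIN sensor s ON s.id = r.sensor_id"
).fetchall()
print(rows)
```

Only the last step depends on a particular database; the conceptual and logical descriptions could be implemented in a different system without changing.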
One of the key aspects of data models is their adaptability. Models are typically designed to cater to specific needs, and as requirements change over time, the data model must evolve accordingly. To facilitate this flexibility, data models maintain a certain level of abstraction.
This abstraction also makes it possible to reuse or adapt existing schemas for other contexts, which is one benefit of using them. The other is interoperability: with standardized schemas, everyone describes the same thing in the same way. Otherwise, it may become difficult, or outright impossible, to make sense of or extract information from data whose structure is unknown.
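The interoperability argument can be illustrated with a minimal sketch. The two source records, their field names, and the shared target fields (`sensor_id`, `pollutant`, `value`) are all hypothetical; the point is only that code written against one agreed structure works for every source mapped to it.

```python
# Two hypothetical sources describing the same thing with different structures.
city_a = {"station": "A-12", "no2_ugm3": 41.5}
city_b = {"sensorId": "B-7", "measurement": {"NO2": 38.0}}


def from_city_a(record: dict) -> dict:
    """Map a city A record onto the shared (assumed) model."""
    return {
        "sensor_id": record["station"],
        "pollutant": "NO2",
        "value": record["no2_ugm3"],
    }


def from_city_b(record: dict) -> dict:
    """Map a city B record onto the shared (assumed) model."""
    return {
        "sensor_id": record["sensorId"],
        "pollutant": "NO2",
        "value": record["measurement"]["NO2"],
    }


# Once mapped, both records have the same structure.
shared = [from_city_a(city_a), from_city_b(city_b)]


def highest(readings: list[dict]) -> str:
    """Return the sensor_id of the reading with the largest value."""
    return max(readings, key=lambda r: r["value"])["sensor_id"]


print(highest(shared))
```

Without the shared model, `highest` would need a separate branch for each source structure, and a new source with an unknown structure could not be processed at all.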