AI4PublicPolicy Components pt. 1

The main components of the AI4PublicPolicy platform are logically grouped into three modules: Reusable and Interoperable Policies, Transparent and Trusted AI for Policy Management, and AI Tools for Autonomous Policy Making. All of these components are independent and expose REST interfaces. The architecture notation is derived from UML, using lollipop notation for the interfaces: the ball denotes a provided interface, and the half circle denotes its consumer.

The target users of the AI4PublicPolicy platform are policy makers and data scientists. The entry point for both types of users is the Virtualised Policy Management Environment (VPME), a web application that provides the user interface (UI) and acts as the front end for all functionality of the AI4PublicPolicy platform. Depending on the type of user (policy maker or data scientist), the VPME exposes different functionality; for instance, a data scientist may define analytical workflows, while a policy maker who is not familiar with the underlying ML techniques may rely on AutoML features.

There are also external data sources, such as social media (e.g., the Twitter API) and other forums, from which data for opinion mining and sentiment analysis is collected.

The sections that follow provide an overview of the parts of the AI4PublicPolicy architecture. Each component's functionality is described in terms of its subcomponents, inputs, and outputs:


Policy and Datasets Management

Component Description

The goal of this component is to provide the software tools needed to collect and manage the datasets produced by the project pilots, and also to define and manage policies. The software should be generic enough to be used with different datasets and data formats. Another requirement is that the component should be able to collect both data at rest and data in motion. Datasets are collected through a web service (REST API) to which dataset providers connect in order to supply the location of the dataset to be loaded into the AI4PublicPolicy platform.
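As a minimal sketch of this registration step, a dataset provider might build a metadata payload and POST it to the collection REST API. The endpoint name and payload fields below are illustrative assumptions; the source only states that providers supply the location of the dataset.

```python
import json

def build_dataset_registration(name, location_url, data_format):
    """Build the metadata payload a provider would POST to the collection
    REST API (hypothetical /datasets endpoint; fields are illustrative)."""
    return {
        "name": name,
        "location": location_url,  # where the Data Retrieval subcomponent fetches from
        "format": data_format,     # e.g. "csv", "json"
    }

payload = build_dataset_registration(
    "traffic-sensors", "https://example.org/data/traffic.csv", "csv")
# The provider would then POST json.dumps(payload) to the API.
print(json.dumps(payload))
```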

The image below graphically presents the subcomponents of the Data Collection and Management component. The Data Retrieval subcomponent connects to that endpoint, downloads the data, and invokes the Data Injector, which stores the data in a Data Store. The main component in the data collection process (the Data Injector) will be implemented using Apache Beam, an open-source framework for defining data pipelines. Beam can be used for the Extract, Transform, and Load (ETL) process. These tasks are useful for moving data between different storage media and data sources, transforming data into a more desirable format, or loading data into a new system.
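To illustrate the ETL flow the Data Injector performs, here is a minimal plain-Python sketch of the three stages; the actual implementation uses Apache Beam, whose pipelines chain the same stages as Read, transform, and Write steps. The record fields are assumptions for illustration.

```python
def extract(rows):
    # Extract: in Beam this would be a Read transform, pulling in the
    # data the Data Retrieval subcomponent downloaded.
    return list(rows)

def transform(rows):
    # Transform: normalise records into the datastore's format
    # (field names here are illustrative, not from the source).
    return [{"id": r["id"], "value": float(r["value"])} for r in rows]

def load(rows, store):
    # Load: in Beam this would be a Write transform to the Data Store.
    store.extend(rows)

store = []
load(transform(extract([{"id": 1, "value": "3.5"}])), store)
print(store)  # [{'id': 1, 'value': 3.5}]
```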

The Policy and Data Management component comprises two subcomponents, namely Policy Management and Data Management.


Inputs

Dataset format, endpoint for the input dataset, and datastore endpoint.


Outputs

Data, policy, or AI model in an AI4PublicPolicy datastore.


Semantic Interoperability

Component Description

The goal of the Semantic Interoperability toolkit is to enable interoperability dynamically. It acts as an interoperability mediator, allowing datasets to become interoperable without changes by the user. Realising interoperability is a key challenge because multiple ontologies and archetypes exist for policy making, and they need to interoperate semantically.

The semantic interoperability toolkit is based on the Plug’n’Interoperate (PnI) solution, which is made possible by the existence of ‘interoperability drivers’ that define translations between data formats. In the PnI environment, systems simply plug into the interoperability support system and promptly interoperate with the other systems present in the data-sharing environment. Like PnI, the semantic interoperability toolkit provides interoperability-enabling methods in order to ensure interoperability between Public Policy datasets.
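The ‘interoperability driver’ idea can be sketched as a registry of per-format translators: once a driver for a format is plugged in, records in that format can immediately be translated for other systems. The driver name, record shape, and target fields below are illustrative assumptions, not taken from the source.

```python
drivers = {}

def register_driver(source_format, translate):
    # Plug: register a driver that translates one data format
    # into a common representation.
    drivers[source_format] = translate

def interoperate(record, source_format):
    # Interoperate: any registered format can now be translated on demand.
    return drivers[source_format](record)

# Hypothetical driver for a CSV-like row (fields are illustrative)
register_driver("csv-row",
                lambda row: {"city": row[0], "population": int(row[1])})
print(interoperate(["Lisbon", "545000"], "csv-row"))
# {'city': 'Lisbon', 'population': 545000}
```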

To achieve this goal, the semantic interoperability toolkit relies on a GraphQL-based technological solution to support the definition of queries. Each Public Policy dataset requires its own schema describing the dataset in a GraphQL-aligned form. The schemas describing each Public Policy dataset must be provided to the Semantic Interoperability toolkit engine in order for their data to be reachable by the semantic queries. The results of the queries, as is usual for GraphQL, are returned in JSON-LD format, where each data field is associated with the URI of the concept that describes it.
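A sketch of how such a query and its JSON-LD result might look follows. The query text, field names, and ontology URIs are assumptions for illustration; the source only states that each data field in the result is associated with the URI of its describing concept.

```python
# Hypothetical GraphQL query against a Public Policy dataset schema
query = """
{
  policyDataset {
    city
    population
  }
}
"""

# JSON-LD result: the @context maps each data field to the URI of the
# concept that describes it (URIs are illustrative).
result = {
    "@context": {
        "city": "https://example.org/ontology#City",
        "population": "https://example.org/ontology#Population",
    },
    "city": "Lisbon",
    "population": 545000,
}

print(result["@context"]["city"])  # https://example.org/ontology#City
```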


The Semantic Interoperability toolkit supports two types of inputs: user inputs and system inputs. User inputs are those given by users to operate the semantic toolkit; they consist of the semantic query description and the identification of the desired target semantic model.

System inputs are the inputs and configuration needed for the semantic toolkit to operate. These consist of the schemas and resolvers used to onboard Public Policy datasets onto the Semantic Interoperability toolkit engine, as well as the exporters required to convert the JSON-LD query results into the target data model.
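An ‘exporter’ in this sense could be sketched as a function that converts a JSON-LD query result into a plain target data model. The conversion below (stripping the semantic annotations to leave a flat record) is an illustrative assumption about what a simple target model might need.

```python
def export_to_target(jsonld_result):
    # Drop JSON-LD keywords such as @context, keeping only the data
    # fields the (hypothetical) target data model expects.
    return {k: v for k, v in jsonld_result.items() if not k.startswith("@")}

doc = {
    "@context": {"city": "https://example.org/ontology#City"},
    "city": "Lisbon",
}
print(export_to_target(doc))  # {'city': 'Lisbon'}
```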


As output, the Semantic Interoperability toolkit provides the results of the semantic queries, either in JSON-LD format or in another format specified for the output.