What is the BDV Reference Model and how is it related to AI4PublicPolicy?

The Big Data Value (BDV) Reference Model was defined by the Big Data Value Association (BDVA) and serves as a framework for locating the data-related technologies of AI4PublicPolicy. The reference model is structured into horizontal and vertical concerns, as depicted in the graphical representation below. The horizontal concerns represent core components of the data value chain, while the vertical concerns represent cross-cutting issues that may affect all the horizontal concerns.

The data visualization and user interaction layer deals with the tools for representing big data and with the ways users interact with those visualizations.

Data analytics deals with the techniques for understanding and extracting knowledge from data. These include machine learning, deep learning, natural language processing, event and pattern discovery, and high-performance data analytics that take advantage of high-performance computing.

Data processing architectures must deal with data at rest (stored data) and data in motion (streams of data) coming from heterogeneous sources in different formats (structured, unstructured). These architectures must be designed to take advantage of high-performance computing and cloud computing so that they can scale with the increasing amount of data being produced.
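The difference between data at rest and data in motion can be illustrated with a minimal sketch. It assumes that records are simple dictionaries with a `topic` field and that a generator stands in for a live feed. All names here (`count_topics`, `opinion_stream`, `stored_records`) are hypothetical and are not taken from the BDV Reference Model or from the AI4PublicPolicy architecture.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_topics(records: Iterable[dict]) -> Counter:
    """Count records per topic; accepts any iterable, stored or streamed."""
    counts: Counter = Counter()
    for record in records:
        counts[record["topic"]] += 1
    return counts


def opinion_stream() -> Iterator[dict]:
    """Hypothetical stand-in for a live feed (data in motion)."""
    for topic in ["transport", "housing", "transport"]:
        yield {"topic": topic}


# Data at rest: a fully materialised collection, e.g. loaded from a database.
stored_records = [{"topic": "housing"}, {"topic": "energy"}]

batch_counts = count_topics(stored_records)
stream_counts = count_topics(opinion_stream())

# Both results have the same shape, so they can be merged directly.
combined = batch_counts + stream_counts
```

Writing the processing step against a generic iterable is one way to let the same logic serve both kinds of data. A production system would typically rely on dedicated batch and stream processing engines instead.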

Data protection deals with the techniques to protect person-specific and sensitive data. Privacy protection includes protecting analytics applications and the underlying (cloud) infrastructure from data leakages. These mechanisms must ensure anonymity and protect against reversibility, that is, against re-identifying individuals from anonymized data.
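One common way to reason about anonymity and re-identification risk is k-anonymity: every combination of quasi-identifiers must be shared by at least k records. The sketch below is an illustration under stated assumptions, not a technique prescribed by the reference model. The record fields (`name`, `age`, `postcode`) and the helper names (`generalise`, `is_k_anonymous`) are hypothetical.

```python
from collections import Counter


def generalise(record: dict) -> tuple:
    """Drop the direct identifier (name) and coarsen the quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    age_band = f"{decade}-{decade + 9}"       # e.g. 34 -> "30-39"
    postcode_prefix = record["postcode"][:2]  # keep only the first two digits
    return (age_band, postcode_prefix)


def is_k_anonymous(records: list, k: int) -> bool:
    """True if every generalised group contains at least k records."""
    groups = Counter(generalise(r) for r in records)
    return all(size >= k for size in groups.values())


# Hypothetical sample data, invented for illustration only.
records = [
    {"name": "A", "age": 34, "postcode": "28001"},
    {"name": "B", "age": 37, "postcode": "28045"},
    {"name": "C", "age": 52, "postcode": "08010"},
]

two_similar = is_k_anonymous(records[:2], k=2)  # both fall in ("30-39", "28")
with_outlier = is_k_anonymous(records, k=2)     # third record is alone in its group
```

A record that sits alone in its group, like the third one here, is the kind of case in which anonymized data could be traced back to an individual.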

Data management deals with the techniques for handling the large amounts of data that are being produced. The multilingualism of data sources, which is especially relevant in the EU, makes the data difficult to analyse because they are in different languages. The lack of a common representation of the data creates data silos that cannot be processed without a semantic interoperability layer.

Cloud and High-Performance Computing (HPC) are basic pillars of effective data processing and management. Using the cloud for data processing is now standard practice, and cloud platforms provide a range of infrastructures for storing and analysing data. HPC resources have been recognized as a complementary approach for extreme data analytics. The European Open Science Cloud (EOSC) is a federated system built on a set of existing research infrastructures, and it delivers a catalogue of services, software and data from major research infrastructures.

Among the vertical concerns, it is worth mentioning the marketplaces for sharing data and facilitating the use of the horizontal concerns.

AI4PublicPolicy will collect, process and analyse large volumes of data from different sources (e.g., citizens, public authorities' databases, social media) using AI technologies and cloud computing resources.

In this context, AI4PublicPolicy addresses the concerns of the BDV Reference Model as follows:

  • Data Analytics
    AI4PublicPolicy will use AI tools for policy modelling, extraction, simulation and recommendation. These tools include machine learning to extract policy-related knowledge from large datasets, as well as opinion mining and sentiment analysis based on the opinions citizens express on social media.
  • Data Protection
    AI4PublicPolicy will use anonymized data.
  • Data Processing Architectures
    AI4PublicPolicy will deal with data at rest (stored data) coming from policy authorities and from the interaction of citizens with the administration and the services it provides. The AI4PublicPolicy architecture will also deal with streaming data coming from citizens’ opinions on social media. The architecture must therefore integrate data management and analytics tools for both kinds of data.
  • Data Management
    AI4PublicPolicy will deal with structured data (data in tables) and unstructured data (opinions in natural language). AI4PublicPolicy will use natural language processing tools to avoid the creation of policy data silos in different languages. The development of AI4PublicPolicy is guided by five pilots from different European countries with different languages. The project will integrate tools for semantic interoperability of policy data sources.
  • Cloud and High-Performance Computing
    AI4PublicPolicy will be integrated with the EOSC portal to facilitate access to cloud and HPC resources.
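
As an illustration of the opinion mining and sentiment analysis mentioned under Data Analytics, the following is a minimal lexicon-based sketch. It assumes a tiny hand-made English word list and simple word counting. The lexicon and the function names (`sentiment_score`, `sentiment_label`) are hypothetical and do not represent the tools used in the project, which would more plausibly rely on trained, multilingual models.

```python
import re

# Hypothetical toy lexicon, invented for illustration only.
POSITIVE = {"good", "great", "useful", "safe"}
NEGATIVE = {"bad", "slow", "unsafe", "expensive"}


def sentiment_score(text: str) -> int:
    """Return (# positive words) - (# negative words) found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def sentiment_label(text: str) -> str:
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


opinions = [
    "The new bike lanes are great and safe",
    "Buses are slow and expensive",
    "The council met on Monday",
]
labels = [sentiment_label(o) for o in opinions]
```

Aggregating such labels per policy topic is one simple way to turn unstructured opinions into a structured signal that can sit alongside tabular policy data.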