Funded projects 2021 Call

The board of the PDI-SSH Foundation honours 10 applications in the Call Digital Infrastructure SSH.

There has been a strong desire in the fields of social sciences and humanities (SSH) to assign part of the resources intended for the SSH Sector Plan of the Ministry of Education, Culture and Science (OCW) to a domain-wide digital SSH plan to support accessible, high-quality digital infrastructure facilities. Digital infrastructure plays an increasingly important role and in many cases is the foundation of leading research into the social and scientific issues of the digital society. This not only concerns infrastructure for collecting data, but also infrastructure for storing, publishing, analysing and linking data. In addition, these infrastructures require professional support, for example from data managers and infrastructure specialists. Researchers also have a growing need for digital support and expertise.

To meet those requirements, the Platform Digital Infrastructure SSH launched the second Call for Proposals Digital Infrastructure in the social sciences and humanities. PDI-SSH launched this call as part of the SSH Sector Plan.

After a careful review process, in line with the criteria described in the Call for Proposals of the Platform Digital Infrastructure SSH, 10 of 47 applications were accepted. The review process consisted of several steps. The disciplinary review committees (consisting of independent reviewers) reviewed the submitted proposals and issued independent advice to the SSH Council. The SSH Council, after consulting representatives of the three largest Dutch infrastructures (CLARIAH, Health-RI and ODISSEI), prioritised these opinions and took the final decision. The Board of the PDI-SSH Foundation implements that decision.

Honoured proposals (in alphabetical order):

Digital Data Donation Infrastructure (D3I)

Applicant: dr. Theo Araujo (University of Amsterdam)

Human behaviour can be studied in an unprecedented level of detail with the digital trace data that users create when using digital platforms. The Digital Data Donation Infrastructure (D3I) will enable individuals to donate their digital trace data to academic research in a secure, transparent, and privacy-protecting manner. It is based on individuals’ rights to download and port their data from any organization which stores user data, as per the General Data Protection Regulation (Article 15). This unlocks a treasure trove for research in the Social Sciences and the Humanities.

D3I turns the user-platform-researcher relationship around: Instead of researchers being dependent on platforms to study individuals, it enables researchers to work directly with individuals to study both individuals’ own behaviour and platforms themselves. This allows researchers to collect crucial data to study causes, contents, and consequences of (online) communication, behaviour and cultural production and consumption within platforms.

Built in a flexible and extensible manner, D3I will initially cover the most popular online platforms, including social media, entertainment and work. Importantly, it will also provide the legal and methodological framework necessary to help accelerate and expand critical SSH research across university and disciplinary boundaries.

Infrastructure for SSH research on Sign Language of the Netherlands

Applicant: dr. Onno Crasborn (Radboud Universiteit Nijmegen)

This project will make significant contributions to the digital infrastructure that SSH researchers need to investigate the sign language of the Netherlands (Nederlandse Gebarentaal, NGT), which has been legally recognised since 1 July 2021 as one of the country’s official languages but is not yet covered by infrastructure projects like CLARIAH. One of the key problems in studying sign languages is that they lack a writing system. Primary data are in video format and currently need to be transcribed and annotated manually. Creating consistent annotations across datasets requires connecting annotation software (ELAN) to an NGT lexicon that does justice to the morphological, semantic, and phonetic characteristics of the language. Words in NGT and Dutch do not match in structure or meaning, which makes it difficult to build on lexical resources for Dutch. The project therefore focuses on (i) a digital NGT lexicon with semantic and phonetic information, (ii) NGT corpora of adult language use as well as interactions between adults and children, and (iii) software for machine-supported annotation, using novel AI technologies. It connects these sign language resources to existing infrastructures within CLARIAH, enriching the CLARIN K-Centre for Atypical Communication Expertise (ACE) to maintain and bundle tools and resources for NGT.

Building a FAIR Expertise Hub for the social sciences

Applicant: prof. dr. Pearl Dykstra (Erasmus University Rotterdam)

Researchers increasingly rely on data that are FAIR: Findable, Accessible, Interoperable, and Reusable. Social scientists often use data that were not created for research purposes, for example administrative data, commercial data, media content data, historical and archival data, biometric data, and geospatial data. Data that were not created for research purposes are typically less FAIR. Many such data providers are eager to increase their FAIRness, but they lack the knowledge, skills and incentives to do so.

The proposed project will establish, develop, and maintain a FAIR Expertise Hub to support communities of data providers in improving their FAIRness. An important instrument for the FAIR Expertise Hub will be the FAIR Implementation Profile (FIP), a collection of decisions and plans made by a community about how to achieve FAIRness. A FIP comes with an easy-to-use wizard and an accompanying workshop provided by GO-FAIR.

This project helps data communities to (1) establish their plans, (2) agree on their FAIR-enabling resources, and (3) achieve a substantial increase in FAIRness. The project partners will create alignment (4) with international standards and (5) between communities. Explicit FAIR Implementation Profiles (6) facilitate the work of software developers.

Secure Analysis Environment (SANE)

Applicant: dr. Tom Emery (Erasmus University Rotterdam)

Privacy, copyright, and competition barriers limit the sharing of sensitive data for scientific purposes. We propose the Secure Analysis Environment (SANE): a virtual container in which the researcher can analyse sensitive data while the data provider remains in complete control. By following the Five Safes principles, SANE will enable researchers to conduct research on data that until now have hardly been available to them.

SANE comes in two variants. Tinker SANE allows the researcher to see, manipulate and play with the data. In Blind SANE, the researcher submits an algorithm without being able to see the data and the data provider approves the algorithm and output.
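The Blind SANE workflow described above can be illustrated with a minimal Python sketch. This is not the SANE software or its API: the class `BlindEnvironment`, its methods, the `mean_income` algorithm and the minimum-record output rule are all hypothetical names and assumptions, chosen only to show the idea that the researcher submits an algorithm, the data provider approves it, and output is released only after a check.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class BlindEnvironment:
    """Hypothetical stand-in for a Blind SANE session (not the real SANE API)."""

    _records: Sequence[dict]                    # sensitive data, never handed to the researcher
    approved: set = field(default_factory=set)  # names of algorithms the provider has approved

    def approve(self, algorithm: Callable) -> None:
        """Data provider step: approve a submitted algorithm after inspecting it."""
        self.approved.add(algorithm.__name__)

    def run(self, algorithm: Callable, min_records: int = 10):
        """Researcher step: run an approved algorithm and receive only its output."""
        if algorithm.__name__ not in self.approved:
            raise PermissionError("algorithm has not been approved by the data provider")
        # Assumed output rule for illustration: withhold results based on too few records.
        if len(self._records) < min_records:
            raise PermissionError("output withheld: too few records")
        return algorithm(self._records)


def mean_income(records: Sequence[dict]) -> float:
    """Example algorithm submitted by a researcher who cannot see the records."""
    return sum(r["income"] for r in records) / len(records)


if __name__ == "__main__":
    env = BlindEnvironment([{"income": 100 + i} for i in range(10)])
    env.approve(mean_income)     # provider approves the submitted algorithm
    print(env.run(mean_income))  # researcher receives the aggregate only
```

In a real deployment the approval and output checks would be performed by the data provider rather than by a simple rule in code; the sketch only fixes the order of the steps.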

SANE uses concepts from the CBS Remote Access Environment, the ODISSEI Secure Supercomputer and the SURF Data Exchange to build a generic off-the-shelf solution that can be used by any sensitive data provider and researcher. SANE can be used by researchers in any discipline, as illustrated by the involvement of consortia in both the social sciences (ODISSEI) and the humanities (CLARIAH).

A flexible and sustainable infrastructure for MUSic-related Citizen Science Listening Experiments [MUSCLE]

Applicant: prof. dr. Henkjan Honing (University of Amsterdam)

Data science has had an enormous impact on music research in the last few years, with several international labs basing their scientific insights on large amounts of empirical data. The University of Amsterdam has contributed to this research by showing that engaging games can serve as a powerful method to attract hundreds of thousands of reliable participant responses. This unprecedented scale is necessary to characterize musical behavior properly, in particular how it varies among individuals and across societies. In addition to the technical demands of game-like listening experiments, both music and citizen science require infrastructural support that is currently not addressed by any of the existing digital infrastructures (such as CLARIAH). Hence, the aim of the current proposal is to fill this gap by developing a flexible and sustainable infrastructure for music-related citizen science listening experiments, for domains that have special needs with regard to high-quality, platform-independent audio presentation, processing and timing of responses (e.g., music information retrieval, music cognition, computational musicology, phonetics and speech research). The software developed in the project will be made available as an open-source toolkit that both Dutch and international universities can choose to adapt on their own servers.

Diversity and dynamics. The population of Curacao 1839-1950

Applicant: prof. dr. Jan Kok (Radboud University)

This proposal aims to reconstruct the entire population of Curacao between 1839 and 1950, which will provide an invaluable source for the general public interested in genealogy and family history as well as open up new fields of scholarly research in the history of the Caribbean, of colonial societies and of slavery. How was community building affected by the island’s ethnic and social diversity? How did individuals and families respond to the regularly recurring food shortages and the changing economic fortunes? What were the repercussions of slavery in the generations after Emancipation? Curacao offers the opportunity to reconstruct a fascinating Caribbean island population over 5-6 generations. This opportunity is unique, not only because Curacao holds what are probably the most complete sources in the Caribbean, but also because the population can be fully reconstructed at very low cost thanks to the generous access to archival sources and the availability of tools already funded by PDI-SSH and CLARIAH.

The data will be published in two formats: 1) transcribed datasets of each archival source, useful for genealogists, school projects and qualitative historians, and 2) a database with reconstructed life courses spanning several generations, useful for social scientists and quantitative historians.

WetSuite: Accelerating research on laws, decisions and judgments

Applicant: prof. dr. Anne Meuwese (Leiden University)

We propose to create WetSuite, a web-portal and a collection of tools and resources that will accelerate the application of Natural Language Processing (NLP) methods to Dutch and EU governmental legal data. WetSuite will provide data access and tools both for researchers in law with little or no programming experience, and for researchers in NLP with little or no legal background. It will empower individual researchers, facilitate interdisciplinary collaborations, and serve as a stepping stone for legal scholars and students interested in NLP.

Regarding data access, WetSuite will build on existing government repositories to provide convenient access to Dutch and EU laws/regulations and court judgments, Dutch administrative decisions, as well as related textual data. We take an integrated approach towards the three branches of government – legislator, courts and administration – as they cannot be fully understood in isolation.

Regarding tools, besides data preprocessing and conversion, WetSuite will offer an interface to powerful NLP libraries, attuned to the different data types. Common NLP methods will be made available as web services in the browser, with no installation or programming required. The underlying WetSuite Python library will be open source and easily extensible for users with more technical know-how.

HIP-NL: an Historical Income Panel for the Netherlands

Applicant: dr. Auke Rijpma (Utrecht University)

This project will create a unique large-scale historical income panel. Using tax records, thus far rarely used in a systematic manner, income data at multiple points in time will be gathered for 200,000 individuals in the Netherlands between 1851 and 1922. Recent mass digitisation of vital records and population registers, together with the development of data integration tools, means we will be able to connect the income panel to many other important datasets in the SSH domain. This integrated collection of datasets will enable new lines of state-of-the-art research into social and economic inequality across multiple dimensions, over a long time frame, and over multiple generations.

Oral History – Stories at the Museum around Artworks (OH-SMArt)

Applicant: dr. Sanneke Stigter (University of Amsterdam)

Oral History – Stories at the Museum around Artworks (OH-SMArt) is a long-term initiative to significantly improve the digital research chain around using Oral History and spoken narratives, with research into artworks and museums as a use case. Although museums hold unique audiovisual recordings about artworks in their archives, they have a severe backlog in disclosing and sharing this information because of the laborious workflow of storing and transcribing, the sensitivity of some of the material, and the lack of tools to use and reflect upon the content. These are generic problems for all researchers engaging with spoken narratives. An improved and user-friendly deposition workflow that connects automatically to a speech transcription service will resolve a significant part of this problem. Additionally, the improved workflow enables the development of new tools that aim especially at facilitating reflection by contextualizing the source material with layers of user interpretations, placing the researcher’s viewpoint into perspective. By opening up the behind-the-scenes of museums in a smart way, OH-SMArt advances research with spoken narratives around artworks and contributes to existing digital research infrastructures with domain-wide applications for knowledge development.

LAWNOTATION

Applicant: prof. dr. Gijs van Dijck (Maastricht University)

LAWNOTATION is an initiative of the Digital Legal Studies cluster in the Sectorplan Social Sciences and Humanities (SSH) – Rechtsgeleerdheid and other Dutch universities that are collaboratively working on questions related to the digitalisation of law. The legal research community lacks the availability of data, tools, and platforms that allow for the computational analysis of legal data. This project aims to develop an infrastructure that enables SSH researchers to systematically analyze legal documents such as legislation and court decisions.

The proposed infrastructure will offer the following functionalities:

Access to and sharing of data – making legal data and annotation schemes (current and future) accessible for annotation and analysis purposes.
Annotation platform – developing and offering annotation software and schemas in order to analyze the linguistic and legal characteristics of legal documents.
Interface – access to data, the annotation schemes, and the annotation software will be offered through a user-friendly interface.

A team of developers will work closely together with SSH researchers on the improved access to legal materials, which will benefit SSH researchers as well as society as a whole. The infrastructure will be embedded within CLARIAH-WP3 (Linguistics) and CLARIAH-WP6 (Text).