Projects Awarded in the 2020 Call

The board of Stichting PDI-SSH has awarded funding to 12 proposals submitted to the Call Digital Infrastructure SSH.

All SSH sectors face the major challenges posed by big data, artificial intelligence (AI) and social media. These challenges concern not only societal issues but also the practice of science itself: how can the SSH respond responsibly to these new developments in digitalisation? To address these questions, Stichting PDI-SSH issued the Call for Proposals Digital Infrastructure in the Social Sciences and Humanities. The Call is part of the Sectorplan SSH.

After a careful review process, in line with the criteria described in the Call for Proposals of the Platform Digital Infrastructure SSH, 12 of the 33 proposals were awarded funding. The review process consisted of several steps. The disciplinary assessment committees (made up of independent reviewers) assessed the submitted proposals and gave independent advice on them to the SSH-beraad. After consulting representatives of the three largest Dutch infrastructures (CLARIAH, Health-RI and ODISSEI), the SSH-beraad prioritised these recommendations and took the final decision. The board of Stichting PDI-SSH implements that decision.

For more information, please contact the PDI-SSH secretariat at info@pdi-ssh.nl.

Awarded proposals (in alphabetical order):

Gauging Past Performance. Creating Financial Metadata for Economics, Finance, and Economic History in the Netherlands.

Applicant: prof. dr. Herman de Jong (University of Groningen)

Finance teaching and research in the Netherlands rely entirely on US data, which are easily available but not very relevant for the country. To remedy this, we build NEDHISFIRM, a comprehensive information system holding Dutch corporate and stock exchange data for the period 1796-1980. Historians and economists collaborate closely to provide new and vital data for understanding the long-term evolution of Dutch financial markets, firms and public finance in their societal context. Modelled closely on proven systems in Belgium and France, NEDHISFIRM supplies innovative standards of documentation, bringing together metadata formats and an architecture that facilitates the next generation of data-extraction and enrichment systems for historical sources.

We need this system because comprehensive Dutch data add significantly to current European efforts to provide a necessary corrective to dominant, US-derived assumptions about modern finance and financial market evolution. Moreover, systems that connect long-term corporate and financial data and facilitate in-depth analyses of a country’s financial system have been or are being built everywhere, so Dutch universities urgently need to catch up. NEDHISFIRM meets the rapidly growing demand for digital corporate information from economists and economic historians and thus tallies with a sister project (main applicant Dr. Wolter Hassink) providing such information for the period from 2021.

MacBERTh – Bidirectional Encoder Representations from Transformers of historical English and Dutch

Applicant: dr. Lauren Fonteyn (Leiden University)

The aim of this infrastructure project is to create the first set of deep neural language models pre-trained on historical textual material (Dutch and English) from different time periods. This semantic encoding infrastructure, or ‘MacBERTh’, will serve as an invaluable SSH research tool that enables new ways of analysing historical text: by making the underlying meaning of words, phrases and abstract sentence patterns accessible, searchable, and analysable in a bottom-up, data-driven way, the offered infrastructure will allow researchers from the present to uncover and draw connections between concepts and ideas from the past.

The technology underlying this infrastructure is based on a crucial insight from distributional semantics, which states that the linguistic context in which words and phrases appear provides a good approximation of their meaning. Based on this idea, a number of powerful computational models have been developed to create detailed, compressed linguistic representations when trained on large bodies of text. These representations have already proven to be crucial in addressing various challenges in computational linguistics (NLP/NLU) and related disciplines. However, these models have not yet been exploited to study meaning representation in historical language. This gap will be filled by ‘MacBERTh’.
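The distributional idea described above can be illustrated with a very small sketch. This is not the MacBERTh model, which is described as a set of deep neural language models; it swaps in plain co-occurrence counts to show how context alone can approximate similarity of meaning. The toy corpus, the function names and the window size are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus, invented for illustration only (not historical data).
CORPUS = [
    "the king rules the land",
    "the queen rules the land",
    "the king wears a crown",
    "the queen wears a crown",
    "the dog chases the cat",
    "the dog eats a bone",
]


def cooccurrence_vectors(sentences, window=2):
    """For every word, count the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(CORPUS)
sim_king_queen = cosine(vectors["king"], vectors["queen"])
sim_king_dog = cosine(vectors["king"], vectors["dog"])

# Words used in the same contexts end up with more similar vectors.
print(f"king/queen: {sim_king_queen:.3f}")
print(f"king/dog:   {sim_king_dog:.3f}")
```

In this toy corpus, "king" and "queen" appear in identical contexts and therefore score higher than "king" and "dog". Neural language models of the kind the project describes learn far richer, context-sensitive representations, but they build on the same principle.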

FIRMBACKBONE: digital data collection infrastructure on Dutch companies

Applicant: prof. dr. Wolter Hassink (Utrecht University)

The digitalization of the economy has enabled novel methods of collecting data on companies. Such digital economic data are both detailed (wide) and broad in scope (long). However, applied researchers in economics and adjacent fields currently lack access to a comprehensive and longitudinal data source about Dutch companies. The purpose of this project is to build an organically growing longitudinal data infrastructure of information on Dutch companies. A backbone will be created that contains a list of company names that are publicly accessible and that can be used for web scraping. We will enrich the data with public information from annual reports and other sources, such as financial reports. Researchers making use of the backbone will enrich it with data they have collected, for example using surveys or tailor-made indicators. Quality control will be performed in collaboration with CBS.

The data will be accessible to academic researchers under an access protocol within the secure high-performance ODISSEI Data Facility hosted at SURF. Collaboration with the existing research infrastructure ODISSEI will ensure adequate data protection, governance, and long-term sustainability. Collaboration with ODISSEI, a social science infrastructure that currently lacks economic information, opens up new avenues for cross-disciplinary research linking individual-level (micro)data with firm-level information.

Population Scale Network Analysis for Social Sciences and Humanities (POPNET-SSH): A Digital Infrastructure and Research Community

Applicant: dr. Eelke M. Heemskerk (University of Amsterdam)

This proposal embarks on an ambitious yet highly necessary and timely endeavour: the setting up of a novel open digital infrastructure and research community that unlocks longitudinal social network data on the entire population of the Netherlands. POPNET-SSH enables exciting new research on unique data consisting of demographic information and time-stamped family, work, school and neighborhood network relations, in an anonymized as well as ethically and legally responsible manner.

The outcome is a research community centered around a first-of-a-kind research infrastructure tailored in terms of hardware and software for large-scale social network analysis. For the first time, it allows scholars in the SSH domain (and beyond) without specialised technical computing skills to derive population-scale network statistics.

Rich methods from social network analysis and network science can then be applied to obtain unique insights that may unveil previously unknown knowledge about the complexity of the Dutch population. From this, scholars and policymakers alike can derive actionable insights into key issues in the SSH domain, including (but not limited to) stratification, segregation, substantive social change, and UN sustainable development goals such as reducing inequality.

Connecting Data in Child Development (CD2)

Applicant: prof. dr. Chantal Kemner (Utrecht University)

The NWO Gravitation-funded Consortium on Individual Development (CID) unites Dutch developmental research in social sciences and humanities. As part of CID, six longitudinal youth cohorts currently collaborate (YOUth, L-CID, RADAR, NTR, TRAILS, and Generation R) across studies and disciplines. Over the past fifty years, these studies have collected a wealth of data on social, psychological, and biomedical developments during childhood, adolescence and beyond. To use these multidisciplinary data to their full potential and enable cross-study integration, it is crucial to highlight the connections and commonalities between these cohorts. To do this, the information about the data (metadata) needs to be harmonised and easily findable.

The current initiative aims to develop a digital infrastructure that encompasses harmonized metadata of all CID cohorts. This digital infrastructure will enable users to locate and explore study variables and materials, and thereby identify opportunities for integration, replication, and collaboration. CD2 will develop this infrastructure according to the excellent standards employed by ODISSEI and Health-RI, complementing these initiatives. The infrastructure will be built in such a way that the approaches and pipelines will be freely available, and adaptable to other longitudinal youth cohorts.

Legacies of bondage: towards a database of Surinamese life courses in a multigenerational perspective (1830-1950)

Applicant: prof. dr. Jan Kok (Radboud University)

This proposal aims to construct a consolidated database of the population of Suriname between 1830 and 1950. The rich archival sources of Suriname offer the opportunity to combine records on the individual lives of enslaved people, bonded labourers and free inhabitants over five or six generations. This makes it possible to study social processes and diversity in a colonial society as well as the repercussions of slavery over multiple generations, which makes this project truly unique.

This digital infrastructure facility will have a major impact on research, because it can be used to answer a wide range of questions from scholars in the humanities, social sciences, and life sciences. To make the database accessible for different types of questions we will publish the data in two different formats: 1) transcribed datasets of each individual archival source, which will be useful for genealogists, school projects and qualitative historians, and 2) a database with reconstructed life courses including links to family members, spanning several generations. This project starts from a proven concept, developed for the Surinamese slave registers database which was published in 2018/2019, and will include citizen science.

TwiXL: An infrastructure for cross-media research on public debates

Applicant: prof. dr. Julia Noordegraaf (University of Amsterdam)

TwiXL develops an infrastructure that enables SSH researchers to systematically examine current and emerging public debates on crucial societal issues in the Netherlands. The proposed infrastructure will be developed along the following three axes:

Deep – continuing and making accessible the TwiNL collection, containing 50% of all Dutch-language tweets (2011-), allowing for a systematic exploration of the Dutch Twitter sphere on any societal topic.
Broad – curating and making accessible Dutch-language collections of social media and web data, as well as newspaper reports and radio and television broadcasts on prominent societal issues (2020-2025), enabling innovative cross-media research.
Live – facilitating real-time streaming data processing and analysis of Twitter data, allowing for live monitoring of online public discourse.

Access to all three collections will be provided through a user-friendly web interface and Jupyter Notebooks for more advanced analyses. To develop the new infrastructure and demonstrate its value for research, a team of developers at SURF, KB, and NISV and two postdocs (at UvA and RUG) will work closely together with SSH researchers in proof-of-concept research projects. The infrastructure will be embedded in the CLARIAH Media Suite and the planned ODISSEI Media Content Analysis Laboratory.

PURE3D. An Infrastructure for the Publication and Preservation of 3D Scholarship

Applicant: dr. Costas Papadopoulos (Maastricht University)

Three-dimensional models and reconstructions have been used in the last thirty years across many fields in the humanities and social sciences to bridge time and space; to become immersed in the past through virtual worlds; to explore physical artefacts from multiple angles; to allow interactive close-ups and see features not visible to the naked eye; and to analyse sociocultural phenomena and simulate the experience and perception of objects and spaces. Despite this plethora of research, 3D digitisation initiatives by cultural institutions, and a growing number of higher education institutions teaching 3D skills, methods, and theories, no stable infrastructure exists to support this form of knowledge production. PURE3D will fill this gap through four key deliverables: 1) the development of an access infrastructure for viewing interactive 3D models (from single objects to virtual worlds) within the context of a scholarly publication format (3D Scholarly Editions); 2) a preservation repository to deposit raw files, which, due to their size, format, lack of standards etc., are typically inaccessible to researchers beyond the original creators; 3) a conceptual and methodological framework for valorising and evaluating 3D scholarship; and 4) a centre of excellence for researchers embarking on 3D scholarship.

A digital catalogue of digital literature

Applicant: dr. David Peeters (Tilburg University)

This project will develop and launch an online catalogue of digital literature. The current absence of an adequate digital catalogue of digital literature (e.g., literary texts presented in virtual reality, interactive poems experienced via smartphones) hampers scientific research in the humanities and the social sciences, contributes to the existing mismatch between the current content of secondary school education and the interests of present-day pupils, and limits public libraries in what products they can offer. We hence need an overview of currently available expressions of digital literature. This overview needs to be publicly available as a digital catalogue. Our methodology combines the documentation of currently available products of digital literature with the development of the digital catalogue. We will find and categorize available products of digital literature created in the Netherlands between 2000 and 2021, and turn this collection into a digital catalogue that meets the FAIR (findable, accessible, interoperable, and reusable) principles. This novel digital resource, shared long-term via CLARIAH, will be fundamental for interdisciplinary SSH research into contemporary literary culture. It will be pivotal in matching educational content with pupils’ interests. It will allow libraries to offer their public a more contemporary collection that includes state-of-the-art expressions of digital literature.

Capture and Analysis Tools for Social Media Research (CAT4SMR)

Applicant: dr. Bernhard Rieder (University of Amsterdam)

The project seeks to stabilize and further develop a set of existing and heavily used tools for the collection and analysis of social media data (Facebook, Twitter, YouTube, Reddit, 4chan). Developed within the framework of the UvA’s Digital Methods Initiative, our tools – Netvizz, DMI-TCAT, YouTube Data Tools, and 4CAT – have been mainstays of the Dutch and international research landscape for years, allowing researchers to make sense of these increasingly dominant online platforms and the cultural practices they host. Due to continuous changes in data access (e.g. APIs), legal context (e.g. GDPR), data formats, and terms of service (TOS), researchers’ access to social media platforms has been rendered more difficult and the mission our tools strive to fulfill – easy but robust access to platform data and analysis for researchers in the humanities and social sciences – has become more challenging. Providing research infrastructures, in this context, is much more than building tools. We therefore seek funding not only for sustainable technical development, support, and maintenance, but for the increasingly difficult work of negotiating access conditions with platform owners, for documentation and teaching resources, for testing the reliability and reproducibility of results, and for the continuous furthering of methodological innovation.

Dutch participant recruitment platform for SSH research

Applicant: dr. Martin Tanis (Vrije Universiteit Amsterdam)

Much research within the Social Sciences and Humanities relies on input that can only be obtained from people. Such research includes experiments or surveys, development of tests and measurement instruments, and (digital) humanities studies requiring human annotations of textual data, artefacts, and images. Although international online platforms exist to recruit human participants (e.g., MTurk, Prolific, Figure8), these are unusable for research bound to the Dutch linguistic and/or cultural context. Cases in point are the development of Dutch verbal memory tests, annotation of Dutch texts for machine learning, classification of Dutch cultural artefacts or art, and testing the effectiveness of Dutch health messages. Moreover, the GDPR prohibits Dutch researchers from storing data outside EU borders. Finally, criticism regarding (especially) MTurk’s data quality and ethics signals a need for homegrown alternatives.

The current proposal addresses these challenges by developing an affordable, sustainable, and secure online participant/annotator recruitment platform for Netherlands-based academic researchers. Development of the platform will help Dutch SSH scholars to remain internationally competitive, while serving Dutch societal research needs. It will also strengthen ties between Dutch Social Sciences and Humanities research communities. The platform will be developed in cooperation with not-for-profit platform developer EYRA together with established Dutch academic partner SURF.

Homo Medicinalis. Recognising the voices of patients to retrace the opinions about medicines (HoMed)

Applicant: dr. Henk van den Heuvel (Radboud University)

HoMed will implement an SSH research infrastructure with an enormous potential for automatic transcription of sensitive audio-visual (AV) recordings. Its use case will focus on AV recordings of medical consultations on the use of pharmaceuticals (henceforth ‘MedPharm’). MedPharm in practice shows situations where patients often appear not to be able to understand proper medicine use. To overcome unintentional medicine use we need to better understand the attribution of meaning to medicines. Therefore, recordings and transcriptions of patient consultations are needed.

In HoMed a standard automatic speech recogniser (ASR) for Dutch will be adapted to MedPharm discourses, since essential jargon is not part of the vocabulary of the current generic ASR. The ASR will be retrained on existing radio and TV data and on highly sensitive AV recordings of patient consultations at Nivel. The resulting infrastructure will be made available (i) as a component of the ASR service in CLARIAH’s infrastructure (accessible via the Media Suite) and (ii) via Stichting OpenSpraaktechnologie to be used as a service tool in similar projects. In addition, a standalone version of the infrastructure developed for Nivel can be employed for sensitive data analysis under intramural conditions. The application will fully comply with the GDPR.