SSHOC Workshop Notes: ParlaMint – exploring societal issues through comparable corpora of parliamentary debates

Date: 04 August 2021

Parliamentary debate transcripts hold a wealth of information about the dynamics inside parliament. Legislation is debated on the parliamentary benches, giving rise to rich discussions of various societal events and developments. By connecting these transcripts with political metadata, such as party affiliation, researchers can analyse how members of parliament respond to discussions of different kinds of events.

A recent event that had a massive impact across the world is the COVID-19 pandemic. In this research project, part of the SSHOC workshop during the Helsinki Digital Humanities Hackathon, an interdisciplinary group of six researchers led by Ajda Pretnar and Matej Klemen used transcripts from four parliaments (Italian, Polish, Slovene, and British) to analyse the differences between debates before and during the pandemic. The ParlaMint data set (Erjavec et al. 2021) was used as the data source because it contains parliamentary transcripts, metadata and linguistic annotations of the transcripts for both periods. Two subcorpora were sampled from each national corpus: the COVID-19 subcorpus and the Reference subcorpus. Both contained only speeches by regular MPs, thus excluding the parliamentary speakers (chairpersons) and guests. The team, with backgrounds in the humanities, social sciences and computer science, first worked together to develop research questions and find appropriate methods. It then split up according to language proficiency to analyse the data, while collaborating intensively to share individual expertise.
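The sampling step described above amounts to a filter on speaker metadata. The sketch below is a minimal illustration under stated assumptions, not the team's actual code: the `Speech` record, the `split_subcorpora` helper and the role and subcorpus labels are hypothetical stand-ins for whatever fields the ParlaMint metadata exposes.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    # Hypothetical record; field names and values are assumptions, not ParlaMint's schema.
    speaker_role: str  # e.g. "Regular", "Chairperson" or "Guest"
    subcorpus: str     # e.g. "Reference" or "COVID"
    text: str


def split_subcorpora(speeches):
    """Keep only speeches by regular MPs and group them by subcorpus."""
    groups = {"Reference": [], "COVID": []}
    for speech in speeches:
        if speech.speaker_role != "Regular":
            continue  # drop chairpersons and guests
        if speech.subcorpus in groups:
            groups[speech.subcorpus].append(speech)
    return groups


speeches = [
    Speech("Regular", "COVID", "We must protect the NHS."),
    Speech("Chairperson", "COVID", "Order, order."),
    Speech("Regular", "Reference", "The budget is balanced."),
]
groups = split_subcorpora(speeches)
print(len(groups["COVID"]), len(groups["Reference"]))  # 1 1
```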

Research questions

The research questions focused on identifying differences and similarities in parliamentary debates on the COVID-19 pandemic across Italy, Poland, Slovenia and the UK. To this end, the group first analysed the data for each country and then compared the results across countries as well as with the pre-COVID-19 period. They also mapped the COVID-19-related debates over time and compared them with the epidemiological situation in each country.

COVID-19 in the parliaments

Given the force with which the pandemic swept through the countries, it is not surprising that the datasets are highly similar in their top 20 COVID-19-related keywords, computed for each country relative to the pre-COVID-19 period. Figure 1 shows the semantic clusters of these keywords (labelled manually). Broadly speaking, two concerns can be distinguished: the pandemic itself and its consequences (left), and the reaction to the pandemic and the adoption of mitigation measures (right).

Figure 1: t-SNE plot with perplexity 20 and exaggeration 2. Keywords are added manually.

To characterise the datasets further, the top 50 keywords were also analysed. For all countries, the majority of these keywords are COVID-19-related, while the rest point to other prominent subjects discussed in parliament during this period (legislation related to defence and justice, infrastructure, the voting system, foreign affairs, etc.).
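The post does not state which keyness statistic produced these keyword lists. As one common choice, the sketch below ranks words by Dunning's log-likelihood (G2), keeping only words that are relatively more frequent in the COVID-19 subcorpus than in the Reference subcorpus. The function names are hypothetical, and the input is assumed to be lemmatised, lowercased tokens.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens (c and d must be > 0)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def top_keywords(target_tokens, reference_tokens, n=20):
    """Return the n words most over-represented in the target corpus, ranked by G2."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    scores = {}
    for word, a in target.items():
        b = reference[word]  # Counter returns 0 for unseen words
        if a / c > b / d:  # keep only words relatively more frequent in the target
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores, key=scores.get, reverse=True)[:n]


covid = "virus virus virus measure measure economy the the the the".split()
reference = "the the the the economy economy budget budget budget budget".split()
print(top_keywords(covid, reference))  # ['virus', 'measure']
```

The top 50 lists would correspond to calling `top_keywords(covid, reference, n=50)` on the full subcorpora.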

It is somewhat revealing that several keywords and collocations (economy, liquidity, recession, economic, crisis, fund, voucher …) indicate that the economy, rather than any other policy area, was the main concern of all the parliaments under investigation. Furthermore, of the four countries, mentions of EU financial support appear only on the lists for the Italian and Slovene parliaments. Given Brussels' strong involvement in managing the COVID-19 pandemic, the absence of EU-related mentions elsewhere could superficially reflect the state of each country's relationship with the EU.

Another common aspect across the datasets is that certain measures sparked strong polarization. The Slovene data set, for example, shows that the prominent collocates of (tracking) application, vaccination, and quarantine include antonym pairs such as obligatory:voluntary, control:freedom, and scientific:thinking/believing. Similarly, the collocations from the Italian data set reveal strong language, with mutual accusations over not wearing masks (negationists) and polarized opinions on the use of the app (importance, failure). The same holds for the Polish data set, where pandemic collocates with fight on the one hand and alleged on the other, and where the emergency legislation is described as anti-crisis by one side and as leaky and so-called by the other.

Networks of key terms

Collocation networks offer insight into the relations between key terms in parliamentary debates. Here they were used to gain a bird’s-eye view of the semantics of speeches that use the seed term “virus” in the first months of the pandemic. Strikingly, the networks reveal commonalities between countries, structured around several overarching themes. The first theme, which stands out in several languages, is the language of crisis response. The British network shows relations between, for example, NHS and testing. At this time there was also gratitude and concern for those employed in the NHS, as seen in the collocates staff, worker, and nurse. In the Slovene network, the narrative of adopting measures [ukrepi] to restrict the spread [širjenje] of the virus in order to “secure life” reflects a similar theme of crisis response. Besides the ad hoc measures discussed in the parliaments, the networks also show a more forensic language, pertaining to the questions that surrounded the virus in the early months of 2020. The Italian network in particular reflects this theme, with terms such as animal [animale], influenza [influenza], pathology [patologia] and bat [pipistrello]. Related questions about the mortality of the new virus also appear in the Italian, Polish and British networks.

Figure 2: Collocation network for VIRUS in March

Figure 3: Collocation network for VIRUS in June
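A collocation network of this kind can be grown outwards from a seed term. The sketch below is a simplified version under stated assumptions: it counts co-occurrences within a fixed token window and ranks collocates by raw frequency, whereas real analyses usually remove stopwords and use an association measure such as log-likelihood or mutual information. The function names, window size and `top_n`/`depth` parameters are hypothetical.

```python
from collections import Counter


def collocates(sentences, node, window=3):
    """Count words occurring within `window` tokens of `node` in the same sentence."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != node:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if tokens[j] != node:  # skip the node itself
                    counts[tokens[j]] += 1
    return counts


def collocation_network(sentences, seed, window=3, top_n=5, depth=2):
    """Grow a network outwards from `seed`, linking each node to its top_n collocates."""
    edges = {}
    frontier, seen = [seed], {seed}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for word, freq in collocates(sentences, node, window).most_common(top_n):
                edge = tuple(sorted((node, word)))  # undirected edge
                edges[edge] = max(edges.get(edge, 0), freq)
                if word not in seen:
                    seen.add(word)
                    next_frontier.append(word)
        frontier = next_frontier
    return edges


sentences = [
    ["virus", "spread", "measure"],
    ["virus", "nhs", "testing"],
    ["nhs", "staff", "testing"],
]
network = collocation_network(sentences, "virus", window=2)
print(network[("nhs", "testing")])  # 2
```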

COVID-19 cases and COVID-19 debates

Of the four countries analysed, the first coronavirus cases were found in Italy and the United Kingdom, on 31 January 2020 (according to data from the Johns Hopkins COVID-19 Data Repository; Dong, Du, and Gardner 2020). In the British parliament, the first mentions of “coronavirus”, on 22 January, preceded the first infections, and mentions of pandemic-related words increased before infection numbers rose. In Italy, the first parliamentary debates on the coronavirus coincided with the first surge of infections.

The first diagnosed cases of coronavirus infection in Poland and Slovenia came more than a month later, on 4 and 5 March, respectively. In both countries, pandemic-related keywords appeared in parliament more than a week before the first diagnosed cases. After the first wave of COVID-19 cases, mentions of pandemic-related words declined in all countries.
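The timing comparison reduces to date arithmetic between the first parliamentary mention and the first confirmed case. A minimal sketch, assuming speeches are available as (date, tokens) pairs and that case dates are looked up separately (e.g. from the Johns Hopkins repository); the helper names are hypothetical:

```python
from datetime import date


def first_mention(dated_speeches, terms):
    """Earliest date on which any of `terms` occurs, or None if never mentioned."""
    dates = [day for day, tokens in dated_speeches if any(t in terms for t in tokens)]
    return min(dates) if dates else None


def lead_days(first_mention_date, first_case_date):
    """Days by which the first parliamentary mention preceded the first confirmed case
    (negative if the mention came later)."""
    return (first_case_date - first_mention_date).days


# UK dates from the text: first mention on 22 January, first case on 31 January 2020.
print(lead_days(date(2020, 1, 22), date(2020, 1, 31)))  # 9
```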

All four countries saw a second wave of COVID-19 infections in the autumn of 2020, but these increases in cases were not always accompanied by proportional increases in mentions of pandemic-related words in parliamentary debates. The share of pandemic-related words rose around the time of the second wave in Italy and Poland, but there was no clear increase in Slovenia or the UK. Reactions to the second wave are hard to compare across countries because the national parliamentary datasets differ in coverage.

Figure 4: Comparison of COVID-19 cases (orange line) with COVID-19-related debates (columns) in the UK
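The comparison in Figure 4 rests on the share of pandemic-related words per time period. The sketch below assumes monthly bins and a hand-picked set of pandemic terms, neither of which is specified in the post, and the function name is hypothetical. It returns None for months with no recorded speech, so recesses and gaps in dataset coverage appear as missing data rather than as zero attention.

```python
from collections import defaultdict
from datetime import date


def monthly_share(dated_tokens, pandemic_terms, start, end):
    """Share of pandemic-related tokens per (year, month) from start to end inclusive.
    Months with no recorded tokens (e.g. recesses) map to None rather than 0.0."""
    totals, hits = defaultdict(int), defaultdict(int)
    for day, tokens in dated_tokens:
        key = (day.year, day.month)
        totals[key] += len(tokens)
        hits[key] += sum(1 for t in tokens if t in pandemic_terms)
    shares = {}
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        total = totals.get((year, month), 0)
        shares[(year, month)] = hits.get((year, month), 0) / total if total else None
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return shares


speeches = [
    (date(2020, 3, 5), ["virus", "pandemic", "budget", "tax"]),
    (date(2020, 3, 20), ["virus", "law"]),
    (date(2020, 5, 2), ["budget", "law"]),
]
print(monthly_share(speeches, {"virus", "pandemic"}, date(2020, 3, 1), date(2020, 5, 31)))
# {(2020, 3): 0.5, (2020, 4): None, (2020, 5): 0.0}
```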

Limitations

Although parliamentary data are a rich source for textual analysis, they also come with characteristic challenges. First, certain issues, though significant on a national scale, may not be discussed in parliament. At the beginning of the pandemic, for example, many emergency restrictions may have been enacted without going through the legislative process, such as through executive orders. A proper analysis of these data therefore requires knowing the scope of parliamentary duties in each country. Second, parliaments generally go through periods of recess in which members do not meet and no discussion takes place. Because there are no data for those periods, they cannot be analysed; whatever issues were of country-wide importance during those days or weeks are not reflected in our interpretations. Finally, the transcriptions of parliamentary proceedings do not always match exactly what was said, as transcribers may omit hesitation sounds or otherwise edit MPs' speech; given the focus of this research, however, such changes would probably not invalidate the analysis.

For a detailed description of the methodology used and further discussion, see the final output report.
To read the outputs of other groups, check the main DHH21 site.

Written by Isabella Calabretta, Courtney Dalton, Richard Griscom, Marta Kołczyńska, Kristina Pahor de Maiti, Ruben Ros
