Center for Public Policy Studies

New research project funded by the Ministry of Science (2023-2026) – Big Data! “What Can Big Data Add to Academic Profession Studies? Theoretical models to practical applications”

The project is funded by the Ministry of Science and Higher Education under the Science for Society II Program (2023-2026). It is carried out by a team led by Prof. Marek Kwiek from the Institute for Advanced Studies (IAS) and the Center for Public Policy Studies (CPPS) at Adam Mickiewicz University in Poznań.

Project number: NdS-II/SP/0010/2023/01. Grant value: 750,000 PLN (approx. 200,000 USD).

“What Can Big Data Add to Academic Profession Studies? Theoretical models to practical applications”

The project combines social sciences and data science. It assumes that new data sets, together with new possibilities for accessing and processing them (the digital era in higher education research and in quantitative science studies), create new opportunities for applied social research. The proposed research, which combines heterogeneous data sources at the scale of the Polish science system and at the scale of OECD countries, would have been impossible to carry out five years ago. In particular, within the project global survey data (1 million surveys sent) can be combined with structured Big Data, and survey data can be combined with bibliometric and administrative data (using probabilistic and deterministic methods).

The project proposes a radical change in the unit of analysis in the study of academic careers: not the publication (and its characteristics), but the individual scientist (and their characteristics). A scientist has characteristics derived from integrated databases combining Big Data, national registers of scientists, and institutional data at different levels. We are interested in aggregations at the level of scientists with individual characteristics, not at the level of publications and their metadata. The research project is pioneering on a global scale: to conduct innovative research on academic staff in Poland in the broad context of OECD countries, we use new methodologies and new sources of primary data, posing new theoretical questions and seeking answers to traditional questions in the sociology of academic careers based on the analysis of new empirical data generated by the project. Within the project, we show the opportunities that structured Big Data (raw data from the Scopus, Web of Science, and POLON databases) open up for research on higher education and science in Poland.
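The shift in the unit of analysis described above, from publications to individual scientists, can be illustrated with a minimal sketch. It is not the project's actual pipeline: the records, field names (`pub_id`, `author_ids`, `international`), and the chosen per-scientist characteristics are all hypothetical and serve only to show how publication-level rows might be aggregated into one profile per scientist.

```python
from collections import defaultdict

# Hypothetical publication-level records; all field names and values are
# illustrative assumptions, not the schema of Scopus, Web of Science, or POLON.
publications = [
    {"pub_id": "p1", "year": 2019, "author_ids": ["a1", "a2"], "international": True},
    {"pub_id": "p2", "year": 2021, "author_ids": ["a1"], "international": False},
    {"pub_id": "p3", "year": 2022, "author_ids": ["a2", "a3"], "international": True},
]


def aggregate_by_scientist(pubs):
    """Turn publication-level rows into one profile per scientist."""
    profiles = defaultdict(
        lambda: {"n_pubs": 0, "n_international": 0, "first_year": None, "last_year": None}
    )
    for pub in pubs:
        for author_id in pub["author_ids"]:
            profile = profiles[author_id]
            profile["n_pubs"] += 1
            profile["n_international"] += int(pub["international"])
            year = pub["year"]
            # Track the span of the publishing career seen in the data.
            if profile["first_year"] is None or year < profile["first_year"]:
                profile["first_year"] = year
            if profile["last_year"] is None or year > profile["last_year"]:
                profile["last_year"] = year
    return dict(profiles)


scientists = aggregate_by_scientist(publications)
for author_id, profile in sorted(scientists.items()):
    print(author_id, profile)
```

Once the data are keyed by scientist rather than by publication, characteristics from other sources (for example, national registers) could in principle be attached to the same profile.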

National higher education and research systems are at the center of public attention across Europe. Everyone wants to know more about them—and almost everyone wants to reform them. Academic staff represent the most significant cost of running academic institutions.

Changes in the nature of academic work are rapid, but for the first time they can be assessed in detail thanks to quantitative research. There is growing pressure to use large data sets (and a much larger number of observations) to draw relevant conclusions for science policy, including in Poland.

Big Data here means data that are structured, cleaned, curated, and continuously updated at great expense, and that are highly reliable. In the case of the Polish sub-sample, Big Data is combined with administrative and biographical data from the POLON 2 system, which is an almost ideal arrangement. In this project, Big Data at the initial stage is raw data from the Scopus and Web of Science databases: available only to specialists, huge in scope, and highly complex (billions of separate cells).

The project relies in particular on combining large-scale survey data on researchers from the OECD area, including Poland, with bibliometric and administrative data. The use of Big Data allows for a balance between small-scale and large-scale research (with small and large N), which has a huge positive impact on higher education policy research.
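As a rough illustration of what combining survey records with bibliometric records can involve, the sketch below first tries a deterministic match on a shared key and then falls back to a score-based fuzzy match. The fallback is a deliberately simplified stand-in for probabilistic linkage, not a full probabilistic model and not the project's own method. All records, field names, weights, and the threshold are hypothetical assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical survey respondents and bibliometric author records.
survey = [
    {"resp_id": "s1", "email": "anna.nowak@uni.example", "name": "Anna Nowak", "institution": "Uni A"},
    {"resp_id": "s2", "email": None, "name": "Jan Kowalsky", "institution": "Uni B"},
    {"resp_id": "s3", "email": None, "name": "Maria Zielinska", "institution": "Uni C"},
]
biblio = [
    {"author_id": "b1", "email": "anna.nowak@uni.example", "name": "A. Nowak", "institution": "Uni A"},
    {"author_id": "b2", "email": None, "name": "Jan Kowalski", "institution": "Uni B"},
    {"author_id": "b3", "email": None, "name": "Piotr Wisniewski", "institution": "Uni D"},
]


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(survey_rows, biblio_rows, threshold=0.85):
    """Link survey rows to bibliometric rows; unmatched rows are left out.

    Note: this sketch does not enforce one-to-one matching.
    """
    by_email = {b["email"]: b for b in biblio_rows if b["email"]}
    links = {}
    for s in survey_rows:
        # Step 1: deterministic match on an exact shared key (assumed: email).
        if s["email"] and s["email"] in by_email:
            links[s["resp_id"]] = (by_email[s["email"]]["author_id"], "deterministic")
            continue
        # Step 2: score-based fallback; weights 0.8 / 0.2 are arbitrary assumptions.
        best_id, best_score = None, 0.0
        for b in biblio_rows:
            same_institution = s["institution"] == b["institution"]
            score = 0.8 * name_similarity(s["name"], b["name"]) + 0.2 * same_institution
            if score > best_score:
                best_id, best_score = b["author_id"], score
        if best_score >= threshold:
            links[s["resp_id"]] = (best_id, "score-based")
    return links


links = link(survey, biblio)
print(links)
```

In practice the threshold trades false links against missed links, so it would need to be calibrated against a manually verified sample.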

The keyword of the project is complementarity (of data and methods): in the case of academic career research, Big Data accompanies global surveys and interviews, and macro-level research accompanies micro-level research. The project introduces large-scale research using global bibliometric databases, new software, and new analytical and visualization tools.

Poland is becoming almost transparent to the world: as a system, as individual institutions and their departments, as research groups, and finally as individual scientists. The era of visibility, and thus measurability, of all the most important research dimensions of university functioning has arrived, with far-reaching consequences for our universities. We are able to compare almost everything in an international context. However, to move from data to knowledge, we increasingly need to interpret the processes of science globalization under study.

Several factors increase the pressure to study academic staff using Big Data: (1) the availability of digital data on the inputs and outputs of scientific work at the individual level (funding, publications, research collaboration, mobility); (2) the availability of computing power to analyze huge data sets in the cloud; and (3) the pressure to provide the public and the scientific community with a quantified and data-driven picture of changes in higher education and among academic staff.

However, the new data must be repurposed, and they have their own limitations. Their volume and longitudinal nature (which enables analysis of changes in academic careers over time) open up new horizons, including the possibility of global perspectives alongside national ones.

From data sets that are huge and complex, we can extract useful information about researchers and their achievements, both past and present. We can examine vast amounts of data to discover patterns that would otherwise remain unnoticed; analyze outliers, deviations, and special cases; and conduct analyses based on an unprecedented number of observations. For example, our “Researchers in OECD Countries” database has 4 billion cells and contains 4 TB of structured data assigned to individual researchers.

Specific parts of structured, archived, and reliable Big Data (such as bibliometric datasets) can radically improve our understanding of how academic staff function. Different dimensions of academic work can be studied with increasing precision and at an extraordinary level of detail.

The use of well-prepared, extensive data sources allows us to study the academic profession over time, across countries (institutions, cities), across academic disciplines, at different levels of granularity, and in relation to research teams and individual researchers, their age, gender, and discipline.

The project introduces a new type of research: large-scale surveys (supported by bibliometric data) and research based on raw bibliometric data.

Nine thematic areas of the project:

(1) The role of international cooperation in science

(2) The role of high research productivity in science

(3) The role of sustained research productivity in terms of the entire scientific career

(4) The role of young scientists and women scientists in global science

(5) The role of indicators and metrics in academic careers

(6) The role of family and child-rearing in academic careers

(7) The role of individual publication profiles and their changes over time

(8) Tensions between national and global research

(9) Tensions between bibliometric, survey, and interview-based research
