AI as a Method of Scientific Invention

This article examines how artificial intelligence, particularly neural networks, is diffusing into science and influencing scientific discovery. It argues that neural networks meet the criteria of an emerging technology and can be considered a general method of invention. The article also discusses how AI is changing the scientific process and paradigm.

Research Policy 51 (2022) 104604

Contents lists available at ScienceDirect

Research Policy
journal homepage: [Link]/locate/respol

This article forms part of the Special Issue on The Governance of AI

Artificial intelligence in science: An emerging general method of invention☆
Stefano Bianchini *, Moritz Müller, Pierre Pelletier
BETA - Université de Strasbourg, France

ARTICLE INFO

Keywords: Artificial intelligence; Emerging technologies; Method of invention; Scientific discovery; Novelty

ABSTRACT

This paper offers insights into the diffusion and impact of artificial intelligence in science. More specifically, we show that neural network-based technology meets the essential properties of emerging technologies in the scientific realm. It is novel, because it shows discontinuous innovations in the originating domain and is put to new uses in many application domains; it is quick growing, its dimensions being subject to rapid change; it is coherent, because it detaches from its technological parents, and integrates and is accepted in different scientific communities; and it has a prominent impact on scientific discovery, but a high degree of uncertainty and ambiguity associated with this impact. Our findings suggest that intelligent machines diffuse in the sciences, reshape the nature of the discovery process and affect the organization of science. We propose a new conceptual framework that considers artificial intelligence as an emerging general method of invention and, on this basis, derive its policy implications.

“In today's world, the magic of AI is everywhere – maybe it's not full AI but there are significant parts.”
Nils Nilsson (The Quest for Artificial Intelligence, 2009)

1. Introduction

Measurable research outputs such as papers, patents, and innovations have been subject to high enduring growth rates over the last century. Yet, recent empirical evidence suggests that research productivity is ever falling and new ideas are becoming increasingly harder to find (Gordon, 2016; Bloom et al., 2020). A common narrative for this decline in productivity rests on the so-called ‘knowledge burden’. Over the past few decades, data and information have begun to grow and accumulate on an unprecedented scale, and searching through an increasingly vast and complex knowledge space has become prohibitively expensive (Weitzman, 1998; Fleming, 2001; Jones, 2009).

Recent advances in artificial intelligence (AI) – in particular the rapid improvements in prediction achieved by (multi-layer) neural networks (NN) – have brought a wave of optimism that these technologies will speed up scientific discovery (Hey et al., 2009; Agrawal et al., 2018; Cockburn et al., 2018). NN-based models have been found to be particularly good for discovering representations, invariances, and laws, that is, unusual and interesting patterns that are hidden in high-dimensional data (LeCun et al., 2015; Schmidhuber, 2015; Goodfellow et al., 2016). In other words, NNs have shown themselves to be particularly suited to addressing scientific problems.

The first question we raise in this article is whether NNs are, in fact, diffusing into the sciences and, if so, what the mechanics of this diffusion process might be. In so doing, we consider five key attributes that allow a technology to be defined as ‘emerging’ – namely: (i) radical novelty, (ii) fast growth, (iii) coherence, (iv) prominent impact, and (v) uncertainty and ambiguity (Rotolo et al., 2015) – and show that NNs conform to these properties.1


The research leading to the results of this paper has received financial support from the CNRS through the MITI interdisciplinary programs [reference: Artificial
intelligence in the science system (ARISE)] and the French National Research Agency [reference: DInnAMICS-ANR-18-CE26-0017-01].
* Corresponding author.
E-mail addresses: [Link]@[Link] (S. Bianchini), mueller@[Link] (M. Müller), [Link]@[Link] (P. Pelletier).
1 Rotolo et al. (2015) conceive of an emerging technology as “[a] radically novel and relatively fast growing technology characterized by a certain degree of coherence persisting over time and with the potential to exert a considerable impact on the socio-economic domain(s) which is observed in terms of the composition of actors, institutions and patterns of interactions among those, along with the associated knowledge production processes. Its most prominent impact, however, lies in the future and so in the emergence phase is still somewhat uncertain and ambiguous” (p.1828).

[Link]
Received 30 August 2020; Received in revised form 13 May 2022; Accepted 22 July 2022
Available online 5 August 2022
0048-7333/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license ([Link]
S. Bianchini et al. Research Policy 51 (2022) 104604

The second question we address is how NNs influence scientific discovery. Machines are becoming much more than mere scientific instruments, and might even be described as teammates. Today, intelligent machines can engage in various stages of a (complex) problem-solving process. They can, for example, define the problem(s), identify root causes, propose and evaluate solutions, choose between different options, make plans, take actions, and learn from interactions (Seeber et al., 2020). AI and, in particular, multi-layer NNs have been described as a general-purpose invention in the method of invention (Cockburn et al., 2018), a conceptual framework that blends the concepts of the method of invention (MI) (Griliches, 1957) and general-purpose technology (GPT) (Bresnahan and Trajtenberg, 1995). Building on this idea, Agrawal et al. (2018) suggest that NN-based prediction machines can alter the knowledge production function in combinatorial-type research problems by affecting two dimensions: those of ‘search’ and ‘discovery’. NN ‘search’ methods would support knowledge access by making existing relevant knowledge available to the researcher, whereas NN ‘discovery’ methods would help identify valuable combinations among elements of that available knowledge. Thus, in a needle-in-a-haystack problem, the ‘search’ dimension would arrange the haystack and the ‘discovery’ dimension would find the needle.

This distinction between ‘search’ and ‘discovery’ is conceptually interesting. Yet, it tells us little about how AI influences the direction of knowledge development, because it only deals with one body (or one haystack, to stick with the analogy) of pre-existing elements of knowledge. However, there are two sides to the knowledge explosion: increasing knowledge within each domain (i.e., larger haystacks) and an increasing number of domains (i.e., more haystacks). A priori, AI can either help scientists explore familiar conceptual spaces – structured styles of thought – in depth or transform the space by making unfamiliar combinations of distant knowledge elements (Boden, 2004, 2009). The fundamental question, then, is whether AI is currently being used to cope with the knowledge explosion within a domain or to facilitate knowledge creation across domains – that is, in-depth exploration of a known domain vis-à-vis the transformation of the domain through knowledge recombination across other domains.

Hence, we are interested in investigating empirically how NN methods contribute to science in terms of recombinatorial novelty and impact, an analysis confined here to the health sciences. In this study, the concept of recombinatorial novelty refers to novel recombinations across domains, as proxied by scientific journals, whereas the concept of impact refers to the relative importance of a study in the scientific community, as proxied by citation indices. We find that NN adoption is negatively associated with recombinatorial novelty, suggesting that researchers are using NNs as a research tool primarily to cope with the knowledge explosion within domains rather than across domains. Interestingly, our results also reveal a considerable degree of uncertainty as regards impact, reflected by a high variation in citation performance. We suggest that this outcome is consistent with the intrinsic nature of emerging technologies, but also with a sort of ‘mode effect’ whereby ‘everyone wants to be AI and data savvy, but few are ready’.

The rest of this paper is structured as follows. Section 2 discusses the emergence of the new data-intensive scientific paradigm; Section 3 presents the method for identifying NN-related research and our sample construction; Section 4 documents aspects of the NN diffusion process in the sciences; Section 5 presents our analysis of the contribution of NN methods to the health sciences; and the final section concludes by identifying a number of areas that might benefit from policy considerations.

2. Data-intensive scientific discovery

“Few fields are untouched by the machine-learning revolution, from materials science to drug exploration; quantum physics to medicine.”
Nature Editorial (2019)

Historically, the process of scientific inquiry has evolved through paradigms, seen as symbolic generalizations, metaphysical commitments, values and exemplars that are shared by a community of scientists and that guide the research of that community (Kuhn, 1962).

For most of human history, scientists have been observing phenomena, postulating laws or principles to generalize the complexity of their observations into simpler concepts – i.e., compressed, elegant mathematical representations that offer insights into the functioning of the universe. Originally there were just two sciences, the experimental and the theoretical. Indeed, Hey et al. (2009) identify empirical observation and logical (theory) formulation as the first and second scientific paradigms, respectively. Towards the middle of the last century, however, many problems proved too complicated to be solved analytically and researchers had to start simulating. Science entered a third paradigm, one characterized by the development of computational models and simulations to understand complex phenomena. As the knowledge frontier expands and the landscape gets more complex, it is becoming harder and harder for researchers to know enough to find (useful) combinations of knowledge that produce new (valuable) ideas.

Ongoing developments in AI, especially the impressive achievements made using NN techniques, have led to mounting pressure to shift from hypothesis-driven to data-driven scientific discovery. The emerging scientific paradigm is being built on data-intensive computing with the massive deployment of intelligent machines capable of finding representations, rules, and patterns in an ever-increasing volume of structured and unstructured data (King et al., 2009; Hey et al., 2009; Nature Editorial, 2019). Even today, Francis Bacon's basic insight continues to hold: the scientists' job is to search for regularities in the empirical data. Bacon probably could not have foreseen that this search is best achieved today with the support of AI.

What makes NNs particularly powerful is the learning process, that is, they learn from past experience and understand the world in terms of a hierarchy of concepts, where each concept is defined by the way it relates to simpler concepts (Schmidhuber, 2015; Goodfellow et al., 2016). It is clear that the term ‘artificial neural networks’ has been coined by analogy with biological neural networks, complete with their neurons, connections and firings. In a general NN model, the variables observed in the data are presented to an input or visible layer composed of several nodes; then a series of hidden layers (also composed of nodes) extracts increasingly abstract features from the data. The term ‘hidden’ stresses the idea that there is no predetermined structure; rather, it is the model itself that learns which concepts are useful to explain the relationships observed in the data. The nodes in the input, the hidden and output layers are all vaguely similar to biological neurons, and the connections between these nodes can be thought of as reflecting the connections between neurons (Hassabis et al., 2017).

NN-based methods can be applied in scientific settings in a variety of ways (see, e.g., Raghu and Schmidt, 2020). The most common application is to use NNs to tackle complex prediction problems – i.e., mapping inputs to predicted outputs. By way of example, the input might be an MRI image and the machine has to output a prediction of whether there are any signs of cancer. A second common application is to obtain interpretable insights into which property of the data led to the observed prediction – that is, from prediction to understanding. For example, some tools can be used to analyse the hidden representations of a neural network and detect which features of the input are most critical. A third application is to perform complex transformations of input data, such as image super-resolution and data compression, which in turn make data analysis easier and save space. Other recent tools, although in their infancy, would help scientists write better papers and co-write code.

It is clear that intelligent machines can help shoulder the ‘knowledge burden’ within a scientific domain, act as a fertilizer of knowledge recombination across domains, and thus enrich and transform the knowledge space. In short, intelligent machines can influence both ‘search’ and ‘discovery’ processes.

In the case of the ‘search’ process, NNs can support access to


knowledge by predicting which elements of knowledge and information are most relevant to the researcher. Three examples will serve to illustrate this function. First, NN-based recommender systems can offer high quality cross-domain recommendations by exploiting numeric measurements, images, text and interactions in a unified joint framework (Zhang et al., 2019). Second, transformational learning can improve learning tasks in one domain by using knowledge transferred from other (related) domains, and in turn capture generalizations and differences across domains (Olier et al., 2021). And, third, AI can be used for fact-checking, that is, assessing the veracity of scientific claims in sensitive areas such as climate change or the Covid-19 pandemic (Wadden et al., 2020).

In the case of the ‘discovery’ process, NNs provide a better prediction of which elements of knowledge can be combined to produce new knowledge and of the value of that knowledge. Literature-based discovery, for example, is a way to understand implicit (hidden) associations from existing studies, which can result in interesting, surprising, non-trivial hypotheses that are worth studying. Other NN-based tools, such as machine reading comprehension systems, can propose variations on an experiment after having identified gaps in the literature (Baradaran et al., 2020). Highly efficient forms of deep active learning have also been developed that can reduce the uncertainty associated with those regions of the experiment space that are sparsely populated with results (Daugherty and Wilson, 2018).

A major consequence of considering AI as a research tool – indeed, as a teammate – is that its impact is not limited to its ability to reduce the costs of specific scientific activities, but that it can facilitate a new approach to science itself, by modifying the scientific paradigm in the domains where the new research tool is deployed. Exploring the emergence of NN-based technology in science and its impact on scientific discovery is at the core of our study.

3. Identifying neural network research

Our empirical analysis of scientific publications exploits two databases: [Link] and Web of Science (WoS). First, we use [Link] to draw up an appropriate list of search terms referring to NNs based on the natural language processing of scientific abstracts from publications in the subject areas of ‘Computer Science’, ‘Mathematics’, and ‘Statistics’. Second, these search terms are used to query the WoS database and to extract a sample of NN papers across all scientific fields.

Reliance on a list of search terms for document retrieval is a common practice in research on emerging technologies and science in general. Unfortunately, extant studies do not provide us with an authoritative ‘ready-to-use’ list of search terms. Here, we train the word embedding model Word2Vec (Mikolov et al., 2013a, 2013b) with scientific abstracts from [Link]'s documents in order to learn NN-related terms.

Our training sample consists of scientific abstracts from [Link]. AI research tends to be a blend of statistics and informatics, but is developed in the main within the computer sciences. Informatics is a fast-developing field in which conference proceedings traditionally play an important role. More recently, however, the rapid dissemination of research is (best) achieved via open access journals and platforms. Of these, [Link] is the most prominent and provides us with a rich corpus for the identification of NN-related terms. We downloaded a total of 197,439 abstracts of papers from the subject areas of ‘Computer Science’, ‘Mathematics’ and ‘Statistics’, for the period 1990–2018. The three areas represented roughly 50 % of all [Link] documents in 2018, and just 10 % in the early 2000s.

Once pre-processed (details in Supplementary material), the corpus was used to train the Word2Vec model in its skip-gram with negative sampling version. The main outcome of this model is one vector representation for each term in the vocabulary. Hence, we were able to identify the terms that appear in the same cluster as the term ‘neural network’. The resulting list of potential search terms included individual words (uni-grams) as well as technical terms consisting of multiple words (n-grams). We opted to retain only those terms consisting of multiple words – i.e., we removed all uni-grams – in order to err on the side of conservatism and to ensure only the inclusion of terms that relate unambiguously to NNs. Moreover, we retained only the 30 most frequent n-grams after eliminating terms considered as being too generic (e.g., ‘short term’ or ‘supervised learning’). The final list of search terms used in our study is shown in Table 1.

Our sample for subsequent analysis included all publications in the WoS Core Collection published between 1990 and 2018, and having at least one of the search terms (Table 1) in their title, keywords, or abstract. In total, we identified 260,459 documents (144,095 articles; 39,925 conference proceedings; 76,439 others).

4. Technology diffusion in the sciences

This Section documents the diffusion of NN-based methods in the sciences. We show that the diffusion dynamics and the characteristics of the technology largely conform to properties of emerging technologies.

4.1. (Relative) fast growth

One of the defining attributes of an emerging technology is the speed of its growth, which is evident in such dimensions as the number of actors involved, the funding made available, and the knowledge output produced.

Our data confirm a ‘burst of research activity’ in all scientific areas (Fig. 1), although the volume (blue line) varied markedly. ‘Technology’ (Panel A) is the dominant field, which can be explained in part by the fact that it includes ‘Computer Science’, the main field of origin. It is followed, with about five times fewer papers, by ‘Physical Sciences’ (Panel B), which in turn is closely followed by ‘Life Sciences &

Table 1
NN-related search terms from word embedding.

n-gram                              Count
neural network                      402,996
neural networks                     173,470
artificial neural                   100,749
artificial neural network           99,794
deep learning                       24,104
convolutional neural                20,742
convolutional neural network        20,595
recurrent neural                    14,355
recurrent neural network            13,965
deep neural                         9418
multilayer perceptron               9352
deep neural network                 9181
hidden layer                        7810
deep convolutional                  4263
deep convolutional neural network   3384
long short term memory              3122
hidden layers                       2080
restricted boltzmann                1635
auto encoder                        1444
generative adversarial              1242
encoder decoder                     1198
adversarial network                 1192
generative adversarial network      1085
fully convolutional network         688
convolutional layers                568
variational autoencoder             216
adversarial attacks                 197
adversarial examples                92
variational autoencoders            75
adversarial perturbations           24

Notes: The count refers to how many times a given term occurs in the Web of Science corpus. A document may (and very often does) include several terms. Adding more terms would only slightly change the number of documents retrieved from WoS, as can be seen from the counts of the last few terms.
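
The term selection and document retrieval rules described in Section 3 can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy candidate frequencies and the toy documents are assumptions made for the example, and plain substring matching is an assumed simplification of the WoS query. Only the rules themselves (drop uni-grams, drop overly generic terms, keep the most frequent n-grams, retain a document if any term occurs in its title, keywords or abstract) come from the text.

```python
from collections import Counter


def select_search_terms(candidates, generic_terms, top_k=30):
    """Keep the top_k most frequent multi-word candidates that are not generic.

    `candidates` maps a candidate term to its corpus frequency (hypothetical
    input; in the paper the candidates come from the Word2Vec cluster around
    'neural network').
    """
    kept = {
        term: count
        for term, count in candidates.items()
        if len(term.split()) > 1          # remove uni-grams
        and term not in generic_terms     # remove overly generic n-grams
    }
    return [term for term, _ in Counter(kept).most_common(top_k)]


def matches(document, search_terms):
    """True if any search term occurs in the title, keywords, or abstract."""
    text = " ".join(
        [document["title"], " ".join(document["keywords"]), document["abstract"]]
    ).lower()
    return any(term in text for term in search_terms)


# Toy data, invented for illustration only.
candidates = {
    "neural network": 900,
    "deep learning": 400,
    "perceptron": 350,            # uni-gram, dropped
    "supervised learning": 300,   # generic, dropped
    "hidden layer": 120,
}
generic = {"short term", "supervised learning"}
terms = select_search_terms(candidates, generic, top_k=3)

docs = [
    {"title": "Deep learning for MRI segmentation", "keywords": ["imaging"],
     "abstract": "We train a convolutional model."},
    {"title": "A survey of crop yields", "keywords": ["agriculture"],
     "abstract": "Regression on field data."},
    {"title": "Protein folding", "keywords": ["Neural Network"],
     "abstract": "Structure prediction."},
]
sample = [d for d in docs if matches(d, terms)]
```

Dropping uni-grams before ranking is what makes the filter conservative: a frequent but ambiguous single word such as ‘perceptron’ never reaches the final list, whatever its count.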


Fig. 1. Trends in NN publication activity by scientific area.


Notes: The blue lines show the number of publications and the orange lines plot the growth rates in each scientific area. Growth rates are calculated as three-year moving averages and omit publications before 2001. Scientific areas correspond to WoS research areas. Panel H refers to research published on [Link], based on the sample discussed in Section 3.
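
As a minimal sketch of how smoothed growth rates of this kind could be derived from annual publication counts, the following assumes year-on-year percentage growth averaged over a trailing three-year window. The notes do not say whether the window is trailing or centred, and the function names and counts below are invented for illustration.

```python
def annual_growth(counts):
    """Year-on-year growth in percent, keyed by the later year.

    Assumes `counts` covers consecutive years with non-zero values.
    """
    years = sorted(counts)
    return {
        year: 100.0 * (counts[year] - counts[prev]) / counts[prev]
        for prev, year in zip(years, years[1:])
    }


def moving_average(series, window=3):
    """Trailing moving average over `window` consecutive years."""
    years = sorted(series)
    return {
        years[i]: sum(series[y] for y in years[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(years))
    }


# Hypothetical publication counts for one scientific area.
counts = {2014: 100, 2015: 110, 2016: 132, 2017: 165, 2018: 198}
growth = annual_growth(counts)       # growth rates for 2015..2018
smoothed = moving_average(growth)    # three-year averages for 2017 and 2018
```

With a trailing window the first smoothed value appears two years after the first growth rate, which is one reason a smoothed series starts later than the raw counts.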

Biomedicine’ (Panel C). Publications in ‘Health Sciences’ (Panel D) – defined as a subset of ‘Life Sciences & Biomedicine’ and the focus of the analysis conducted in the next Section – largely parallel those of ‘Life Sciences’. Publication counts in ‘Social Sciences’ (Panel E) are relatively low, becoming negligible for ‘Arts & Humanities’ (Panel F). Panel G, which combines all WoS documents into one, shows that the (three-year average) growth rate (orange line) in NN publication activity around 2005 was high (at about 10 %); it then suffered something of a decline in the years around 2010, before recovering and experiencing steady growth to the end of the observation period (reaching 20 %). Indeed, the individual areas exhibited very similar growth patterns.2

2 The overall number of NN-related documents varies according to the sub-disciplines within each scientific area (not shown here). The general trend in ‘Technology’ is driven mainly by ‘Computer Science’ (103,729 documents), ‘Engineering’ (95,638) and ‘Automation & Control Systems’ (24,721). In the case of ‘Physical Sciences’, it is driven by ‘Physics’ (7239), ‘Mathematics’ (5123) and ‘Chemistry’ (3702), while in ‘Life Sciences & Biomedicine’, it is driven by ‘Environmental Sciences & Ecology’ (2632), ‘Neurosciences & Neurology’ (2032), and ‘Biochemistry & Molecular Biology’ (1728).

Publication activity on [Link] (Panel H) follows essentially the same dynamics. Growth rates mimic the shape described above but are about five times higher than those in the WoS panels. The comparatively higher rates are attributable, it would seem, to the fact that open platforms are increasingly popular, given their efficiency and speed, as a channel of communication between researchers, particularly within the machine learning and computer science communities (Sutton and Gong, 2017).

Research output increased not only in absolute numbers but also relative to the overall number of papers in a given scientific area, albeit at a lower level. In 2018, NN documents represented 2.6 % of all papers in ‘Technology’, 1.02 % in ‘Physical Sciences’, and 0.3 % in ‘Life Sciences & Biomedicine’. This means NN publications still account for only a tiny fraction of the whole research volume, in particular in application domains. However, recent growth rates in these shares are remarkable. NN-related research presents the highest growth rates in the ‘Life Sciences & Biomedicine’ (47.3 % in the period 2017–2018), ranks second in ‘Physical Sciences’ (42 %), and in ‘Technology’ presents a growth rate of roughly 18 %.

4.2. Spatial diffusion and actor re-configuration

Another of the defining attributes of an emerging technology is the speed of change in the configuration of actors – e.g., countries, users, and scientists.

Fig. 2 shows the dynamics of science at the country level. Each document is attributed to a given country when the affiliation of at least one of its authors is in that country. During the first period, 1990–1999, most of the documents (about 5000) were published by scientists in the United States. Publishing activity was relatively low in absolute numbers in the European countries, Australia and China, and negligible or non-existent in most other countries. In the following decade, 2000–2009, China became the most prolific country with about 20,000 documents. The US ranked second with around 14,000 articles, whereas


Fig. 2. Global diffusion of NN in science across countries.


Notes: The intensity of colour reflects a country's relative number of NN publications in a given period, with no observed NN publication activity in hatched countries
[WoS sample].
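
The attribution rule behind this map (a document counts for a country when at least one of its authors is affiliated there) can be sketched as follows. This is an illustrative sketch only: the record layout, field names and toy records are assumptions, and full counting per country is assumed because the text does not mention fractional counting.

```python
from collections import Counter


def period_of(year):
    """Map a publication year to one of the three periods used in the text."""
    if 1990 <= year <= 1999:
        return "1990-1999"
    if 2000 <= year <= 2009:
        return "2000-2009"
    if 2010 <= year <= 2018:
        return "2010-2018"
    return None  # outside the observation window


def count_by_country(documents):
    """Count documents per period and country, once per country per document."""
    counts = {}
    for doc in documents:
        period = period_of(doc["year"])
        if period is None:
            continue
        # A set ensures that several authors from the same country do not
        # inflate that country's count for a single document.
        for country in set(doc["affiliation_countries"]):
            counts.setdefault(period, Counter())[country] += 1
    return counts


# Toy records, invented for illustration only.
documents = [
    {"year": 1995, "affiliation_countries": ["US", "US", "FR"]},
    {"year": 2004, "affiliation_countries": ["CN"]},
    {"year": 2006, "affiliation_countries": ["CN", "US"]},
    {"year": 2021, "affiliation_countries": ["DE"]},
]
by_period = count_by_country(documents)
```

Under this rule an internationally co-authored paper is counted in full for every country involved, so country totals can sum to more than the number of documents.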

European countries, Australia, Canada, and India grew sufficiently to maintain their relative strength in the field. Interestingly, in this decade, NN research activity took off in an increasing number of countries. These trends were further reinforced in the last period, 2010–2018. Compared to the previous decade, China doubled its research output, widening the gap with the US and, to a lesser extent, with the EU.

In summary, our data seem to suggest that NN research has diffused rapidly at the global scale, and that since the early stages of development there has been a re-configuration of global actors. We consistently observed high volatility in the rankings, with some countries climbing the ladder and others lagging behind.

4.3. Radical novelty and ‘double-boom’ cycle

NNs have experienced a discontinuous wave of major innovations, which points to the radical nature of this technology. (Artificial intelligence has a long, rich history dating back to the 1950s, when researchers from different domains began to explore various paths towards mechanizing intelligence – interested readers may consult Nilsson (2009) and Russell and Norvig (2021 – Ch.1 and 23).)

Novelty can also arise from putting the technology to a new use – that is, applying it from one domain to another (Adner and Levinthal, 2002). The originating domain of NN research is predominantly computer science; thus, it seems appropriate to follow Cockburn et al. (2018) and assume that NN publications in all areas other than computer science represent applications of NN methods to address field-specific research problems.

The diffusion of emerging technologies from the originating domain to the application domains typically follows a ‘double-boom’ cycle (Schmoch, 2007). Initially, the new technology seems to be of high potential, and high expectations trigger considerable development efforts, especially theoretical – the first boom. However, during these early development activities, several actors discover the difficulties of translating theory into practice. Most fail and cease their innovation activities, putting an end to the first boom. But some continue and, as time passes, they overcome some of the more important practical hurdles and are able to demonstrate genuine advances – starting the second boom. Interestingly, this pattern is largely consistent with the growth patterns recorded in Fig. 1 (orange lines), where the first boom, subsequent decline, and second boom are clearly evident.

We also find that the second boom is marked by a shift in emphasis from theoretical principles to practical applications. In support of this evidence, we considered the top five cited references in each year of the observation period (i.e., those documents with the highest annual shares of all cited references in our publications), which gave us a list of 18 unique articles and their corresponding citation counts, as shown in Table 2. Using dynamic time warping (DTW) to measure dissimilarity between time series, we then clustered these temporal sequences by means of k-medoids (Berndt and Clifford, 1994). As shown in Fig. 3, we obtained two clusters. In the first period, the most cited articles in our sample were theoretical contributions, including a discussion of the possibility of using multilayer feedforward networks as universal


function approximators, training algorithms (backprop), and parallel computing theories (cellular NN). In the second period, the most influential articles were no longer theoretical contributions, but rather articles that show how to put theoretical principles into practice. These contributions included inventions that have brought enormous performance gains to real-world tasks, especially for image and text analyses (e.g., deep convolutional neural networks and long short-term memory (LSTM) architectures).

4.4. Coherence

Another defining attribute of an emerging technology is its coherence, understood as the shared interpretation and acceptance of the technology within a community. Signals of coherence can include the creation of dedicated conference sessions, new specialist journals and new categories in established classification systems. Here, we consider the transition from cross-disciplinary to disciplinary research effort as a sign of coherence, as this would mean that the technology has moved beyond its conceptual stage requiring close interaction between users and developers, and has become ‘common practice’ in application domains.

Each document is labelled by WoS as belonging to at least one subject category on the basis of the journal in which it was published. In most instances, a document falls into more than one category. The extent to which publications in a given scientific area are cross-classified as computer science contributions can therefore proxy cross-disciplinarity with respect to computer science. Thus, for each broad scientific area and year, we calculated the fraction of NN documents that are (also) labelled as ‘Computer Science’.

Fig. 4 shows the corresponding time trends. Each point of the plot for ‘Technology’ (Panel A) represents the share of ‘Technology’ NN documents cross-classified as ‘Computer Science’ in a given year. For example, in 1990 about 60 % of ‘Technology’ publications also fell into the ‘Computer Science’ category (first dot). The overall trend (blue line) follows a flat inverted U-shape that reaches around 70 % in 2005, before falling to less than 50 % by the end of the observation period. Indeed, in 2018, a large proportion of papers in ‘Technology’ are no longer labelled as computer science contributions. ‘Physical Sciences’ (Panel B) also

Table 2
Influential NN publications.

Title | journal                                                             Cluster  # Citations  Share [%]
Multilayer feedforward networks are universal approximators | NN            1        5904         0.14
Neural networks and physical systems with emergent … | PNAS                 1        4658         0.11
Learning representations by back-propagating errors | Nature                1        4645         0.11
Learning internal representations by error propagation | MIT Press          1        3921         0.09
Approximation by superpositions of a sigmoidal function | MCSS              1        3657         0.09
Training feedforward networks with the Marquardt algorithm | IEEE TNNLS     1        3128         0.07
ANFIS: adaptive-network-based fuzzy inference system | IEEE SMC             1        2909         0.07
Identification and control of dynamical systems using … | IEEE TNNLS        1        2551         0.06
Cellular neural networks: theory | IEEE CAS                                 1        2267         0.05
ImageNet classification with deep convolutional neural networks | NeurIPS   2        7177         0.17
Gradient-based learning applied to document recognition | IEEE Proceedings  2        3590         0.09
Deep learning | Nature                                                      2        3542         0.08
Long short-term memory | NC                                                 2        3074         0.07
A fast learning algorithm for deep belief nets | NC                         2        2710         0.06
Reducing the dimensionality of data with neural networks | Science          2        2621         0.06
Very deep convolutional networks for large-scale image recognition | arXiv  2        2582         0.06
Particle swarm optimization | IEEE Proceedings ICNN                         2        2568         0.06
Deep residual learning for image recognition | IEEE Proceedings CVPR        2        2160         0.05

Notes: This table reports the references (title and journal) of the most cited articles from the WoS publication sample over the period 2000–2018. From a total of 4,190,306 references (1,618,836 unique) cited by the documents in our sample, we selected the five most used references for each year. This gives us 18 time series that were clustered. Clustering is obtained via k-medoids and dynamic time warping. References within clusters are ranked by total number of citations.

Fig. 3. Trends in annual citations of influential NN publications.


Notes: This figure shows the annual share of all citations in the Web of Science sample for the two clusters of most cited NN articles. The shaded areas are time series
intervals defined by minimum and maximum citation shares. In the main, the orange profile represents ‘theoretical’ contributions and the blue profile represents
‘applications’. Due to the limited number of articles that could be cited in the initial period, we clustered the time series from 2000 onwards.
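
The two profiles in Fig. 3 come from clustering citation time series with k-medoids and dynamic time warping (DTW). As an illustration of the distance step only, the sketch below computes a textbook DTW distance between two series. The function name `dtw_distance`, the absolute-difference local cost and the absence of a warping window are assumptions made here; the text does not specify them.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumptions (not taken from the paper): absolute difference as local
    cost, no warping-window constraint, no normalisation by path length.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A k-medoids routine would then take the pairwise matrix of such distances over the citation-share series as its input.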

S. Bianchini et al. Research Policy 51 (2022) 104604

Fig. 4. NN publications cross-classified as ‘Computer Science’.


Notes: The figures show the fraction of NN documents cross-classified as ‘Computer Science’. Orange dots represent the share of cross-classified papers in each year.
The blue curve corresponds to a simple local regression, with the surrounding shaded area representing the 95 % confidence interval around the mean.
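
The quantity plotted as orange dots in Fig. 4 is a simple ratio per scientific area and year. A minimal sketch of that calculation follows, assuming each NN document arrives as a hypothetical `(area, year, categories)` record and that a computer science label can be detected by a category name starting with ‘Computer Science’; both the record layout and the matching rule are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict


def cs_share_by_area_year(docs):
    """Share of NN documents cross-classified as computer science.

    docs: iterable of (area, year, categories) tuples, where categories is a
    list of subject-category names (hypothetical layout).
    Returns {(area, year): fraction in [0, 1]}.
    """
    totals = defaultdict(int)
    cross = defaultdict(int)
    for area, year, categories in docs:
        key = (area, year)
        totals[key] += 1
        # Assumed rule: any category whose name starts with "Computer Science".
        if any(c.startswith("Computer Science") for c in categories):
            cross[key] += 1
    return {key: cross[key] / totals[key] for key in totals}
```

The smoothed blue curve in the figure would be obtained by fitting a local regression to these yearly shares, which is not shown here.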

presents an inverse U-shape, with an increase in cross-classified computer science documents that reached 20 % in 2000, before falling to 10 % by the end of the period. No increase in computer science cross-classification was observed in ‘Life Sciences & Biomedicine’ (Panel C). From the very high share of 70 % at the beginning of the period, a continuous decline was subsequently recorded (with significant drops around 2000 and again in 2010), finishing the period at around 20 %. ‘Health Sciences’ (Panel D) presents the same evolution. ‘Social Sciences’ (Panel E) increased their share of computer science documents to 40 % around 2010, but this was followed by a sharp downturn, while in ‘Arts & Humanities’ (Panel F), the share of computer science documents is very noisy, and no particular trend can be deciphered.
Taken together, these dynamics suggest that NNs diffuse from computer science, the originating discipline, into other application-oriented scientific disciplines. Thus, over time, we see a greater propensity of different communities to integrate the technology into their discipline, which is a good signal of coherence.
In short, it is more than apparent that NN technology fulfils many of the conditions to be classified as an emerging technology. It exhibits rapid growth in all domains; it has experienced a turbulent shift and reconfiguration of the actors involved in its development and adoption; and it presents a degree of coherence that persists over time. However, the picture arrived at in the first part of this analysis is incomplete. How does the technology influence scientific discovery in its domains of application? What can be learned about its undoubted impact yet, at the same time, the uncertainty that is often associated with its adoption? We address these questions in the next Section.

5. Neural networks in the health sciences

Here, we specifically address the impact of NN-based methods in the ‘Health Sciences’, one of the application domains with the highest short-term societal impacts (Raghupathi and Raghupathi, 2014; Miotto et al., 2018). AI, in general, and deep learning, in particular, have already contributed to a variety of data-driven innovations in the health domain – improving healthcare systems, supporting clinicians, and monitoring patient diseases, among others. A review of the literature enabled us to identify applications in virtually all sub-disciplines: health informatics and biomedical research (Marx, 2013; Ravì et al., 2017), computational biology (Angermueller et al., 2016), genomic medicine (Leung et al., 2015), medical imaging (Litjens et al., 2017; Shen et al., 2017; Savadjiev et al., 2019), drug discovery and pharmacogenomics (Ma et al., 2015), real-time patient monitoring (Rajkomar et al., 2018), public health (Miotto et al., 2018; Zhang et al., 2018), and neuroscience and the cognitive sciences (Marblestone et al., 2016; Hassabis et al., 2017; Lake et al., 2017).³

5.1. Novelty and impact in science

A ‘scientific contribution’ is typically considered as comprising two elements: novelty and impact. Different terms for essentially this same

³ We define the ‘Health Sciences’ as comprising 83 Web of Science subject categories within the ‘Life Sciences & Biomedicine’ research area. The complete list of categories included can be consulted in the Supplementary material.


idea were used in earlier studies of science, so that debates centred on discussions of the notions of originality, discovery and breakthrough contributions to scientific progress (de Solla Price, 1963; Merton, 1957; Bourdieu, 1975). It was Kuhn (1962) who coined the term ‘novelty’ to describe a more radical contribution that does not simply make an incremental advance in the ‘normal science’ in place, but rather breaks the current paradigm. More recently, the term novelty has partly lost this radical connotation, but it still carries the idea of a high degree of originality, while the concept of ‘recombinatorial novelty’ has emerged to highlight the idea that new knowledge arises out of the recombination of previously generated bits of knowledge (Fleming, 2001; Arthur, 2009; Uzzi et al., 2013; Wang et al., 2017).
Only a very small percentage of the potential for useful recombinations in the knowledge space is currently exploited. NNs can change the way science develops by helping to overcome our human limitations (Agrawal et al., 2018; Cockburn et al., 2018; Furman and Teodoridis, 2020). Yet, how exactly does NN adoption correlate with novelty? The answer to this question depends very much on how the technology is used in the scientific complex. Indeed, scientists can adopt new methods either to advance well-established research trajectories within a conceptual space or to explore new avenues by altering the conceptual space with knowledge from other domains, leading to low and high recombinatorial novelty, respectively.
The second element of a ‘scientific contribution’ concerns its impact, a key attribute of emerging technologies. Impact is related to, but different from, novelty; if research provides novelty, that novelty must be adopted by the scientific community in order for its impact to be felt. Moreover, research can have an impact on subsequent research for reasons other than (recombinatorial) novelty, especially when providing new insights within established knowledge structures.
Yet, nor should impact be considered fully independent of novelty. Evidence suggests that a high degree of novelty is likely to increase the risk of delays and failures (Azoulay et al., 2011). Moreover, novel research often requires more complex and risky collaborative social structures (Fleming et al., 2007; Foster et al., 2015). Thus, highly novel research can be subject to considerable variations in ‘quality’ (Fleming, 2001; Wang et al., 2017) and, hence, to greater variations in impact. Uncertainty and ambiguity are common features of the research process, especially because the potential applications of the technology have yet to be explored and understood. Social inertia can further reinforce the uncertainty associated with impact. Emerging technologies typically encounter resistance in society precisely because they cause structural changes in roles and norms (Merton, 1957; Bourdieu, 1975). This is particularly true of AI, which operates at the intersection of ethical and legal considerations and, as such, is shaping the future of both individuals and society as a whole (Lanier, 2010; O'Neil, 2016; Zuboff, 2019).

5.2. Empirical analysis

We measure scientific knowledge creation in scientific papers published in peer-reviewed journals and conference proceedings in the ‘Health Sciences’. Henceforth, the term ‘journal’ is used to refer interchangeably to both peer-reviewed scientific journals and conference proceedings. We restrict our focus to journals that are not cross-classified as ‘Computer Science’ journals, ensuring that publications include NN methods as a research tool.
Our approach is to compare publications that involve NNs with those that do not involve NNs, while controlling for a set of confounding factors. Comparisons are made in terms of their recombinatorial novelty and scientific impact. For the main analysis, we operationalize the concept of ‘recombinatorial novelty’ as the first appearance of a knowledge combination, very much in line with Wang et al. (2017), the details of which we describe below. Novelty à la Wang et al. complies with the idea of NNs as a method of invention – i.e., a method for creating something new and valuable. In the case of ‘scientific impact’, we operationalize this concept as the subsequent use made of a paper, measured by the number of citations received.

5.2.1. Sample

We include all the articles for the whole observation period (2000–2018) published in those health journals where research involving NNs has been most prominent. This provides us with a relatively coherent knowledge base against which we can examine the concepts of novelty and impact. In total, we identified 26,461 NN health papers in about 5000 health journals and proceedings. Roughly 45 % (11,520) of these documents are published in the top 100 health journals in the sample. Hence, we downloaded the entirety of these journals for the period 1990–2018. Our final sample, combining NN and non-NN publications, contains 1,081,223 articles.

5.2.2. Variables

Our main explanatory variable is a binary indicator of a paper's NN content: 1 if the paper involves the use of NN methods, 0 otherwise. Our main dependent variables are (various measures of) recombinatorial novelty and scientific impact based on citation counts.
Recombinatorial novelty is measured in relation to the journals referenced by a paper. Thus, each paper is examined to determine whether it makes ‘first-time-ever’ combinations of referenced journals – i.e., its list of references contains journal pairs that have never previously appeared jointly in any list of references. In order to exclude journal pairs that simply formed once by happenstance, we further impose the condition that journal pairs be observed again within the next three years. A paper with at least one journal pair in the reference list that is both novel and re-used is considered as providing some novelty. Thus, we construct a binary indicator of novelty, henceforth referred to as Novelty Dummy. A further consideration is that a novel journal pair may span domains that vary in their distance from one another (i.e., more or less distant). This subtlety is captured through the co-citation profiles of the two journals forming a novel pair. The idea is that if both journals are often (rarely) cited with the same third journal(s), they are likely to span less (more) distant domains. In this way, we are able to construct a distance-weighted (continuous) measure of novelty, henceforth referred to as Novelty.
Calculations of the binary and weighted novelty measures follow Wang et al. (2017). However, our procedure differs in two major respects. First, we judge novelty and co-citation distance only on journal pairs that are observed in the reference lists of our sampled papers. Thus, we do not measure novelty per se but rather with respect to a knowledge base covered by the sampled health journals.
Second, we calculate different measures of novelty by considering different sets of journals in the references. In this way, we are able to capture the source of novelty – i.e., where does this novelty come from? ICT, health, or other domains? While it is true that all the articles in our sample are published in outlets of the ‘Health Sciences’, they can reference journals in various domains. For instance, a health science paper involving NNs is likely to cite computer science journals where the NN methods were first published. This translates into a recombinatorial novelty ‘simply’ because of the adoption of the method, but it does not necessarily reflect the recombinatorial potential of NNs to connect and recombine knowledge in complex knowledge landscapes. In other words, we seek to measure whether NN adoption fosters novel recombinations within the health sciences and/or between the health sciences and disciplines other than the computer sciences. Thus, we calculate novelty not only on all journal pairs, as indicated by ‘All Sciences’, but also limited to journal pairs where (i) no referenced journal is classified as a computer science journal, indicated by ‘No CS’; and (ii) both referenced journals are uniquely classified as health sciences, indicated by ‘Only HS’. By way of example, the combination of ‘Biology & Biochemistry’ and ‘Computer Science’ journals can be regarded as an


‘All Sciences’ combination; ‘Engineering’ and ‘Molecular Biology & Genetics’ as a ‘No CS’ combination; and ‘Neuroscience & Behaviour’ and ‘Psychiatry/Psychology’ as an intra-domain ‘Only HS’ combination.
Combining these three recombinatorial options with the possibility of calculating novelty as either a binary indicator or a continuous score, we obtain six different novelty measures, namely: Novelty Dummy (All Sciences), Novelty Dummy (No CS), Novelty Dummy (Only HS), Novelty (All Sciences), Novelty (No CS), and Novelty (Only HS).
Impact is measured by the number of citations (# Citations) received by a paper from its year of publication up to 2019, the time of data extraction. Furthermore, we code dummy indicators for so-called ‘big hit’ contributions – i.e., highly cited papers. Whether a paper is among the top 5 or 10 % cited papers (Top 5 % Cited and Top 10 % Cited) is calculated with reference to other papers published in the same year and falling in the same WoS subject category.
We consider a set of control variables to capture various characteristics of a focal paper. We control for the number of references made by a paper (# References) as this might automatically increase the likelihood of its having new combinations. In prior research, the number of authors has been shown to be positively associated with both novelty and impact, hence we control for that (# Authors). The adoption of AI in scientific settings can indeed have an ambiguous effect on team size. Size may increase as new members are needed to manage the technology (at least in the early stages), but the technology may also automatize some tasks, thereby generating a replacement effect in the scientific workforce. International collaborations may also be a source of novelty and impact, and may be instrumental in the adoption of the technology. We proxy international collaboration by a dummy (International Collab.) taking a value of 1 if there are at least two different countries in the authors' affiliations, 0 otherwise. For the same reason, we construct a dummy for private sector participation (Private Partic.) taking a value of 1 if the paper has at least one non-university affiliation in the list. We consider the journal impact factor (JIF), since, on the one hand, high impact journals may be biased against novelty, but, on the other, increase visibility and hence citations. We additionally control for the journal age (Journal Age). Finally, we include a dummy indicating whether the paper provides a review or survey of extant literature (Survey). A survey may in fact cover separate streams of research without really connecting them.⁴ Descriptive statistics of the variables are reported in Appendix A.

5.2.3. Estimation methods

We model three different types of outcome: (i) binary indicators of novelty and impact, (ii) positive continuous measures of novelty, and (iii) positive discrete measures of impact (number of received citations). Each type of outcome requires a specific econometric setting.
All binary indicators are modelled with a Probit. Our continuous novelty measure is censored at zero, hence we use a Tobit model. Citations are count data for which the Poisson and negative binomial models are natural candidates. Over-dispersion and the conditional mean of the outcome variable being much lower than its variance are the most common arguments for favouring the negative binomial over the Poisson model. In our case, both empirical arguments hold; therefore, we opted for the negative binomial to model mean and dispersion separately, each with a linear predictor incorporating our main right-hand side variables and controls.
In all estimations, we include the control variables discussed above and a set of dummies to control for scientific field and cohort effects. We proxy scientific field using WoS categories (field WC). As a paper may fall into several categories, we code dummy variables taking a value of 1 for each category. Throughout the analysis, robust standard errors clustered at the journal-level are obtained via bootstrapping all journals.

5.2.4. Results

Table 3, Columns 1–3, shows the Tobit regressions of the continuous measures of novelty, Novelty. Columns 4–6 report the Probit estimates of the binary novelty indicators, Novelty Dummy.
When considering recombinatorial novelty across all sciences (Column 1), the estimated coefficient is positive but non-significant, but when we exclude computer science references (Column 2) the coefficient becomes negative yet remains non-significant. Restricting references to health sciences only (Column 3) makes the negative coefficient larger in magnitude, and it is now significant at the 1 % level. The same pattern is observed when we consider the results of the Probit regression of the novelty dummy.
To what extent does the adoption of NN methods change our expectations of recombinatorial novelty in the health sciences? To a considerable degree, given that adopting NN decreases the degree of novelty by 18.6 %. In addition, the marginal effects of Probit (Column 6) tell us that, for the median observation, NN decreases by 0.031 the probability of an article being novel (0.037 for the average observation).
In sum, NN adoption is not significantly correlated with novel recombinations across the entire knowledge landscape, nor with novel recombinations involving anything other than computer sciences. Yet, it is significantly and negatively correlated with novel recombinations within the health sciences. These findings suggest that NN methods tend to be adopted as part of a ‘balancing strategy’ in which the risk associated with the (emerging) technology is counterbalanced by keeping the knowledge landscape stable. Another way of interpreting this outcome is that NNs are employed mainly as a research tool to support already formalized and well-defined research trajectories in the health sciences community. This evidence is consistent with the idea of extending science while maintaining the advantages of conventional domain-level thinking (Boden, 2004; Uzzi et al., 2013).
Our estimates of the control variables echo previous research. Larger teams are associated with more novelty (Fleming et al., 2007; Lee et al., 2015); international collaborations are negatively associated with novelty (Wagner et al., 2019); the chances of providing a new combination of journal references increase with the number of references (Wang et al., 2017); and literature reviews also tend to draw from a wider range of sources, leading to novel combinations of references. We find a negative effect of private involvement and, finally, a journal's age and impact factor seem to play no role.
How does NN adoption correlate with impact? Table 4, Column 1, shows the results of the negative binomial regression of citation counts. Here, the mean and dispersion parameters may vary with various right-hand side factors.⁵ We find that NN adoption positively and significantly affects the number of citations received, both in terms of expectation and variance. Compared to non-NN papers, ceteris paribus, NN papers receive on average 10.32 % more citations. The expectation of citation count increases by a median of 6.01 for NN research. The dispersion of the citation distribution is 19.57 % higher for NN papers than for non-NN papers.
The Probit regressions used to model the probability of a paper falling in the right tail (top 5 % or 10 %) of the year–field citation

⁴ Private Partic. takes a value of 1 if we detect in the authors' affiliation at least one of the acronyms present in the Wikipedia page: ‘List of legal entity types by country’. We use the SCImago Journal Rank to obtain the impact factor (JIF) for each journal in each year. Journal Age is calculated as the time elapsed from the date of the journal's creation to the year of publication. Survey takes a value of 1 if we detect in the title of the paper the terms ‘Survey’, ‘Overview’ or ‘Review’.
⁵ We excluded dummy variables other than NN to model the dispersion of citations because these variables caused problems with the convergence of the maximum likelihood estimator. In modelling the dispersion, we also tried simpler specifications by progressively incorporating a few variables at a time.
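
Two of the control dummies described above are simple enough to illustrate directly. The sketch below codes Survey from the paper title and International Collab. from the list of affiliation countries. The function names and input shapes are hypothetical, and the case-insensitive substring match is an assumption made here (it would also flag words such as ‘reviewed’); the exact matching rule is not given in the text.

```python
SURVEY_TERMS = ("survey", "overview", "review")


def survey_dummy(title):
    """1 if the title contains 'Survey', 'Overview' or 'Review', else 0.

    Assumption: case-insensitive substring match.
    """
    lowered = title.lower()
    return int(any(term in lowered for term in SURVEY_TERMS))


def international_collab_dummy(countries):
    """1 if the authors' affiliations span at least two distinct countries."""
    return int(len(set(countries)) >= 2)
```

For instance, `survey_dummy("Deep learning in medical imaging: a review")` yields 1, while a paper whose affiliations are all in one country receives 0 on the collaboration dummy.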


Table 3
Novelty profile of NN publications.

                        Tobit: Novelty                            Probit: Novelty Dummy
                        All Sciences  No CS         Only HS       All Sciences  No CS         Only HS
                        (1)           (2)           (3)           (4)           (5)           (6)
NN                      0.044         −0.031        −0.186***     0.053         −0.008        −0.150***
                        (0.038)       (0.034)       (0.040)       (0.037)       (0.033)       (0.037)
# References (log)      1.046***      1.050***      1.029***      0.878***      0.879***      0.843***
                        (0.033)       (0.033)       (0.033)       (0.026)       (0.026)       (0.023)
# Authors (log)         0.177***      0.184***      0.227***      0.184***      0.189***      0.223***
                        (0.021)       (0.022)       (0.024)       (0.020)       (0.020)       (0.022)
International Collab.   −0.053***     −0.058***     −0.084***     −0.050***     −0.054***     −0.076***
                        (0.010)       (0.010)       (0.010)       (0.009)       (0.010)       (0.009)
Private Partic.         −0.004        −0.004        −0.027*       −0.007        −0.008        −0.026**
                        (0.012)       (0.012)       (0.014)       (0.012)       (0.013)       (0.013)
JIF                     −0.026        −0.024        −0.017        −0.025        −0.024        −0.017
                        (0.019)       (0.019)       (0.021)       (0.017)       (0.017)       (0.018)
Journal Age (log)       −0.098        −0.082        −0.044        −0.074        −0.061        −0.030
                        (0.099)       (0.100)       (0.108)       (0.090)       (0.090)       (0.095)
Survey                  0.225***      0.216***      0.181***      0.206***      0.199***      0.163***
                        (0.049)       (0.047)       (0.050)       (0.049)       (0.047)       (0.046)
Log likelihood          −263,098      −258,255      −221,241      −180,701      −178,639      −161,710
χ2 [null model]         96,074***     94,950***     77,374.6***   75,936***     75,187***     64,730***
χ2 [w/o NN model]       4.90*         2.20          60.90***      6.70**        0.10          44.60***
# obs                   356,037       356,037       356,037       356,037       356,037       356,037

Notes: This table reports coefficients of the effect of NN methods (NN, dummy) on recombinatorial novelty built by considering different knowledge landscapes. Bootstrapped (500 replications) standard errors clustered at the journal-level in parentheses: ***, ** and * indicate significance at the 1 %, 5 % and 10 % levels, respectively. The effect of NN on the positive continuous novelty measure is estimated using a Tobit regression (Columns 1–3). The effect on the novelty dummy is estimated using a Probit (Columns 4–6). Each novelty measure is calculated on three different sets of journal references: ‘All Sciences’ – All cited journals, ‘No CS’ – All cited journals except for computer science journals, and ‘Only HS’ – Only citations to health science journals. Constant term, scientific field (WoS subject category) and time fixed effects are incorporated in all model specifications. Likelihood-ratio tests are used to compare the goodness-of-fit of two statistical models: (i) null model against complete model; (ii) model without the NN variable against the complete model.
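
The Novelty Dummy modelled in Columns 4–6 rests on two conditions: a referenced journal pair appears for the first time, and it is observed again within the next three years. The sketch below illustrates that rule on a toy sample. It is not the authors' implementation: the record layout `(paper_id, year, referenced_journals)`, the function name, and the yearly granularity (every paper using a pair in its first year counts as novel) are assumptions made here, and ‘first time’ is judged only within the sample passed in, in the spirit of measuring novelty relative to the sampled knowledge base.

```python
from collections import defaultdict
from itertools import combinations


def novelty_dummies(papers, reuse_window=3):
    """Binary novelty flag per paper from journal pairs in its references.

    papers: list of (paper_id, year, referenced_journals) (hypothetical layout).
    A pair counts if the paper's year is the first year the pair is seen in
    the sample AND the pair is seen again within `reuse_window` later years.
    """
    years_seen = defaultdict(set)  # journal pair -> years it was co-referenced
    pairs_of = {}
    for paper_id, year, journals in papers:
        pairs = {tuple(sorted(p)) for p in combinations(set(journals), 2)}
        pairs_of[paper_id] = (year, pairs)
        for pair in pairs:
            years_seen[pair].add(year)

    def novel_and_reused(pair, year):
        first = min(years_seen[pair])
        reused = any(first < y <= first + reuse_window for y in years_seen[pair])
        return year == first and reused

    return {
        paper_id: int(any(novel_and_reused(pair, year) for pair in pairs))
        for paper_id, (year, pairs) in pairs_of.items()
    }
```

Restricting `referenced_journals` beforehand to non-computer-science or health-only journals would give the ‘No CS’ and ‘Only HS’ variants; the distance-weighted Novelty score additionally needs co-citation profiles and is not sketched here.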

distribution corroborate the results. The marginal effects suggest that research involving NN has a 0.019 (median value) higher probability of being in the top 10 % of the most influential contributions (0.027 mean value), and a 0.009 higher probability of being in the top 5 % (0.014 mean value).
As for the controls, the number of authors is positively related to impact (Lee et al., 2015) and reduces impact variation; international collaborations increase citation expectations (Glänzel and Schubert, 2001); publishing in a high impact factor journal further increases the average number of citations; surveys and other papers with many references tend to attract more citations; and, finally, a negative effect is found between private participation and scientific impact, albeit not particularly significant.
In sum, the econometric analysis shows that research using NN has a high potential for greater impact, on the one hand, but that it is also associated with greater uncertainty of having an impact, on the other. There are several (complementary) explanations for this uncertainty: the ‘high-risk/high-gain’ profile that characterizes the adoption of emerging technologies and breakthrough research (Rotolo et al., 2015; Wang et al., 2017); the challenge of integrating the scientific instrument into existing scientific practices (Rosenberg, 1992); the ability to extract the true potential from the instrument and not to adopt it simply because ‘everybody does’; and the possible social resistance, especially in sensitive domains, as some areas of the health sciences are known to be.
Based on these results, we propose that AI – here, specifically, NN methods – be regarded as an emerging general method of invention: ‘emerging’ because it shares the key attributes of emerging technologies; ‘general’ because it is increasingly integrated as a research tool in many scientific domains; and, a ‘method of invention’ because it has great potential for impact in application domains. We consider it more appropriate to consider AI an emerging general method of invention as opposed to a general-purpose method of invention (as in Cockburn et al., 2018) for two reasons. First, as we have seen in Section 4, although growing, the proportion of scientific contributions related to NNs remains marginal compared to the whole body of scientific activity. Second, whether or not AI can be classified as a general-purpose technology remains open to debate and we find more arguments to support the contention that AI is better considered, for example, as a large technical system with infrastructural properties (Vannuccini and Prytkova, 2021).

5.3. Robustness analysis

Our results are robust across a wide range of additional tests. Tables and further material can be found in Appendix A and Supplementary material.
First, we excluded all articles that fall into the WoS ‘Neurosciences’ category. This domain can be potentially problematic in that some terms (neural network, first and foremost) may not necessarily refer to artificial intelligence but rather to human intelligence and the biological brain. The sample falls by about 30 % and the number of NN articles almost halves. However, our results are consistent when replicating the analysis on the sub-sample.
Second, we excluded all articles that contain the terms ‘neural network’ and ‘neural networks’ exclusively in their title, keywords, or abstract. Bear in mind that an article may still contain a term such as ‘artificial neural network’ or ‘convolutional neural network’ which should now refer to artificial intelligence stricto sensu. In this case, neuroscience papers may form part of the sample. This restriction is severe insofar as the number of NN articles falls by more than 70 %. Yet our results are robust to this constraint.
The third exercise consists of a different econometric approach. Instead of regression analysis, we compared each NN paper with a ‘twin’ non-NN paper. More precisely, the empirical strategy considers the adoption of NN as a ‘treatment’; hence, we employ exact matching and 1:1 nearest neighbour matching on propensity scores (PSM) to select an appropriate control group of untreated papers. Exact matching is performed considering Web of Science categories, publication year, and journal – that is, we compare a NN article in terms of novelty and impact with an article belonging to the same domain(s), published in the same


Table 4
Impact profile of NN publications.

                        NegBin: # Citations     Probit: Top 5 % Cited   Probit: Top 10 % Cited
                        (1)                     (2)                     (3)
Panel A: Mean
NN                      0.101**                 0.147***                0.155***
                        (0.040)                 (0.041)                 (0.043)
Novelty (All Sciences)  0.153***                0.200***                0.191***
                        (0.023)                 (0.016)                 (0.015)
# References (log)      0.491***                0.429***                0.477***
                        (0.064)                 (0.075)                 (0.062)
# Authors (log)         0.237***                0.166***                0.194***
                        (0.026)                 (0.039)                 (0.036)
International Collab.   0.064***                0.083***                0.085***
                        (0.013)                 (0.014)                 (0.013)
Private Partic.         −0.029*                 −0.027                  −0.034**
                        (0.015)                 (0.018)                 (0.015)
JIF                     0.205***                0.167***                0.179***
                        (0.022)                 (0.017)                 (0.018)
Journal Age (log)       0.050                   −0.066                  −0.048
                        (0.036)                 (0.086)                 (0.079)
Survey                  0.541***                0.667***                0.627***
                        (0.060)                 (0.054)                 (0.049)
Panel B: Dispersion
NN                      0.136***
                        (0.051)
Novelty (All Sciences)  0.093***
                        (0.017)
# References (log)      −0.496***
                        (0.038)
# Authors (log)         −0.213***
                        (0.044)
JIF                     0.040
                        (0.031)
Journal Age (log)       −0.118***
                        (0.029)
Log likelihood          −1,519,720              −69,222                 −110,788
χ2 [null model]         318,463***              19,317***               31,564***
χ2 [w/o NN model]       8.70***                 24.80***                40.00***
# obs                   356,037                 356,037                 356,037

Notes: This table reports coefficients of the effect of NN methods (NN, dummy) on scientific impact proxied by the number of citations received (Column 1) and ‘big hits’ (Columns 2 and 3). Bootstrapped (500 replications) standard errors clustered at the journal-level in parentheses: ***, ** and * indicate significance at the 1 %, 5 % and 10 % level, respectively. The effect of NN on the citation count is estimated using a negative binomial regression. Estimates for the expectation and variance are reported in Panels A and B, respectively. Effects on the binary indicators are estimated using a Probit. Constant term, scientific field (WoS subject category) and time fixed effects are incorporated in all model specifications. Likelihood-ratio tests are used to compare the goodness-of-fit of two statistical models: (i) null model against complete model; (ii) model without the NN variable against the complete model.
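
The ‘big hit’ outcomes in Columns 2 and 3 flag papers in the right tail of the citation distribution of their own year and field. A rough sketch of such a flag is given below. The record layout, the function name, the single field per paper and the tie rule (a paper is in the top share if fewer than that share of its peers have strictly more citations) are all assumptions for illustration; the text does not describe how ties or multiple subject categories are handled.

```python
from collections import defaultdict


def top_cited_dummies(papers, share=0.10):
    """Flag papers in the top `share` of citations within their (year, field).

    papers: list of (paper_id, year, field, citations) (hypothetical layout,
    one field per paper for simplicity).
    """
    groups = defaultdict(list)
    for _, year, field, citations in papers:
        groups[(year, field)].append(citations)

    flags = {}
    for paper_id, year, field, citations in papers:
        peers = groups[(year, field)]
        # Number of peers with strictly more citations than this paper.
        better = sum(1 for c in peers if c > citations)
        flags[paper_id] = int(better < share * len(peers))
    return flags
```

Calling the function with `share=0.05` gives the stricter top 5 % flag under the same assumptions.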

year and in the same journal. We obtain the propensity scores associated with the binary treatment via the estimation of the Probit model containing the original set of variables. The average treatment effects on the treated (ATT) for the selected variables lend further support to our results.
A final test concerns the way novelty is measured. Indeed, some research shows that different novelty indicators are often inconsistent with each other and may return different sets of novel contributions (Fontana et al., 2020). Thus, we implemented the indicator developed in Uzzi et al. (2013) to define an ‘atypical’ (novelty/conventionality) quadrant: high-conventionality/high-novelty (HC–HN); high-conventionality/low-novelty (HC–LN); low-conventionality/high-novelty (LC–HN); and low-conventionality/low-novelty (LC–LN). The four categories are employed in a multinomial logistic regression. We find that, within the knowledge landscape of the health sciences, NN articles are more likely to draw on highly conventional combinations of knowledge. Ceteris paribus, our estimates suggest that when NN methods inject some highly (field-specific) unusual combinations, they do so primarily in an exceptionally conventional knowledge space.

6. Concluding remarks

Most socio-economic analyses of AI have looked at the effects of technology on economic growth (Brynjolfsson and McAfee, 2014; Aghion et al., 2017), labour market and productivity dynamics (Furman and Seamans, 2019; Acemoglu and Restrepo, 2020; Van Roy et al., 2020), changes in skills (Graetz and Michaels, 2018; Brynjolfsson and Mitchell, 2017), and inequality and discrimination (O'Neil, 2016; Zuboff, 2019). Our contribution, here, provides insights into the diffusion and impact of AI methods in the scientific system.
In this paper, we first examined the diffusion of NN research in the sciences in an effort to verify whether NNs conform to certain characteristics of emerging technologies. We found that NN research activity has grown exponentially in almost all sciences and all over the world, and the diffusion process has followed a double-boom cycle with a strong re-configuration of global actors. The diffusion of NN methods into application domains began in a cross-disciplinary fashion involving the computer sciences, breaking their way into ‘pure’ field-specific research within the various application domains. We then examined the impact of technology adoption on scientific discovery, with a particular focus on the health sciences. We found the adoption of NN methods to be negatively correlated with recombinatorial novelty; however, a positive correlation was found with the expectation and dispersion of citations received, increasing a contribution's likelihood of becoming a ‘big hit’.
Conceptually, we considered scientific discovery to be a recombinatorial process in which existing knowledge is recombined to create new knowledge, a process that continues perpetually in a dynamic knowledge landscape. A traditional image of science is one in which the knowledge landscape is made up of islands – i.e., (sub)-disciplines or scientific fields – where most of this recombination takes place. The islands reflect the structure of nature but also the need for a scientific mind to organize the complexity of the world. Seen this way, scientists


are sailors whose goal is to navigate from island to island, figure out their structure, and explore the surrounding landscape. Sailors can opt to stay in the ‘comfort zone’ and further their knowledge of one (or neighbouring) island(s), or they can sail to more distant islands and connect new areas of the landscape. Both actions enrich the knowledge space, one exploring well-formalized knowledge structures, the other reshaping and rearranging the landscape. Our findings suggest that, at least as it is used today, AI – the boat or the compass, to stick with the analogy – seems to be more in line with the first action. However, the possibilities of discovering new and valuable things about the known islands are far from obvious, as confirmed by our results on scientific impact.
A general-purpose invention in the method of invention? Or a passing fad in science? We think it is neither. Our findings lead us to take up a more moderate stance in the recent debate on how AI affects the development of knowledge. NN methods do not (yet) serve as an autopilot for navigating the sea of knowledge and connecting ideas, but they are, nevertheless, an extremely powerful and versatile research tool that impacts knowledge creation in measurable ways. Thus, we propose that AI should be considered an emerging general method of invention. But do not be fooled: we are not simply seeking to win the race to coin the most attractive designation; rather, as we discuss below, thinking of this technology as ‘general’ and ‘emerging’ has policy implications that differ substantially from those that might result from thinking of it as a general-purpose technology (for more on the latter, see, e.g., Trajtenberg, 2018; Klinger et al., 2021).
First, the diffusion of intelligent machines as input in the research production process calls into question the organization and management of science. AI may trigger a short-term substitution towards capital and away from highly skilled labour in the knowledge production process. Whether such a substitution effect is occurring is doubtful and clearly requires further empirical investigation. In parallel, the arrival of automation technologies in science puts a wide range of research tasks under threat, either by reducing the cost of performing those tasks or by outperforming human scientists in the performance of them. Some tasks within the occupation may be suitable for automation while others may not, and the overall effects on employment in science are very complex. Therefore, research-oriented organizations need a better understanding of the set of tasks performed by their scientists, the coordination of these tasks, and the respective strengths and weaknesses of humans (H) and machines (M), before they can hope to unleash the benefits of H + M cooperation.
Machines are set to become more than tools; they have the potential to become another teammate. As such, H–M interactions will require the coordination of complex activities, including communication, joint actions and human-aware execution. As these machine teammates will operate in different collaborative environments, they need to be designed with different collaborative capabilities. This design area will require considering such aspects as appearance (what machines should look like); learning and knowledge processing (how they should learn); conversation (how they should interact and socialize with their peers); architecture (what their main components should be); reliability, responsibility and liability. (For a more in-depth discussion on design areas for human-machine collaboration, see Seeber et al., 2020.)
It seems that NN methods are being adopted in different scientific fields but that existing knowledge structures are remaining relatively stable. This suggests the full potential of the technology (and its future development) might be better achieved by further spanning the boundaries between scientific areas. The bringing together of expertise and knowledge from various domains could help in the identification of blind spots and opportunities in the knowledge landscape. The concepts of ‘knowledge communities’ and ‘communities of practice’ seem particularly apt in this context. Although communities often self-organize and sustain themselves, they can also benefit from policy endorsement. It seems crucial to us that institutions and a policy environment be developed that are conducive to enhancing dialogue and cross-fertilization between communities. This could be achieved, for instance, by reinforcing both horizontal (intra-field) and vertical (inter-field) knowledge management. Digital platforms and knowledge hubs could be complemented by physical ‘collaborative spaces’ where the tacit knowledge of different communities might be transferred face-to-face, documented and made accessible for later use. Another standard instrument is obviously research funding, which should not target individual areas but rather research ‘priorities’ (e.g., fighting a given disease) involving different communities that can frame their research questions together.
However, promoting collaboration between communities can pose certain challenges in terms of governance and data ownership. Data is a polymorphous category, which means standards, principles and rules governing the various types of data are not homogeneous across communities, let alone across countries. This opens up the question of how data should be generated/used in compliance with different regulations, and also how the value of data should be distributed (Savona, 2019).
The diffusion of AI, as a research instrument, can be self-sustaining only if there is social acceptance – i.e., if the crew trusts the captain and the equipment. Several AI applications represent innovations that can bring about far-reaching changes in all aspects of our daily lives. These social innovations can have unintended negative consequences in terms of security, privacy and social equity (O'Neil, 2016). The public will no longer tolerate being excluded from the debate and it is here that the scientific and policy community have a key role to play. Both parties can improve the channelling of scientific evidence into the public arena and fight the risks posed by fake news. Policy can promote communication by setting the right, often intrinsic, incentives to encourage as many scientists as possible to engage with different segments of the public. However, communicating science to non-scientific audiences can be difficult since it requires a different approach from that of communicating science to scientific audiences. This means scientists need to be able to detach the layers of scientific complexity that characterize their research so as to deliver a clear message to the public, a message, moreover, that should include both potential impacts and ethical issues. ‘Listening mechanisms’ can also be used to inform citizens' knowledge, expectations, and imaginaries about intelligent machines and, why not, about their role in science. There are a variety of means available for achieving these goals, ranging from in-depth interviews and material deliberations to citizen science. We believe that citizen science has the potential to bring the greatest benefits to both the public and the scientific system. The nonprofessional involvement of volunteers in the scientific process, whether in more mundane tasks such as data collection or in other phases of the research, offers great opportunities for the public to become familiar with the technology but also provides researchers with great opportunities to improve their results (Bonney et al., 2014; Sullivan et al., 2018). However, fully accountable institutional mechanisms are a precondition for guaranteeing trust between scientists and the public and for ensuring continuity in their relationship. For instance, all results and the process used in reaching these results should be open to scrutiny. Policy should promote feedback activities so as to maintain citizen involvement and explain how their inputs were used in meeting research aims; reconcile conflicting values and objectives; and put in place collective intelligence mechanisms that can help them develop a systemic understanding of the future implications of technological progress and reach better consensus decisions – all very much in line with the notion of ‘Decisions 2.0’ (Bonabeau, 2009). Finally, we fully embrace the concept of ‘boundary organizations’ specifically designed to deal with socio-economic transformations in the digital age. These organizations would sit at the intersection of scientific and political spheres and allow scientists and policy-makers to maintain a constant dialogue with each other.
Although the AI revolution has been the subject under scrutiny here, ironically this revolution offers the tools with the greatest potential for bringing about a radical transformation in the interactions between the


public, the scientific community and the policy environment. These interactions, if exploited carefully, should serve to give a boost to human efforts to better understand the greatest mystery of all: the origin and function of the world and our place in it, that is, the tasks of science itself.

Declaration of competing interest

No conflict of interest to declare.

Acknowledgement

Earlier versions of this paper were presented at the INNOVA MEASURE III Expert Workshop ‘Brainstorming in Ispra’; 7th European Conference on Corporate R&D and Innovation, Seville (Spain); EMAEE 2019 ‘Economics, Governance and Management of AI, Robots and Digital Transformations’, Brighton (UK); OLKC 2019 ‘The Human Side of Innovation – Understanding the Role of Interpersonal Relations in an Increasingly Digitised Workplace’; CAGE-NESTA Workshop on ‘Data Science for the Economics of Science, Technology and Innovation’, London (UK); and the Workshop Series on ‘The Economics and Management of AI Technologies’, Copenhagen (Denmark) and Strasbourg (France). The authors thank seminar participants and in particular Tommaso Ciarli, Giacomo Damioli, Mirko Draca, Daniel S. Hain, Björn Jindra, Bertrand Koebel, Roman Jurowetzki, Patrick Llerena, Juan Mateos-Garcia, Maria Savona, Simone Vannuccini, Daniel Vertesy, and Marco Vivarelli. We are also very grateful to two anonymous referees for their constructive criticism.

Appendix A
Table 5
Descriptive statistics of the variables.

                                  NN papers             Non-NN papers         Total

Re-combinatorial novelty
Novelty Dummy (All Sciences)      36.43                 30.32                 30.40
Novelty Dummy (No CS)             32.39                 29.55                 29.59
Novelty Dummy (Only HS)           20.96                 23.52                 23.49
Novelty (All Sciences)            0/0.81 (2.39)         0/0.74 (3.10)         0/0.74 (3.09)
Novelty (No CS)                   0/0.65 (2.12)         0/0.71 (3.07)         0/0.71 (3.06)
Novelty (Only HS)                 0/0.37 (1.62)         0/0.50 (2.40)         0/0.50 (2.39)

Scientific impact
Top 5 % Cited                     8.33                  5.77                  5.80
Top 10 % Cited                    15.68                 11.38                 11.43
# Citations (raw count)           17/38.34 (114.43)     18/35.48 (82.67)      18/35.51 (83.15)
Citations (yearly normalized)     2.06/4.06 (8.16)      2.08/3.75 (8.02)      2.08/3.75 (8.02)

Controls
# References                      40/45.92 (29.59)      33/37.12 (25.66)      33/37.23 (25.73)
# Authors                         4/4.07 (2.37)         4/4.90 (3.50)         4/4.89 (3.49)
International Collab.             26.21                 23.02                 23.06
Private Partic.                   6.80                  7.09                  7.09
JIF                               1.39/2.12 (2.06)      1.73/2.42 (2.13)      1.73/2.41 (2.13)
Journal Age                       22/28.57 (26.07)      33/38.47 (29.08)      32/38.35 (29.06)
Survey                            0.72                  0.78                  0.77

Time period                       [2001–2015]           [2001–2015]           [2001–2015]
# scientific fields               46                    48                    48
# journals                        92                    92                    92
# papers                          4560 (1.28 %)         351,477 (98.72 %)     356,037 (100 %)
Notes: Binary indicators in [%], for continuous measures [median/mean (s.d.)]. The statistics refer to the period used for the econometric
analysis.

Table 6
Novelty and impact profile – Matching.

                            Exact matching              Propensity score matching
                            (1)           (2)           (3)           (4)

Novelty (All Sciences)      0.054***      0.053***      0.035***      0.023*
Novelty (No CS)             0.026**       0.026**       0.008         −0.001
Novelty (Only HS)           −0.005        −0.005        −0.025**      −0.033***
# Citations                 0.192***      0.195***      0.102***      0.063**

Notes: This table reports Average Treatment Effect on the Treated (ATT) for novelty and impact variables. The set of variables used for
each matching is composed as follows: (1) Journal/WoS categories/publication year; (2) all dummy variables in our set of control
variables/journal/WoS categories/publication year; (3) number of authors (log)/number of references (log)/journal/WoS categories/
publication year; (4) all variables. ***, ** and * indicate significance at the 1 %, 5 % and 10 % levels, respectively.
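
To make the ATT concrete, here is a minimal sketch of one way such an estimate could be computed once propensity scores are available. It assumes one-to-one nearest-neighbour matching with replacement, which the text does not specify; the function name and the toy data are hypothetical, and the scores are taken as given rather than estimated with a Probit.

```python
from statistics import mean


def att_nearest_neighbour(treated, controls):
    """ATT from 1-nearest-neighbour matching (with replacement) on a propensity score.

    treated, controls: lists of (propensity_score, outcome) tuples.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    differences = []
    for score_t, outcome_t in treated:
        # Pick the control whose propensity score is closest to the treated unit's.
        _, outcome_c = min(controls, key=lambda unit: abs(unit[0] - score_t))
        differences.append(outcome_t - outcome_c)
    return mean(differences)


# Toy data (hypothetical): treated units stand in for NN papers, controls for non-NN papers.
toy_treated = [(0.30, 5.0), (0.70, 9.0)]
toy_controls = [(0.28, 4.0), (0.50, 6.0), (0.74, 7.0)]
toy_att = att_nearest_neighbour(toy_treated, toy_controls)
print(toy_att)
```

Matching with replacement keeps every treated unit in the estimate, at the cost of reusing some controls; other schemes (caliper, kernel, one-to-many) would give different numbers.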


Table 7
Atypical profile of NN publications.

Category              All Sciences     No CS            Only HS
                      (1)              (2)              (3)

HC–HN                 0.008            0.208            0.308**
                      (0.130)          (0.133)          (0.136)
HC–LN                 −0.041           0.090            −0.049
                      (0.157)          (0.152)          (0.154)
LC–LN                 −0.043           −0.086           0.021
                      (0.162)          (0.163)          (0.155)
Other variables       Yes              Yes              Yes
Log likelihood        −374,002         −374,000         −363,855
χ2 [null model]       95,913***        95,488***        115,891***
χ2 [w/o NN model]     259***           158.20***        144***
# obs                 320,587          320,587          320,587
Notes: This table reports coefficients of the effect of NN methods (NN, dummy) on atypical profiles. Category
LC-HN is the reference category for all models. Bootstrapped (500 replications) standard errors clustered at the
journal-level in parentheses: ***, ** and * indicate significance at the 1 %, 5 % and 10 % levels, respectively.
All variables are incorporated in all model specifications, details in Supplementary material. Likelihood-ratio
tests are used to compare the goodness-of-fit of two statistical models: (i) null model against complete model;
(ii) model without the NN variable against the complete model.
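
The four atypicality categories can be illustrated with a small sketch. It follows one common reading of Uzzi et al. (2013), in which a paper is summarised by the median and a lower-tail value of the z-scores of its referenced journal pairs. The function name, the argument names and both cut-offs are assumptions made for illustration; the cut-offs actually used for this table are not stated in this section.

```python
def atypicality_category(median_z, tail_z, conventionality_cutoff, novelty_cutoff=0.0):
    """Assign a paper to one of the four conventionality/novelty categories.

    median_z: median z-score of the paper's referenced journal pairs (conventionality).
    tail_z: lower-tail z-score of the same pairs, e.g. the 10th percentile (novelty).
    conventionality_cutoff: assumed threshold, e.g. the median of median_z over all papers.
    novelty_cutoff: assumed threshold; a tail z-score below it counts as high novelty.
    """
    high_conventionality = median_z > conventionality_cutoff
    high_novelty = tail_z < novelty_cutoff
    return ("HC" if high_conventionality else "LC") + "-" + ("HN" if high_novelty else "LN")


# Hypothetical papers: (median z-score, lower-tail z-score).
examples = [(50.0, -2.0), (50.0, 3.0), (5.0, -2.0), (5.0, 3.0)]
labels = [atypicality_category(m, t, conventionality_cutoff=10.0) for m, t in examples]
print(labels)
```

The resulting label would then serve as the dependent variable of a multinomial logistic regression, with one category held out as the reference.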

Appendix B. Supplementary data

Supplementary data to this article can be found online at [Link]

References

Acemoglu, D., Restrepo, P., 2020. Robots and jobs: Evidence from US labor markets. J. Polit. Econ. 128 (6), 2188–2244.
Adner, R., Levinthal, D.A., 2002. The emergence of emerging technologies. Calif. Manag. Rev. 45 (1), 50–66.
Aghion, P., Jones, B.F., Jones, C.I., 2017. Artificial Intelligence and Economic Growth (No. w23928). National Bureau of Economic Research.
Agrawal, A., McHale, J., Oettl, A., 2018. Finding Needles in Haystacks: Artificial Intelligence and Recombinant Growth (No. w24541). National Bureau of Economic Research.
Angermueller, C., Pärnamaa, T., Parts, L., Stegle, O., 2016. Deep learning for computational biology. Mol. Syst. Biol. 12 (7).
Arthur, W.B., 2009. The Nature of Technology: What it is and How it Evolves. Simon and Schuster.
Azoulay, P., Graff Zivin, J.S., Manso, G., 2011. Incentives and creativity: evidence from the academic life sciences. RAND J. Econ. 42 (3), 527–554.
Baradaran, R., Ghiasi, R., Amirkhani, H., 2020. A Survey on Machine Reading Comprehension Systems. arXiv preprint arXiv:2001.01582.
Berndt, D.J., Clifford, J., 1994. Using dynamic time warping to find patterns in time series. In: KDD Workshop, Vol. 10, pp. 359–370. No. 16.
Bloom, N., Jones, C.I., Van Reenen, J., Webb, M., 2020. Are ideas getting harder to find? Am. Econ. Rev. 110 (4), 1104–1144.
Boden, M.A., 2004. The Creative Mind: Myths and Mechanisms. Routledge.
Boden, M.A., 2009. Computer models of creativity. AI Mag. 30 (3), 23–23.
Bonabeau, E., 2009. Decisions 2.0: the power of collective intelligence. MIT Sloan Manag. Rev. 50 (2), 45.
Bonney, R., Shirk, J.L., Phillips, T.B., Wiggins, A., Ballard, H.L., Miller-Rushing, A.J., Parrish, J.K., 2014. Next steps for citizen science. Science 343 (6178), 1436–1437.
Bourdieu, P., 1975. The specificity of the scientific field and the social conditions of the progress of reason. Soc. Sci. Inf. 14 (6), 19–47.
Bresnahan, T.F., Trajtenberg, M., 1995. General purpose technologies ‘Engines of growth’? J. Econ. 65 (1), 83–108.
Brynjolfsson, E., McAfee, A., 2014. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. WW Norton & Company.
Brynjolfsson, E., Mitchell, T., 2017. What can machine learning do? Workforce implications. Science 358 (6370), 1530–1534.
Cockburn, I.M., Henderson, R., Stern, S., 2018. The Impact of Artificial Intelligence on Innovation (No. w24449). National Bureau of Economic Research.
Daugherty, P.R., Wilson, H.J., 2018. Human + Machine: Reimagining Work in the Age of AI. Harvard Business Press.
Fleming, L., 2001. Recombinant uncertainty in technological search. Manag. Sci. 47 (1), 117–132.
Fleming, L., Mingo, S., Chen, D., 2007. Collaborative brokerage, generative creativity, and creative success. Adm. Sci. Q. 52 (3), 443–475.
Fontana, M., Iori, M., Montobbio, F., Sinatra, R., 2020. New and atypical combinations: an assessment of novelty and interdisciplinarity. Res. Policy 49 (7), 104063.
Foster, J.G., Rzhetsky, A., Evans, J.A., 2015. Tradition and innovation in scientists’ research strategies. Am. Sociol. Rev. 80 (5), 875–908.
Furman, J., Seamans, R., 2019. AI and the economy. Innov. Policy Econ. 19 (1), 161–191.
Furman, J.L., Teodoridis, F., 2020. Automation, research technology, and researchers’ trajectories: evidence from computer science and electrical engineering. Organ. Sci. 31 (2), 330–354.
Glänzel, W., Schubert, A., 2001. Double effort = double impact? A critical view at international co-authorship in chemistry. Scientometrics 50 (2), 199–214.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Gordon, R.J., 2016. The Rise and Fall of American Growth. Princeton University Press.
Graetz, G., Michaels, G., 2018. Robots at work. Rev. Econ. Stat. 100 (5), 753–768.
Griliches, Z., 1957. Hybrid corn: an exploration in the economics of technological change. Econometrica 501–522.
Hassabis, D., Kumaran, D., Summerfield, C., Botvinick, M., 2017. Neuroscience-inspired artificial intelligence. Neuron 95 (2), 245–258.
Hey, A.J., Tansley, S., Tolle, K.M., 2009. The Fourth Paradigm: Data-intensive Scientific Discovery, Vol. 1. Microsoft Research, Redmond, WA.
Jones, B.F., 2009. The burden of knowledge and the “death of the renaissance man”: is innovation getting harder? Rev. Econ. Stud. 76 (1), 283–317.
King, R.D., Rowland, J., Oliver, S.G., Young, M., Aubrey, W., Byrne, E., Sparkes, A., 2009. The automation of science. Science 324 (5923), 85–89.
Klinger, J., Mateos-Garcia, J., Stathoulopoulos, K., 2021. Deep learning, deep change? Mapping the evolution and geography of a general purpose technology. Scientometrics 1–33.
Kuhn, T., 1962. The Structure of Scientific Revolutions. University of Chicago, Chicago, USA.
Lake, B.M., Ullman, T.D., Tenenbaum, J.B., Gershman, S.J., 2017. Building machines that learn and think like people. Behav. Brain Sci. 40.
Lanier, J., 2010. You are not a Gadget: A Manifesto. Vintage.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Lee, Y.N., Walsh, J.P., Wang, J., 2015. Creativity in scientific teams: unpacking novelty and impact. Res. Policy 44 (3), 684–697.
Leung, M.K., Delong, A., Alipanahi, B., Frey, B.J., 2015. Machine learning in genomic medicine: a review of computational problems and data sets. Proc. IEEE 104 (1), 176–197.
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88.
Ma, J., Sheridan, R.P., Liaw, A., Dahl, G.E., Svetnik, V., 2015. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55 (2), 263–274.
Marblestone, A.H., Wayne, G., Kording, K.P., 2016. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10, 94.
Marx, V., 2013. Biology: the big challenges of big data. Nature 498, 255–260.
Merton, R.K., 1957. Priorities in scientific discovery: a chapter in the sociology of science. Am. Sociol. Rev. 22 (6), 635–659.
Mikolov, T., Chen, K., Corrado, G., Dean, J., 2013a. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013b. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119.
Miotto, R., Wang, F., Wang, S., Jiang, X., Dudley, J.T., 2018. Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinform. 19 (6), 1236–1246.


Nature Editorial, 2019. The scientific events that shaped the decade. Nature 576 (2019), 337–338.
Nilsson, N.J., 2009. The Quest for Artificial Intelligence. Cambridge University Press.
O’Neil, C., 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books.
Olier, I., Orhobor, O.I., Dash, T., Davis, A.M., Soldatova, L.N., Vanschoren, J., King, R.D., 2021. Transformational machine learning: learning how to learn from many related scientific problems. Proc. Natl. Acad. Sci. 118 (49).
Raghu, M., Schmidt, E., 2020. A Survey of Deep Learning for Scientific Discovery. arXiv preprint arXiv:2003.11755.
Raghupathi, W., Raghupathi, V., 2014. Big data analytics in healthcare: promise and potential. Health Inf. Sci. Syst. 2 (1), 3.
Rajkomar, A., Oren, E., Chen, K., Dai, A.M., Hajaj, N., Hardt, M., Sundberg, P., 2018. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1 (1), 18.
Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.Z., 2017. Deep learning for health informatics. IEEE J. Biomed. Health Inform. 21 (1), 4–21.
Rosenberg, N., 1992. Scientific instrumentation and university research. Res. Policy 21 (4), 381–390.
Rotolo, D., Hicks, D., Martin, B.R., 2015. What is an emerging technology? Res. Policy 44 (10), 1827–1843.
Russell, S., Norvig, P., 2021. In: Artificial Intelligence: A Modern Approach, Global Edition 4th. Foundations, 19, p. 23.
Savadjiev, P., Chong, J., Dohan, A., Vakalopoulou, M., Reinhold, C., Paragios, N., Gallix, B., 2019. Demystification of AI-driven medical image interpretation: past, present and future. Eur. Radiol. 29 (3), 1616–1624.
Savona, M., 2019. The value of data: Towards a framework to redistribute it. In: SPRU Working Paper Series 2019–21.
Schmidhuber, J., 2015. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117.
Schmoch, U., 2007. Double-boom cycles and the comeback of science-push and market-pull. Res. Policy 36 (7), 1000–1015.
Seeber, I., Bittner, E., Briggs, R.O., De Vreede, T., De Vreede, G.J., Elkins, A., Sollner, M., 2020. Machines as teammates: a research agenda on AI in team collaboration. Inf. Manag. 57 (2), 103174.
Shen, D., Wu, G., Suk, H.I., 2017. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248.
de Solla Price, D.J., 1963. Little Science, Big Science. Columbia University Press, New York.
Sullivan, D.P., Winsnes, C.F., Akesson, L., Hjelmare, M., Wiking, M., Schutten, R., Smith, K., 2018. Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat. Biotechnol. 36 (9), 820–828.
Sutton, C., Gong, L., 2017. Popularity of [Link] Within Computer Science. arXiv preprint arXiv:1710.05225.
Trajtenberg, M., 2018. AI as the Next GPT: A Political-Economy Perspective (No. w24245). National Bureau of Economic Research.
Uzzi, B., Mukherjee, S., Stringer, M., Jones, B., 2013. Atypical combinations and scientific impact. Science 342 (6157), 468–472.
Van Roy, V., Vertesy, D., Damioli, G., 2020. AI and robotics innovation. In: Handbook of Labor, Human Resources and Population Economics, pp. 1–35.
Vannuccini, S., Prytkova, E., 2021. Artificial Intelligence’s New Clothes? From General Purpose Technology to Large Technical System, SWPS 2021-02. No. 16.
Wadden, D., Lin, S., Lo, K., Wang, L.L., van Zuylen, M., Cohan, A., Hajishirzi, H., 2020. Fact or Fiction: Verifying Scientific Claims. arXiv preprint arXiv:2004.14974.
Wagner, C.S., Whetsell, T.A., Mukherjee, S., 2019. International research collaboration: novelty, conventionality, and atypicality in knowledge recombination. Res. Policy 48 (5), 1260–1270.
Wang, J., Veugelers, R., Stephan, P., 2017. Bias against novelty in science: a cautionary tale for users of bibliometric indicators. Res. Policy 46 (8), 1416–1436.
Weitzman, M., 1998. Recombinant growth. Q. J. Econ. 113 (2), 331–360.
Zhang, Q., Yang, L.T., Chen, Z., Li, P., 2018. A survey on deep learning for big data. Information Fusion 42, 146–157.
Zhang, S., Yao, L., Sun, A., Tay, Y., 2019. Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. 52 (1), 1–38.
Zuboff, S., 2019. The age of surveillance capitalism: the fight for a human future at the new frontier of power. In: Profile Books.
