Discovery science (also known as discovery-based science) is a scientific methodology that aims to find new patterns and correlations, and to form hypotheses, through the analysis of large-scale experimental data. The term “discovery science” encompasses various fields of study, including basic, translational, and computational science and research.[1] Discovery-based methodologies are commonly contrasted with traditional scientific practice, in which hypotheses are formed before experimental data is closely examined. Discovery science relies on inductive reasoning, that is, using observations to make generalisations, and can be applied to a range of science-related fields, e.g., medicine, proteomics, hydrology, psychology, and psychiatry.[2]
Discovery science places an emphasis on 'basic' discovery, which can fundamentally change the status quo. For example, in the early years of water resources research, discovery science was demonstrated by efforts to elucidate phenomena that were, until that point, unexplained, no matter how unusual the resulting ideas may have been perceived to be. In this sense, discovery science is based on the attitude that “we must not allow our concepts of the earth, in so far as they transcend the reach of observation, to root themselves so deeply and so firmly in our minds that the process of uprooting them causes mental discomfort” (as stated by Davis in 1926).[3] For discovery science to be utilised, there is a need to return to creating and testing genuine hypotheses, rather than praising concepts that are already familiar. While researchers commonly feel that new hypotheses will emerge inductively from curiosity in the relevant field, hypotheses can also be generated by models. Additionally, deductive testing must involve field observation, so that imperfect answers can be replaced with more clearly defined questions.
Hypothesis-driven studies can be transformed into discovery-driven studies with the help of newly available tools and technology-driven life science research. These tools have allowed new questions to be asked and new paradigms to be considered, particularly in the field of biology. However, some of these tools remain inaccessible or too costly because the related technology is still being developed.
Data mining is the most common tool used in discovery science, and is applied to data from diverse fields of study such as DNA analysis, climate modelling, and nuclear reaction modelling. Its use in discovery science follows a general trend of increasing use of computers and computational theory in all fields of science. Newer methods of data mining employ specialised machine learning algorithms for automated hypothesis forming and automated theorem proving.
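As a rough illustration of automated hypothesis forming, the following Python sketch screens every pair of variables in a small table for strong linear correlation and returns the strongest pairs as candidate hypotheses. The variable names, the toy values, and the 0.8 threshold are assumptions made for illustration only; real data mining works on far larger datasets and typically uses more specialised algorithms.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_pairs(table, threshold=0.8):
    """Return (name_a, name_b, r) for each pair with |r| >= threshold, strongest first.

    No hypothesis is supplied in advance: every pair of variables is examined.
    """
    hits = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            hits.append((a, b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Hypothetical, made-up measurements (one value per sample).
measurements = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}

candidates = screen_pairs(measurements)
for name_a, name_b, r in candidates:
    print(f"candidate hypothesis: {name_a} is associated with {name_b} (r = {r:.3f})")
```

In this toy table only the gene_a and gene_b pair passes the threshold. Because such a screen examines many pairs at once, some strong correlations will arise by chance, so each hit is a hypothesis to be tested rather than a finding.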
While computational methods are gaining interest, there is a decline in efforts to support critical care through basic and translational science, i.e., forms of discovery science which are essential for advancing the understanding of pathophysiology. A loss of interest in basic and translational science may lead to a failure to discover and develop new therapies, which could have an impact on the critically ill. Within critical care, there is an aim to renew the emphasis on basic and translational science through platforms such as medical journals and conferences, as well as critical care medical curricula. Advances in discovery-based science underlie key discoveries and developments in medicine, constituting a 'pipeline' for leading-edge medical development.
According to the AACR Cancer Progress Report 2021, discovery science has the potential to drive clinical breakthroughs.[4] Since discovery science underlies key discoveries and the development of new therapies, it remains important for advancing critical care. Numerous discoveries have increased life span and productivity and decreased health-related costs, thereby revolutionising medical care. As a result, the return on investment for discovery science has proven to be high. For example, combining computational methods with knowledge of inflammatory and genomic pathways has resulted in optimised clinical trials. Ultimately, discovery science is enabling a transition to the era of personalised medicine for treating complex syndromes, e.g., sepsis and acute respiratory distress syndrome (ARDS). With a robust infrastructure, discovery science can revolutionise medical care and biological research.
Discovery science has converged with clinical medicine and cancer genomics, and this convergence has been accelerated by recent advances in genome technologies and genomic information.[5] The effect of cancer genomics has been noticeable in every area of cancer research. Most successful applications of genomic knowledge in today's clinical medicine draw on a wealth of knowledge gathered through a broad range of research and decades of work. Biological insights are required to inform drug discovery and to set a clear clinical path for development.
Historically, the acquisition of such knowledge through functional and mechanistic studies has been uncoordinated, random, and inefficient. Moving from cancer genomic discoveries to personalised medicine involves major scientific, logistical and regulatory hurdles. These include patient consent, sample acquisition, clinical annotation and study design, all of which can lead to data generation and computational analyses. Functional and mechanistic studies, which can in turn lead to drug and biomarker discovery and development, commercial challenges and genomics-informed clinical trials, also remain a challenge. Importantly, these key scientific challenges are interdependent. Directed and streamlined approaches are being sought to generate biological discoveries rapidly, so that cancer genomic discoveries can be translated to the clinic. Delivering personalised cancer medicine benefits from traditional, unconstrained and non-directed academic exploration, with the goal of directing scientific inquiry to convert genomic discoveries into diagnostic and therapeutic targets.
Another example of discovery science is proteomics, a technology-driven and technology-limited discovery science. Technologies for proteomic analysis provide information that is useful in discovery science. Proteome analysis as a discovery science is applicable in biotechnology, e.g., it assists in 1) discovering biochemical pathways which can identify targets for therapies, 2) developing new processes for manufacturing biological materials, 3) monitoring manufacturing processes for the purpose of quality control, and 4) developing diagnostic tests and efficacious treatment strategies for clinical diseases. In the context of proteomics, current life-science research remains technology-limited; however, recently available tools have helped such research evolve from being hypothesis-driven to discovery-driven.
Field hydrology has experienced a decline in progress due to a change from discovery-based field work to the gathering of data for model parameterisation.[6] In field hydrology, models are not any more useful than an understanding of how systems work, and discovery science allows for this understanding. Several important examples of field-based inquiry and discovery have taken place in field hydrology. These include: identifying spatial patterns of soil moisture and how they relate to topography; interrogating such data through the use of geostatistics; and discovering the importance of macropore flow and hydrological connectivity. Some discovery-based questions that have been asked in field hydrology include 1) which parts of the watershed are most important in determining water delivery to the channel, 2) how the presence of 'old' water can be explained by groundwater travelling into the stream, and 3) how flashy hydrographs can be explained when there is no visible overland flow. Therefore, there is a need for discovery science in field hydrology, however unusual the resulting hydrological hypotheses may be.
An example of discovery science being applied to human brain function can be seen in the 1000 Functional Connectomes Project (FCP). This project was launched in 2009 as a way of generating and collecting functional magnetic resonance imaging (fMRI) data from over 1,000 individuals.[7] As with decoding the human genome, the mapping of human brain function presents challenges to the functional neuroimaging community.[8] For the first phase of discovery science, it is necessary to accumulate and share large-scale datasets for data mining. Traditionally, the neuroimaging community within psychology has focused on task-based and hypothesis-driven approaches; however, a powerful tool for discovery science has emerged in the form of resting-state functional MRI (R-fMRI). The potential of discovery science remains vast, e.g., 1) helping with decision-making and guiding clinical diagnoses by developing objective measures of brain functional integrity, 2) assessing the efficacy of treatment interventions, and 3) tracking responses to treatment. Recruiting broad participation and achieving collaboration within the scientific community are essential for successfully implementing discovery-based science in the context of human brain function.
Discovery-based methodologies are often viewed in contrast to traditional scientific practice, where hypotheses are formed before close examination of experimental data. However, from a philosophical perspective in which all or most of the observable "low-hanging fruit" has already been plucked, examining the phenomenological world more closely than the senses alone allow (even augmented senses, e.g., via microscopes, telescopes, or bifocals) opens a new source of knowledge for hypothesis formation. This process is also known as inductive reasoning, or the use of specific observations to make generalisations.
Discovery science is usually a complex process, and consequently does not follow a simple linear cause-and-effect pattern. This means that outcomes are uncertain, and disappointing results are to be expected as a fundamental part of discovery science. In particular, this may apply to medicine for the critically ill, where disease syndromes may be complex and multi-factorial. In psychiatry, studying complex relationships between brain and behaviour requires large-scale science. This calls for a conceptual switch from hypothesis-driven studies to hypothesis-generating research which is discovery-based.[9] Normally, discovery-based approaches to research are initially hypothesis-free; however, hypothesis testing can be elevated to a new level that effectively supports traditional hypothesis-driven studies.[10] Researchers hope that integrative analyses of data from a range of different levels can result in new classification approaches that enable personalised interventions.[11] Some biologists, such as Leroy Hood, have suggested that certain research fields are heading towards the model of ‘discovery science’. For example, it is believed that more information about gene function can be discovered through the evolution of data-mining tools.
Discovery-based approaches are often referred to as “big data” approaches, because of the large-scale datasets that they analyse. Big data includes large-scale homogeneous study designs and highly variant datasets, and can be further divided into different kinds of datasets. For example, in neuropsychiatric studies, big data can be categorised as ‘broad’ or ‘deep’ data. Broad data is complex and heterogeneous, as it is collected from multiple sources (e.g., labs and institutions) and uses different kinds of standards. Deep data, on the other hand, is collected at multiple levels, e.g., from genes to molecules, cells, circuits, behaviours, and symptoms. Broad data allows population-level inferences to be made; deep data is required for personalised medicine. However, combining broad and deep data and storing them in large-scale databases makes it practically impossible to rely on traditional statistical approaches. Instead, discovery-based big data approaches can allow for the generation of hypotheses and offer a high-throughput analytical tool for pattern recognition and data mining. It is in this way that discovery-based approaches can provide insight into the causes and mechanisms of the area of study.
Although discovery-based and data-driven big data approaches can inform the understanding of mechanisms behind the topic of concern, the success of these approaches depends on integrated analyses of the various types of relevant data, and on the resultant insight provided. For example, when researching psychiatric dysfunction, it is important to integrate vast and complex data such as brain imaging, genomic data and behavioural data, to uncover any brain-behaviour connections that are relevant to psychiatric dysfunction.[12] Integrating data and developing mining tools are therefore challenges. Furthermore, validation of results is a major challenge for discovery-based science. Although results can be statistically validated with independent datasets, ultimate validation depends on tests of functionality. Collaborative efforts are therefore critical for success.
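The idea of statistical validation with an independent dataset can be sketched in a few lines of Python. In this hedged example, an association found in a discovery cohort is retained only if it reappears, with the same sign and comparable strength, in a separately collected replication cohort. The cohort values, the variable names (imaging_score, symptom_score, sleep_hours), and the 0.8 threshold are all hypothetical, and Pearson correlation is used here only as a simple stand-in for whatever statistic a real study would choose.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def replicates(discovery, replication, a, b, threshold=0.8):
    """True if the a-b association is strong, with the same sign, in both cohorts."""
    r_disc = pearson(discovery[a], discovery[b])
    r_rep = pearson(replication[a], replication[b])
    same_sign = (r_disc > 0) == (r_rep > 0)
    return same_sign and abs(r_disc) >= threshold and abs(r_rep) >= threshold


# Hypothetical cohort in which the patterns were first found.
discovery = {
    "imaging_score": [1, 2, 3, 4, 5],
    "symptom_score": [2, 4, 5, 8, 10],
    "sleep_hours": [9, 7, 6, 4, 2],
}

# Hypothetical, independently collected cohort used only for checking.
replication = {
    "imaging_score": [2, 4, 6, 8],
    "symptom_score": [1, 3, 4, 7],
    "sleep_hours": [5, 6, 5, 6],
}

symptom_holds = replicates(discovery, replication, "imaging_score", "symptom_score")
sleep_holds = replicates(discovery, replication, "imaging_score", "sleep_hours")
print("imaging_score vs symptom_score replicates:", symptom_holds)
print("imaging_score vs sleep_hours replicates:", sleep_holds)
```

In this toy data the imaging and symptom association holds in both cohorts, while the imaging and sleep association is strong in the discovery cohort but does not reappear in the replication cohort. Passing such a check is statistical validation only; it does not replace tests of functionality.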