The Challenge and Opportunity of Big Data: A Comparison of Geoscience and Medical Science
It is generally agreed that the physical sciences, and medical science in particular, are ahead of many of the ‘descriptive’ sciences such as geology in the way they deal with data. Geological data, especially deep-time data, are often siloed and poorly integrated, which holds back the science (e.g. Sinha et al. 2013, Stephenson 2018).
From the 1950s to the 1990s, medical and health scientists began to integrate, manage and use enormous amounts of medical data to solve medical problems, and now in the 2020s they are beginning to use advanced artificial intelligence and machine learning to diagnose and cure disease. These advances in medical informatics show specialists in geological informatics what we can achieve if we integrate data better. The Deep-time Digital Earth programme (DDE) of the International Union of Geological Sciences (IUGS) will be an important part of this transformation of geological data.
The earliest use of computers for medicine was in dentistry in the 1950s in the United States. A widely read article, Ledley & Lusted (1959), showed how computers could be used in medical decision making. One of the earliest studies in medical informatics, Murray et al. (1964), showed how large amounts of data and computing could be used to quantify normal human movement, to help in the design of artificial limbs. In the 1960s and 1970s, computers began to be used to analyse large samples by bringing together many previously separate databases, allowing big gains in medical understanding and better medical treatment.
Figure 1. Murray et al. (1964) showed how data and computing could be used to quantify normal human movement, to help in the design of artificial limbs
By the 1990s, computer programs for the interpretation of electrocardiograms performed almost as well as cardiologists in identifying major cardiac disorders (e.g. Willems et al. 1991). More recent work has shown artificial intelligence methods to be vital in diagnosis (Kumar et al. 2022).
Earth science is not without its advances in big data and data-driven geoscience (see summary in Wang et al. 2021), but huge amounts of data still need to be made accessible and interoperable (e.g. Stephenson et al. 2020). A large share of geoscience data are found in what is referred to as the “long tail” (e.g. Sinha et al. 2013, Fig. 2).
Figure 2. Two types of geoscience data
Making these unstructured and inherently heterogeneous long tail data findable, accessible, interoperable and reusable (FAIR) is surely one of our highest priorities. DDE’s vision is to transform Earth science by connecting and harmonising long tail deep-time data ‘islands’ to support broad-based scientific studies relevant to the entire Earth system. The philosophical approach of DDE will be to provide the ‘wiring’ that connects disparate and distributed deep-time database ‘islands’ (Fig. 3).
Figure 3. New protocols, platforms and programs are needed to secure compatible and interoperable databases, so that the vast amounts of existing (and new) deep-time geoscience data can be linked. DDE will provide the wiring to connect deep-time data sources together
In conclusion, the progress of health and medical science in dealing with medical long tail data shows the benefits of integration and management. DDE is an emerging movement to tackle the challenge of long tail data in the geosciences: the unstructured and inherently heterogeneous geoscience data that reside in institutions and universities and on individual geoscientists’ computers. DDE’s vision is to transform Earth science by connecting and harmonising long tail deep-time data ‘islands’ to support broad-based scientific studies relevant to the entire Earth system.
If you’re interested in DDE, contact the DDE Secretariat at firstname.lastname@example.org.
References
Kumar, Y., Koul, A., Singla, R. and Ijaz, M.F. (2022). Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. Journal of Ambient Intelligence and Humanized Computing, 1–28. Advance online publication. https://doi.org/10.1007/s12652-021-03612-z
Ledley, R.S. and Lusted, L.B. (1959). Reasoning foundations of medical diagnosis: symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science, 130, 9–21.
Murray, M.P., Drought, A.B. and Kory, R.C. (1964). Walking patterns of normal men. The Journal of Bone and Joint Surgery. American Volume, 46(2), 335–360.
Sinha, A.K., Thessen, A.E. and Barnes, C.G. (2013). Geoinformatics: toward an integrative view of Earth as a system. In: Bickford, M.E. (ed.), The Web of Geological Sciences: Advances, Impacts, and Interactions. Geological Society of America Special Paper 500, 591–604. https://doi.org/10.1130/2013.2500(19)
Stephenson, M.H. (2018). Energy and Climate Change: An Introduction to Geological Controls, Interventions and Mitigations. Elsevier, Amsterdam, 208 pp. ISBN 9780128120224; paperback ISBN 9780128120217.
Stephenson, M.H., Cheng, Q., Wang, D., Fan, J. and Oberhänsli, R. (2020). Progress towards the establishment of the IUGS Deep-time Digital Earth (DDE) programme. Episodes, 43(4), 1057–1062.
Wang, C., Hazen, R.M., Cheng, Q., Stephenson, M.H., Zhou, C., Fox, P., Shen, S.-Z., Oberhänsli, R., Hou, Z., Ma, X., Feng, Z., Fan, J., Ma, C., Hu, X., Luo, B., Wang, J. and Schiffries, C.M. (2021). The Deep-Time Digital Earth program: data-driven discovery in geosciences. National Science Review, nwab027.
Willems, J.L., Abreu-Lima, C., Arnaud, P., van Bemmel, J.H., Brohet, C., Degani, R., Denis, B., Gehring, J., Graham, I., van Herpen, G., et al. (1991). The diagnostic performance of computer programs for the interpretation of electrocardiograms. New England Journal of Medicine, 325(25), 1767–1773. https://doi.org/10.1056/NEJM199112193252503. PMID: 1834940.