My Thoughts on the Future of Biostatistics in the Era of Artificial Intelligence: A Real and Fictional Personal Life History Intertwining Leo Breiman’s Ideas, Triathlon Sports, and the Importance of Change – The Philosophy of My Friend Yinchao
1. Objectives and Motivation
Although this blog is primarily dedicated to interviews with experts and significant individuals in specific areas of digital health, I could not let this year-end pass without addressing a current and fundamental topic for the future of our readers, closely related to the genesis of this space: The Future of Biostatistics in the Era of Artificial Intelligence.
In approaching this subject, essential questions arise that will guide our reflection:
- What is the role of biostatistics in this new context?
- How and in which direction should research be oriented in this emerging era?
- What roles do professionals who are not necessarily researchers play in the 21st century?
- How should the teaching of biostatistics evolve to adapt to changing times?
- What is the function of biostatistics in current biomedical research?
- Is biostatistics an instrumental science for verifying knowledge, or does it become the leading science in the new era of artificial intelligence applied to medicine?
- What biostatistical profiles will be necessary in the near future?
Over fifteen years ago, when I barely understood the notion of a derivative in high school and began programming on my own in Visual .NET, I intuitively felt that the future would be data-driven across multiple areas, such as sports and traumatology. These intuitions were based on personal experiences that I will not detail here, such as a severe sports injury. I recall that during the university entrance exam, one of my high school mathematics teachers told me with a smile, "Marcos, your ideas are very advanced, and sport will never be a mathematical science." At that time, my thoughts were considered dreamy, and none of the books of our youth, neither those of Jules Verne nor, much less, those about Harry Potter, foresaw a future where mathematics and the clinical field would intertwine so deeply.
Today, I wish to begin with a recounting of my own experiences. More specifically, I will share my vision and professional history thus far, mentioning events that have impacted me and the lessons I have learned from great masters. Most importantly, I will demonstrate how these learnings have influenced my life and how they might help answer many of the questions we all ponder, or at least provide a philosophy for facing change. This philosophy I share with my friend Yinchao, who will be introduced at the end of this narrative. Undoubtedly, we are living through the greatest revolution in history with the unstoppable advancement of artificial intelligence, where the uncertainty of the future increasingly generates anxiety, and what seems like a solid reality today may be another dawn tomorrow.
We all know that a single case (N=1) is not representative in statistics, but with repeated measures, we can transfer expert knowledge through priors, apply the empirical Bayes approach, and improve the efficiency of estimators. So, without further ado, let us begin with today’s story.
2. My Origin, Starting Point, and the Influence of Triathlon as a Way of Life and Scientific Thinking
I was born in the charming city of Santiago de Compostela, which I consider the most beautiful in the world, as do the thousands of pilgrims who arrive there after walking the legendary Camino de Santiago. I was raised in a small village in Galicia called Ordes, known so far for a world triathlon champion, Iván Raña (world champion in Cancún in 2002), and a protest-minded musical group. Despite my parents’ numerous attempts to introduce me to music, I never took to it. Sport, however, always held a prominent place in my life until a few years ago. I used to train up to 100 km a week. Like any child passionate about sports, my dream was to become an Olympic athlete. Additionally, in my native Galicia, we can boast a second world triathlon champion, Gómez Noia, who achieved the best sprint finish in history at the World Triathlon Championship in London in 2013, and who will serve as a concluding element of this story.
For those who do not know, triathlon is an Olympic sport that combines three disciplines in this sequence: swimming (1500 m), cycling (40 km), and running (10 km). When Iván Raña began in this sport, in our village and in many places in Spain, he was considered a madman: who could complete three events back to back? But thanks to Raña and Gómez, triathlon has become one of the most popular sports in Spain, both recreationally and professionally, and Spain has had no better international ambassadors, except perhaps Carolina Marín, David Cal, and Rafael Nadal. I mention this because, in the past, interdisciplinary research stood in relation to the established disciplines much as triathlon once stood in relation to individual sports like cycling and athletics, or major sports like football: on the margins. In the end, the metaphor is clear: interdisciplinary science is like competing in a triathlon.
During my university studies, I focused on theoretical mathematics and completed a master’s in statistics at the University of Santiago, without ever having a clear direction about what I wanted to do in the future. Before a sports injury, my dream was to become a physical education teacher or physiotherapist. Therefore, I always had a special interest in applying advanced mathematical methods to transform sports and medicine. At that time and in that environment, however, talking about those ideas seemed as distant as imagining life in a technologically advanced society like those in science fiction films such as Star Trek.
3. The Fortuitous Encounter with Data Analytics, Digital Health, Biostatistics, and Leo Breiman
Although my research project during my mathematics studies focused on differential equations in Banach spaces with semigroup theory (I will never forget the proof of Yosida’s theorem), I decided, during my master’s, to embark on an innovative project: predicting maximum oxygen consumption using functional data techniques at the Pontevedra High-Performance Centre, home to our two world triathlon champions for years. Surprisingly, the results were successful: a new statistical method made it possible, for the first time, to predict this key physiological variable for athletes and cardiac patients (you can see the associated publication here).
Although my master’s thesis was enormously enriching for me, the master’s in statistics, as an initial stage of my research career, was not an especially stimulating and creative period. I felt it was not the right study for my life goals and spent long afternoons in the library compensating for the deficiencies I perceived. In particular, I missed some fundamental elements:
- Deeper Learning of the Genesis of Statistical Methods: A lack of deep understanding of why certain statistical techniques are important and how problem generation motivates them, beyond introducing simple formulas.
- Advanced Mathematical Theory: Empirical processes that are the foundational tool for the future of biostatistics, for example, in adaptive clinical methods or in survival methods with neural networks.
- Integration with Biostatistics and Applied Statistics: Applied statistics and biostatistics were not perceived as fields of great interest, and there was some disdain towards the word "applied," despite the master’s programme including a large number of biologists and barely any professors with advanced knowledge of probability theory.
- Interaction with Computer Science and Artificial Intelligence: These areas were perceived as disconnected and distant from statistics, or even as competitors.
Despite these challenges, by chance and perhaps driven by my own curiosity, a new world opened up to me through a project I proposed on the impact of bagging in statistics within the resampling subject: I met Leo Breiman through random forests and then read his work "Statistical Modeling: The Two Cultures." This work by Breiman indirectly defines the concept of data science and the future of data analytics by combining machine learning and statistics, with historical antecedents from Tukey, among others.
It is sometimes said that books or readings are not chosen but appear at the most opportune moments. In my case, this was the second magical reading that impacted me (at another time I will recount the first, reading the preface of a chess book at eight years old). In the goodness-of-fit subject, a small discussion arose about that great article (and not because of Breiman's opinion on goodness of fit!), which led me to reread the manuscript a second time and change my opinion on data analysis concerning the mentioned limitations. Leo Breiman provoked significant reflections on useful research and models, questioning the theory of many articles published in the Annals of Statistics and highlighting the importance of designing algorithms for specific applications. This perspective changed my life and my view on data analysis forever, motivating many decisions up to the present moment, including two subsequent stories that I will tell later.
After the master’s, I remained lost about what the next steps in my career should be, facing uncertainty and, without knowing why, decided to start working as an associate researcher in the Epidemiology Unit. There, a new world appeared before me, perhaps to fulfil my vocation of creating algorithms that would transform medicine. I do not know exactly why, but I always think it was the force of destiny, God, or some special energy that helped me choose that path. A key element of destiny was my task within that work in the field of digital medicine with glucose monitors: to seek new and more powerful ways to extract useful clinical information from time series to predict the evolution in non-diabetic patients, a star topic today in glucose metabolism.
Another important element in my life was the luck of having Francisco Gude as my boss, who always believed in me and fully encouraged my independence, just as great masters do. Finally, and something that not even extreme value theory can explain, the data from the AEGIS study, the oldest and longest-running longitudinal study of continuous glucose monitoring in healthy populations, came into my hands. What were the odds? How was it possible that a tourist paradise like Galicia had set up that study eight years before large scientific powers like Israel and the United States, where million-dollar investments are made in science? And, by a stroke of destiny, I was responsible for developing new statistical methods to analyse these data and drive continuous glucose monitoring research with them.
4. The Glucodensity, the Potential Energy of Data in Survival Analysis, and Conversation with Professor Stute
In my work with the AEGIS study monitoring data, I devised a novel biostatistical concept (which I hope will become very famous in the near future) called glucodensity. In my opinion and that of others, this is the biomarker of the future for much data from wearable devices. It consists of a new functional representation that captures, for each glucose measurement range, the proportion of time an individual remains within that range across a continuum of intensities. Currently, this analytical approach, restricted solely to the field of continuous glucose monitoring, is being applied in various clinical trials of different pharmaceutical companies. Additionally, the method is being validated for integration into the software of specific continuous glucose monitors. This story reminds us of the importance of having real data to address significant scientific problems and how, at least so far, new guided data structures have driven the advancement of the field of biostatistics in recent decades.
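As a rough illustration of the idea, and not the estimator used in the published work, the proportion of time spent in each glucose range can be sketched with a simple histogram in Python. The function name, the bin edges, and the readings below are hypothetical, and the sketch assumes readings taken at equal time intervals, so that the share of readings in a range equals the share of monitoring time. A coarse histogram is only a stand-in for the continuous representation described above.

```python
from bisect import bisect_right


def glucodensity(readings_mg_dl, bin_edges):
    """Share of readings in each glucose range (coarse histogram sketch).

    bin_edges must be sorted; bin i covers [bin_edges[i], bin_edges[i + 1]).
    Readings outside the overall range are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for value in readings_mg_dl:
        if bin_edges[0] <= value < bin_edges[-1]:
            # bisect_right gives the index of the first edge above value
            counts[bisect_right(bin_edges, value) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no readings fall inside the bin range")
    return [count / total for count in counts]


# Hypothetical readings (mg/dL) sampled at equal time intervals
readings = [82, 95, 101, 110, 134, 150, 98, 90, 76, 88]
edges = [40, 70, 100, 140, 180, 400]
proportions = glucodensity(readings, edges)
```

Narrowing the bins moves this histogram towards a continuous profile over the whole glucose range, which is the spirit of the representation; the choice of smoothing is left open here.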
Simultaneously, and out of personal interest at that time, I decided to deepen my knowledge of survival analysis and attempt a methodological extension of the energy distance to survival analysis in a two-sample setting, in order to compare the effect of treatments on survival time in immunotherapy oncology trials. The story of this paper, which was finally published last year, is complex and I will tell it another time. However, I want to highlight certain lessons and facts about how this research topic has been useful to me.
- Development in the Field of Survival Analysis: Thanks to this paper, I delved into the field of survival analysis and was able to develop specific models, making significant contributions to survival models with functional covariates. This included the first curve model, deep neural networks, and work on uncertainty quantification with hybrid methods that combine resampling techniques or conformal prediction methods.
- Interaction with Professor Stute: Working on this topic coincided with a visit from Professor Wilfried Stute to Santiago de Compostela, which allowed me to have two meetings with him. This represented my true awakening and influenced all my subsequent steps in my scientific career, as I will describe below.
For those who do not know (and when I went to speak with him I was not fully aware of it myself), Professor Wilfried Stute is one of the most important figures in mathematical statistics and probability. Between the 1970s and 1990s he produced revolutionary work of worldwide influence, such as the theory of conditional U-statistics published in the Annals of Probability in 1991 (see here), or the first central limit theorem for censored variables, a result that will be remembered in the history of statistics. In fact, my excuse for meeting him was that I wanted to discuss one of his U-statistics for censored variables, which was instrumental in proving the asymptotic properties of the energy distance statistic for right-censored data.
The first thing that struck me about our conversation was that, for the first time in my life, I felt I was speaking with a genuine master. When one is a child and watches films such as the Star Wars saga or The Lord of the Rings (many of them inspired by the works of the mythologist Joseph Campbell), there always appears an iconic figure of immense wisdom, like Master Yoda, who inspires and enlightens your future with their speeches. In all my previous academic experience, I had never had a similar feeling; many professors limited themselves to giving standard advice and teachings based on classic textbooks, but with this man every word was a new lesson, and what mattered most was the peace and security that characterised him.
I will not dwell too much on our conversation, but I will say that Stute, after a successful career in mathematical probability and statistics, stated that the most difficult part of statistics is applied statistics, an idea that was new to me given my previous training (except for Breiman’s work). He emphasised that this was the essence and real difficulty of statistics, not the underlying mathematics. Another important moment came when we discussed his paper on conditional U-statistics. He told me that, twenty years later, its ideas were being explored in the study of conditional rankings (a particular case of his theory) by another researcher, whom Stute described as one of the best in the world and who has since been significant in my career. This story prompted me to undertake a research stay with that researcher.
5. Boston and My New Change of Thinking with the Singular Yinchao
After completing my doctoral thesis, I arrived in Boston thanks to scientific results such as the so-called Glucodensity. Travelling to the United States and to an emblematic city like Boston was an impactful experience. From the flags waving at the airport to the unmistakable sound of Terminal E at Logan Airport, Boston presents itself as a city where revolutionary events and the most incredible dreams come true in science and beyond. However, in this narrative, I will not focus on personal adventures or growth stories in Boston, but rather on describing the philosophy of change that inspired me through my great friend Yinchao.
Yinchao is a peer of similar age who believes more than anyone else that the future of medicine lies in data analytics. From the first day I met him, his opinions and his level of maturity continually amazed me. Most importantly, Yinchao is a tireless worker whose ideal of life is simple, but whose great goal is to make a significant contribution that transforms biostatistics.
- Motivation to Come to the United States: Yinchao decided to come to the United States, firmly believing that the mathematics of statistics is not inherently complex. For him, the essential thing is to create the motivating elements that lead to the final estimators, in line with the wise words of Breiman and Stute. He thought that the United States was the ideal place to learn and compete, as most researchers who drove revolutionary ideas in the field of statistics work there.
- Importance of Hard Work and Passion: Yinchao believes that hard work and passion are more important than innate intelligence. He maintains that, with willpower, it is possible to achieve anything you set out to do. This deeply impacted me, as I always considered Yinchao much more intelligent than myself, but he had more confidence in my abilities than I did.
- Relevance of Biostatistics in Medicine: Yinchao is convinced that research in biostatistics is the most important of all sciences, as modern medicine depends on data analysis and innovative algorithms that can impact the clinical development of millions of patients.
- Commitment to Learning and Collaboration: Yinchao is above all a scholar who believes that learning from great masters is the only path to scientific excellence. Being an avid reader is, for him, the most important part of a scientist's work: thinking before writing, building science from life experiences, and carefully choosing colleagues. Carefully selecting your collaborators is, for him, more crucial than selecting a good wife for his daughter.
- Resilience and Patience in Research: Yinchao taught me the importance of resilience and patience in science. Despite having worked for two years on a manuscript and facing the failure of his method with empirical data, he told me, "Marcos, these things happen in science. The important thing is not to give up and keep trying. If in two years I do not find my great contribution that will change everything, then the path was not enriching and difficult enough."
6. Final Reflections and Some Conclusions So Far of My Journey
To conclude this story, I would like to summarise the most important lessons from my journey (not the pilgrimage to Santiago that we should all undertake once in our lives), although some may seem well-known, along with a series of final personal reflections on the future that lies ahead.
- Curiosity is Fundamental in Science: Although it may seem naive, in the academic world, curiosity is not always valued or can even be penalised. As researchers, we tend to stay within our comfort zones and not explore new ideas or change our main research area. Nowadays, science is oriented towards interdisciplinarity, and the concept of basic science could transform into a mere romantic idea rather than an effective reality. New opportunities lie in relentless curiosity.
- The World is Unfair and Many Things Are Beyond Our Control: As one of my main guides said, awards and editorial decisions can be influenced by external factors such as political issues. The important thing is to work on topics you consider relevant and in which you feel happy and proud. In the long term, although it may not always seem so, the probability of your work being rewarded increases over time.
- Persistence Can Be More Important Than Intelligence: As Camilo José Cela, the only Nobel laureate in Literature from my area of birth, said, "He who endures, wins." In the same spirit, Yinchao emphasises that perseverance is key to success.
- Take Care of Your Network of Collaborators as You Take Care of Your Loved Ones: Following the words of Sun Tzu in "The Art of War":
"Look after your soldiers as you would a newborn; they will be willing to follow you to the deepest valleys. Take care of your soldiers as you would your beloved children, and they will willingly die with you."
Be faithful to those who are faithful to you, treat young researchers with respect, and convey enthusiasm and passion, even if you do not receive that treatment. My experience indicates that mediocre people treat their peers worse than brilliant people do, and that intelligent people who want to do new things are inspiring and trustworthy.
- Surround Yourself with Brilliant People Who Understand What You Do and Vice Versa: The idea that each person is an expert in a specific area is not entirely true; great projects succeed thanks to collaboration with extraordinary people. I remember when I was preparing a talk titled "How to Win the NBA with Mathematical Prediction Techniques" in my early days. I sought advice on the summary, and the final result was better than expected.
- When You Are Young, Be Ambitious and Tackle Projects for Which You Are Not Prepared: If not, when will you? Things never turn out as you expect, and if you are not ambitious, you will never fully progress. You will always be several levels below your capabilities, as reality tends to reduce your expectations or final results. However, if you are persistent, you will continue to improve until you reach limits you did not consider possible, as Yinchao always says.
- Initial Failure Can Be the Necessary Element to Win the Competition and Refocus: Some time ago, a paper that is important to me was rejected, but this showed me how to focus the new version of the manuscript and better highlight the unique aspects of my original work against competitors, thus winning the final race. Sometimes, receiving feedback and comments is an opportunity to sense what others think, understand their weaknesses, and improve yourself. There is no greater competitive advantage than knowing how rivals think.
Finally, I believe that Yinchao’s idea of transforming some aspect of statistics shows that the right way of thinking is the engine of a science that transforms people’s quality of life. In areas such as education and medicine, we should not fear artificial intelligence, as it can bring significant improvements to healthcare, enhancing clinical outcomes for millions and promoting equal opportunities. Although we do not know if Yinchao’s dream of transforming statistics will come true, it is clear that he will continue to insist, and that in the future, new Yinchaos will emerge, some more or less intelligent, but with the same passion and motivation to face forthcoming scientific challenges.
Another important reflection is that fields like biostatistics continue to evolve and need new professionals with advanced knowledge in clinical sciences, computing, and statistical mathematics. The future of medicine lies in data analysis, and significant educational changes in these areas are forthcoming.
When the Galician triathlete Gómez Noia became world champion in London, he had to improve his performance to win on English soil against the Brownlee brothers, themselves English, who dominated world triathlon at that time and seemed invincible in the 10 km run. Now, many of the advances in the biostatistics of the future will depend on new non-asymptotic probabilistic theories. If we want to triumph in the sprint like Gómez Noia and fulfil Yinchao’s dream, we must also improve our knowledge of probability (beyond imagination and Leo Breiman’s book from the 1960s) to continue progressing in the field and to be more resolute and innovative.
The era of p-values has ended (if anything, it should have been the era of e-values), and we are in a new predictive world with tools like conformal inference. This century’s biostatistics is not about adventures that merely look exciting; the new generation of biostatisticians will be guided by ideas that aim to bring revolutionary new knowledge in an interdisciplinary context, like triathletes capable of swimming, cycling, and running very fast. The four-minute mile seemed impossible until Bannister achieved it, and in the years that followed, many runners did the same. Nor did it seem possible to approach Zátopek’s Olympic feat of successfully competing in the 5,000 metres, the 10,000 metres, and the marathon, but this year, more than 70 years later, a Dutch runner won medals in all three events. Soon afterwards the women’s marathon record was broken, fulfilling the dream of running under 2:10, something incredible considering that ten years ago the record of the legendary Paula Radcliffe seemed a feat beyond reach. Often, the limits lie within ourselves and our will, as in Yinchao’s case. Humans have always been characterised by their adaptability and progress in the face of change, and this will happen again in this new era of generative artificial intelligence, where medicine will become more effective and humane, despite the lack of a majority consensus on it.
Yinchao’s future, like mine, is yet to be built. I hope at least that this story has been entertaining, and I will only add that there are many interesting and important details that I have omitted for brevity.
I want to end this piece by sincerely thanking the more than 50 scientific collaborators in my short career and the more than 25 countries that, for better or for worse, have shaped my thoughts and my way of experiencing life. Artificial intelligence may automate many daily tasks, but it will not have the capacity to feel and communicate emotions as humans do, at least not for an intelligent audience with a capacity for amazement.
