In this episode of the GovFuture podcast we get the opportunity to interview Dr. Susan Gregurick, who is the Associate Director for Data Science and Director of the Office of Data Science Strategy at NIH. She joined our May 2023 GovFuture Forum panel, where she discussed how NIH is unleashing the power of advanced analytics. In this podcast episode we were able to dig a bit deeper into this topic. Susan discusses how NIH is currently using advanced analytics to drive data-driven decision-making, some of the unique challenges NIH faces in leveraging advanced analytics in areas related to data privacy and security, and the role advanced analytics is playing in supporting NIH’s mission to improve human health through scientific research.
If you’re looking to get more insight and detail on the range of technology that we discussed in this podcast and other topics, check out our resources, books, courses, checklists, explainer videos, webinars, and more at govfuture.com/resources, tailored for our GovFuture listeners!
If you enjoy listening to this podcast, please rate us on Apple Podcasts, Google, Spotify, or your favorite podcast platform. Also, if you’re not already a member, consider becoming a GovFuture member to take advantage of all the community has to offer, including access to a diverse network of government innovators, opportunities to collaborate with government agencies, exclusive access to events and resources, and a platform to have a voice in shaping the future of government innovation. To sign up, go to govfuture.com/join.
Trimmed Episode Transcript: (note there may be transcription errors or mis-attributions, so please consult the audio file for any potential errors)
[Kathleen Walch] We’re so excited to have with us today Dr. Susan Gregurick, who is the Associate Director for Data Science and Director of the Office of Data Science Strategy at NIH. Welcome Susan and thank you so much for joining us today.
[Dr. Susan Gregurick] Thanks, Kathleen. Thanks, Ron. It’s great to be here today.
[Kathleen Walch] Yeah, we’re really looking forward to this discussion. We’d like to start by having you introduce yourself to our listeners and tell them a little bit about your background, your career, and your role at NIH.
[Dr. Susan Gregurick] Love to. Well, as Kathleen mentioned, I am the Associate Director for Data Science and the Director of the Office of Data Science Strategy here at the National Institutes of Health in Bethesda, Maryland. I got started quite a while ago as an undergraduate studying chemistry and mathematics at the University of Michigan. After my PhD in physical chemistry from the University of Maryland (Go Terps!), I started my career as a professor of computational chemistry at the University of Maryland, Baltimore County.
My interest at that time was protein dynamics and protein function. After about eight years or so at UMBC, I went over to the Department of Energy to serve as a program director in data and computational sciences, and I helped DOE develop the knowledgebase for microbial, plant, and environmental research. That’s where I really started to develop a passion and love for data science. About ten years ago, I joined NIH as the division director for Biophysics, Biomedical Technology, and Computational Biosciences at the National Institute of General Medical Sciences, or as we like to call it, NIGMS (we love acronyms in the government). I have had a really long career in federal service, and I can say quite honestly that I have a service heart. My role at NIH as Director of the Office of Data Science Strategy started maybe four years ago, and it’s been a lot of fun. Our office leads the implementation of the NIH Strategic Plan for Data Science, which works to harness emerging opportunities and advance cutting-edge data science across all 27 of our institutes and centers. In practice, this means that we focus on activities such as data interoperability, platform interoperability, data accessibility, data standards and standardization, and the reuse of data.
We also support the establishment of related policies, such as privacy, ethics, and data sovereignty policies, as well as promoting the principles of diversity, equity, inclusivity, and accessibility. I feel very fortunate to be in this role, and I can see that we are starting to make a significant difference with these cutting-edge technologies in data science. So that’s a bit about me. Hopefully it’s not too long-winded, and I’m glad to be here today.
[Ron Schmelzer] Well, it’s fantastic. I think that’s what listeners want to hear: not just about government adopting great technology, but knowing that great technologists and innovators such as yourself have put a significant amount of time and effort into advancing not just innovation but the use of data in the government. It’s especially good to know that in areas as impactful as health and medicine, where we can use real data to really advance things.
It’s good for people to hear that. So on that note, let’s talk about the use of advanced analytics and more advanced forms of technology, of course artificial intelligence, automation, and things like that. I know there are 27 institutes and centers within NIH, and people may not understand the breadth of it. Maybe you could tell us a little bit about how AI, advanced analytics, and automation are being used to drive data-driven decision-making, and perhaps some of the benefits you’re seeing.
[Dr. Susan Gregurick] Absolutely. This is such an exciting time to be in data science, with the significant advances in artificial intelligence and generative AI that we’ve seen recently. AI and advanced analytics really do speed scientific discovery and, for us, the delivery of treatments and cures that keep people healthy. So one of our priorities is the ethical use of advanced analytic technologies like artificial intelligence.
My office is supporting a number of activities to enhance both the diversity of researchers who are taking advantage of AI and the diversity of data sets that are used in AI and ML. For example, one program that I’d like to tell you about is the AIM-AHEAD program. AIM-AHEAD stands for the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity. It’s working with researchers and institutions across the entire country to enhance the participation and representation of researchers and communities in the development of AI models, and to improve their ability to take advantage of this emerging technology.
NIH, Dr. Tabak, and I are very proud of the AIM-AHEAD program and are looking forward to its new advances as it recruits new students and scholars, funds new cutting-edge programs and priorities, and makes an impact on AI models and the AI community. Another example, looking a little more specifically at what AI makes possible, is the analytic speedup of diagnostic and treatment capabilities. Not long ago, when an infant was born with a rare genetic disease, it would take a really long time to test the infant, get their genome sequenced, annotate it, and then try to figure out what their rare genetic disease might be. But now, with the advent of AI, we can very quickly sequence the entire genome of the infant, compare it against the variants that we have for over 13,000 genetic disorders, and in five minutes pinpoint what that disorder might be. That in turn increases our capacity to develop the therapeutics that the infant might need.
And that’s a big time saving when you’re looking at small children, you know, babies who are born with rare genetic disorders. So those are just some of the cutting-edge technologies that we’re working on both to enhance our capacity to use AI and also to really speed the care delivered to patients using AI.
[Kathleen Walch] Yeah, we always love to hear real-world examples, because then you get to see how technology is put to use. You can talk about transformative technology, but it’s different to actually see how it’s being applied. Especially for people who may have children with disabilities, getting that diagnosis within a few minutes, as opposed to a few months or years, really means a lot.
So we like to hear these examples. But whenever we’re talking about data and dealing with complex data sets, especially when it comes to health care, there can be some challenges that arise. And we need to make sure that we’re, you know, being responsible and ethical with that data. So what unique challenges does the NIH face in leveraging advanced analytics and dealing with those complex data sets, while also ensuring data privacy and security? And maybe how are you addressing those challenges?
[Dr. Susan Gregurick] That’s also a great question. And for agencies such as mine, which deal with health data, we do have significant challenges with data privacy, as well as access to data. You know, in order to take advantage of AI, researchers do need a significant amount of very good data. And that means that they need to access data across a number of repositories. Right now, NIH supports roughly 80 controlled access data repositories. These are data repositories that hold human data.
They protect participant data privacy and security, so there’s a fairly complicated process for researchers to get access to data from these repositories. And, at least prior to my coming to NIH, they needed to do this multiple times.
It was quite duplicative and really quite onerous. So recently, we’ve standardized the way that researchers can gain access to controlled-access data repositories. This is called the Researcher Auth Service, or RAS. RAS provides a single sign-on capability. Right now, we’re working with nine of our largest data systems, with many more in the process of adoption. By sharing credentialing systems across our data repositories, we’ve standardized the way that researchers log in and the way that we can track and log potential data breaches and data security issues.
This is one way in which we’re implementing the Administration’s Zero Trust priorities for data system infrastructure. The other challenge that we have is research data that’s generated but isn’t quite AI-ready. So our office is supporting a number of programs that bring together biomedical researchers, data management experts, and AI and ML experts to ensure that data generated through NIH funding is AI- and ML-ready: ready to be shared through repositories and ready to be used. One of the areas that we’re particularly interested in right now is advancing the ethical development of AI-ready data, making sure that we not only provide data that is useful for AI, but that there’s an ethical framework for the use of data in health applications.
[Ron Schmelzer] Yeah, I think that’s great. That’s a lot of different applications, and many of them are ones people may be familiar with in some way, since they understand their relationship to health and medicine. That may be different from other agencies. I noticed you did some work with DOE, and some of the things DOE does, we may never know about.
Or we may only find out about them decades later. I think that’s part of what makes this very interesting. As it relates to our audience, they’re thinking about how to put innovation into practice in their own organizations. Maybe they’re working within a government agency, whether federal, state, local, or international, or perhaps they’re working as a government contractor or vendor. They may be interested to hear a little bit about the direction that some of this technology adoption is going. So from that perspective, what role do you see AI and advanced analytics playing in supporting NIH’s mission to improve human health through scientific research? And could you give us a little insight into where the agency is investing, and where you see things going in the future?
[Dr. Susan Gregurick] Yeah, that is such a great question. I was just looking today at our portfolio of AI funding from this past year. We can only report on past years, because we’re right in the middle of funding year 2023.
So in funding year 2022, NIH supported about $269 million worth of grants, contracts, and other transaction authorities just to support AI research. That’s just AI research, and that’s a significant amount of money.
Most of that funding goes to R01s, which are single-investigator or multi-PI investigator-initiated grants to academic institutions and medical schools. Of course, we have our big programs, such as AIM-AHEAD, which I described, our Bridge2AI program, and others. But as we look at where we’re going to advance in the future, it’s pretty clear that artificial intelligence and advanced analytic technologies are going to play a much more pivotal role in biomedical research. That will help us make new health discoveries and develop new treatments.
I’m excited to see this play out. In the literature alone, there’s a new journal from the New England Journal of Medicine, NEJM AI, with Dr. Zak Kohane, a leading expert in biomedical AI, as its editor-in-chief. That gives you a hint that artificial intelligence will be playing a much bigger role in biomedical research in the future.
In addition, the advances that we can make in computing will be pivotal to those developments. So I’d love to tell you a little bit about our work on cloud computing through our STRIDES Initiative. STRIDES is a partnership with Microsoft Azure, Amazon Web Services, and Google Cloud that allows NIH researchers to explore the use of the cloud. With this investment, our funded investigators and institutions are working together to develop a robust and interconnected data ecosystem that will break down silos and help researchers generate, share, and analyze research data. So looking forward, I think the future for AI, analytics, and data science at NIH is really quite bright.
[Kathleen Walch] Yeah, it sounds like it. And we loved all the examples that you gave, because we really do like to see how this is done and implemented in the real world. Sometimes it’s done in ways people expect and sometimes it’s done in ways people don’t expect.
That’s what I think energizes everybody and gives people ideas. So we have had such a wonderful time on this podcast. And before we wrap up, we’d like to ask our final question that we ask for all of our interviews. What do you see or hope to see as the future of technology and innovation in the government?
[Dr. Susan Gregurick] So I have maybe four points that I’m really excited about. I’ve been following the rise of AI and large language models.
Certainly this is coming to everybody’s attention right now. Harnessing these capabilities for biomedical research is an opportunity that we shouldn’t miss, even with the caveat that large language models do have a tendency to create erroneous results. This is something we do need to pay attention to, but imagine the possibilities of developing language models that are trained on metadata, that is, information about data across many different repositories, pulling all that metadata together, and then layering on top of it the language models developed from publications and literature. We could see a significant advance in data search and discovery, and essentially in data creation, at unprecedented scales. So where will this lead us? I could see this leading to the analysis of petabytes of data as routine computational research. It might also open up new paradigms in data computing and lead to new computational architectures, which I’m totally excited about. Data sharing is good, but data that can be merged and integrated in a comprehensive way that adds knowledge is better. And finally, and this is maybe my most important point, data science is also about the people who work in these fields. They are the future of our technology innovation, and that is closely tied to the principles of inclusivity, diversity, equity, and accessibility. When we create a diverse workforce, either here at NIH or in the research community, we include people with diverse opinions, and that generates new ideas and new approaches.
And those out-of-the-box thinkers are how we can address some of our biggest challenges. So that is a goal that I try to live by every day.
[Ron Schmelzer] Yeah, I think that’s great. Obviously, one of the greatest things about technology and innovation in general is that they make our society so much better. They give us more ability to achieve the goals we want: longer, healthier lives and a better quality of life. That’s what we want from technology, that’s what we want from innovation, and of course, that’s what we want from our government as it implements those things.
So this is all fantastic, and we could definitely spend more time on it. If you haven’t yet listened to or seen our GovFuture Forum event in DC, held the third Thursday of every month at George Mason University, we have the recording of the panel from that event where Dr. Susan Gregurick presented.
You can listen to that panel recording separately, and I encourage you to do so.
Again, we like to provide that for the many of you who aren’t able to join us in person. And of course, we will continue to engage not only Susan, but others at NIH and elsewhere in the government. So I just want to thank you so much for sharing all your insights with our GovFuture podcast audience.
[Dr. Susan Gregurick] I am super delighted to be here today. I can’t wait for the next one. I totally enjoy GovFuture.
I’ve enjoyed the panels and the podcast. Thank you so much for inviting me and I hope you’ll connect with my colleagues at NIH on their great ideas for technologies.
[Kathleen Walch] Yes, we absolutely will. And we’re looking forward to getting them involved as well. So stay tuned listeners for that.