#08 - Handling health data in research
Ecosystem
Episode duration: 26 min
For this eighth episode, "100 Days to Success" focuses on data processing in healthcare research and innovation.
00:00:00
Voiceover: "One Hundred Days to Success." This is the podcast from G_NIUS, the Guichet national de l'innovation et des usages en e-santé. With host Lionel Reichardt, meet e-health innovators and key experts who will help you succeed in your projects.
00:00:20
Lionel Reichardt: Hello everyone and welcome to the "One Hundred Days to Success" podcast, the podcast for innovators and entrepreneurs in digital healthcare, and for anyone curious about the field. This podcast is produced by G_NIUS, the Guichet national de l'innovation et des usages en e-santé. In this episode, we talk about data processing in healthcare research and innovation. To do so, we welcome Alexandre Vainchtock, pharmacist and co-founder of HEVA, a healthcare research and consulting company. He is also a qualified member of the General Assembly of the Groupe d'intérêt public Plateforme des données de santé, better known as the Health Data Hub. We also welcome Emmanuel Bacry, research director at CNRS and scientific director of the Health Data Hub.
00:01:08
Lionel Reichardt: Hello Alexandre, and thank you for sharing your experience with us. Could you first tell us about your training and background?
00:01:16
Alexandre VAINCHTOCK: Hello, and thank you for inviting me. I'm a pharmacist and health economist. I started my career in health economics consulting and then co-founded HEVA with Ludovic Marsal in 2005. I'm also a member of a number of healthcare data working groups at Afro, the Healthcare Data Institute and Syntec Numérique.
00:01:42
Lionel Reichardt: So you run HEVA, a company founded in 2005. Could you tell us a little about it?
00:01:47
Alexandre VAINCHTOCK: Yes, of course. The company is a consultancy specializing in healthcare data, in particular medico-administrative databases. We currently employ around thirty people with a wide range of expertise: biostatisticians, epidemiologists, data managers and data scientists, as well as developers and designers.
00:02:11
Alexandre VAINCHTOCK: Beyond its longevity, which is considerable, the company has been quite innovative, since health data was hardly a hot topic in the early 2000s. I like to say that the company has triple-stranded DNA: expertise in data, a commitment to data visualization to make data accessible to non-experts, and a drive to stay at the cutting edge of technological innovation, in both advanced statistics and artificial intelligence.
00:02:39
Lionel Reichardt: You're clearly an expert in healthcare data. France is often presented as a champion in this field. Can you remind us how this came about?
00:02:48
Alexandre VAINCHTOCK: Yes, of course. For medico-administrative data, it actually goes back a while: the well-known SNIIRAM was created by the 1999 Finance Act. Over the years, the data warehouse was built, and then community-care data were linked with hospital data.
00:03:09
Alexandre VAINCHTOCK: But the real paradigm shift came in 2016 with the healthcare system modernization law, known as the Touraine law, which really changed things. Data is now open to public and private players alike, as long as they pursue a public-interest purpose. This has greatly facilitated access to data.
00:03:35
Alexandre VAINCHTOCK: Access is, of course, regulated within a very specific framework. But it has allowed us to work on databases that have few equals in the world. Frankly, one reason is that we have a centralized health insurance system, so the database covers almost all 67 million people in France.
00:03:58
Alexandre VAINCHTOCK: People often cite the Nordic countries, but their populations are nowhere near the same size, and access in countries like Germany or England is not necessarily easier than here. As for the United States, we also have experience working with their data.
00:04:15
Alexandre VAINCHTOCK: There, you have the public Medicare and Medicaid system, but beyond that, plenty of private insurers. The advantage of our database is that it really allows us to study a pathology across the entire population. What's more, these are reimbursement data, which are very specific.
00:04:33
Alexandre VAINCHTOCK: But looking beyond that, health data, as currently defined by the CNIL, covers the information collected when you seek care, the information obtained during an examination, and the information recorded when a condition is being managed.
00:04:54
Alexandre VAINCHTOCK: These, too, are health data, found in clinical studies, registries, [inaudible], and these databases complement the well-known public medico-administrative databases we work on.
00:05:09
Lionel Reichardt: You mentioned SNIIRAM, the national inter-regime health insurance information system. There's also the PMSI, the Programme de médicalisation des systèmes d'information, and DREES, the French Directorate for Research, Studies, Evaluation and Statistics, to name but a few. Can you give us an overview of the main databases?
00:05:30
Alexandre VAINCHTOCK: To put it simply, there are community-care databases and hospital databases. The community-care database is the DCIR, and the hospital database is the PMSI. Together, these two databases make up what we call the SNIIRAM. Then there are other databases, such as the CépiDc database on medical causes of death, and a whole host of databases developed by researchers, clinicians and registries.
00:05:55
Alexandre VAINCHTOCK: That is the whole philosophy of the Health Data Hub: we are creating this platform not only to make the public medico-administrative databases available, but also to encourage the producers of other databases to make them available for public research, as part of scientific projects.
00:06:15
Lionel Reichardt: You mentioned [inaudible], which was enshrined in legislation in 2019, while the opening up of data dates back to 2014-2015. Can you remind us of the objectives of opening up healthcare data, and what led to the creation of the Health Data Hub?
00:06:28
Alexandre VAINCHTOCK: Before the Health Data Hub, there was the Institut national des données de santé, and before that, the Institut des données de santé. With the law, and even a little before, since it was being prepared as early as 2014, the legislator and the experts realized that France had many, many databases, that access to them was not always simple, and that it was not consistent across public and private applicants.
00:06:56
Alexandre VAINCHTOCK: So this law put everything back on the table by saying that the applicant's status doesn't matter; what matters is the purpose of the request and what we want to do. What's interesting is that today, as soon as we write a protocol describing the research, its objective and its methodology, we can obtain access to these databases. The Health Data Hub is the culmination of all this: like the Institut national des données de santé before it, it is a one-stop shop.
00:07:23
Alexandre VAINCHTOCK: This means you have a contact person who can answer your questions and guide you in your search for data: does it exist, and how can I access it? Furthermore, the Health Data Hub's philosophy for the years to come (and we're really only halfway there) is to enrich the public databases with other databases developed by both public and private players, often focused much more specifically on a given pathology. We're thinking here of orphan diseases.
00:07:54
Alexandre VAINCHTOCK: We're also thinking of all the surgical registries, which have extremely valuable historical depth and are very clinical, very medical databases, something our medico-administrative databases sometimes lack. The medico-administrative databases, on the other hand, have the advantage of being extremely exhaustive, both in the population they cover and in the completeness of the reimbursed care they record.
00:08:18
Alexandre VAINCHTOCK: So it's by linking these two types of databases that we'll really be able to enrich our research and develop extremely innovative projects, in epidemiology and statistics, but also in something we strongly believe in here: predictive models based on artificial intelligence.
00:08:40
Lionel Reichardt: As you said, Alexandre, these are medico-administrative databases. They mostly concern reimbursement and are not strictly medical data. So how can these data be used for a purely medical purpose?
00:08:53
Alexandre VAINCHTOCK: You're right. On the hospital side, there is some medical information, coded with the International Classification of Diseases, even if it's not extremely detailed and it's not an anatomopathological classification. On the other hand, when you go to the doctor, they prescribe medication, but the diagnosis is not recorded, except for patients with a recognized long-term condition.
00:09:17
Alexandre VAINCHTOCK: How do we do it? We work with experts; we can't work alone. The French national health insurance fund has published what we call, in our jargon, the "CNAM Carto": a list of 56 pathologies with identification algorithms developed with researchers, which allow us to work on them. When such algorithms don't exist, we work with clinicians upstream of the project to define the best algorithm for identifying the pathology, or the disease stage, we're interested in.
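The kind of identification algorithm described here can be sketched as a simple rule applied to one patient-year of claims. This is a minimal illustration, not the CNAM's actual definition: the record layout (`PatientYear`), the example code lists and the three-dispensing threshold are hypothetical assumptions chosen only to show the shape of such a rule, which combines a long-term-condition flag, hospital diagnoses and repeated drug reimbursements.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# HYPOTHETICAL, illustrative code lists and threshold. Real algorithms define
# their own code lists, look-back periods and rules in their documentation.
ICD10_PREFIXES: Tuple[str, ...] = ("E11",)  # example diagnosis codes (ICD-10)
DRUG_CLASSES = frozenset({"A10B"})          # example reimbursed drug classes (ATC)
MIN_DISPENSINGS = 3                         # assumed yearly threshold


@dataclass
class PatientYear:
    """One pseudonymized patient's claims for one year (hypothetical layout)."""
    pseudo_id: str
    long_term_condition_codes: List[str] = field(default_factory=list)
    hospital_diagnosis_codes: List[str] = field(default_factory=list)
    dispensed_drug_classes: List[str] = field(default_factory=list)


def any_code_matches(codes: Iterable[str], prefixes: Tuple[str, ...]) -> bool:
    # str.startswith accepts a tuple of prefixes
    return any(code.startswith(prefixes) for code in codes)


def has_pathology(p: PatientYear) -> bool:
    """Flag the patient if any one of three independent signals is present."""
    if any_code_matches(p.long_term_condition_codes, ICD10_PREFIXES):
        return True
    if any_code_matches(p.hospital_diagnosis_codes, ICD10_PREFIXES):
        return True
    dispensings = sum(1 for c in p.dispensed_drug_classes if c in DRUG_CLASSES)
    return dispensings >= MIN_DISPENSINGS


patients = [
    PatientYear("p1", hospital_diagnosis_codes=["E119"]),
    PatientYear("p2", dispensed_drug_classes=["A10B", "A10B"]),
    PatientYear("p3", dispensed_drug_classes=["A10B"] * 3),
]
print([p.pseudo_id for p in patients if has_pathology(p)])  # ['p1', 'p3']
```

Working with clinicians, as described in the episode, is what turns a skeleton like this into a validated definition: they decide which codes, time windows and thresholds actually capture the pathology or disease stage of interest.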
00:09:51
Alexandre VAINCHTOCK: And when the medico-administrative database alone isn't enough, that's when we link it with other databases, which brings us back to what I was saying earlier about linking with clinical data and registries. The whole point of the 2016 law is that it now allows this linkage. A number of projects doing exactly this have already been submitted to the Health Data Hub, and I think that's really the future: projects that link these two types of data.
00:10:23
Lionel Reichardt: More than 3,000 projects are listed by the Health Data Hub. What types of projects are they, mainly?
00:10:29
Alexandre VAINCHTOCK: The Health Data Hub now receives all projects classed as "research not involving the human person". That's rather complicated jargon; to put it simply, it covers everything except clinical trials designed to prove a drug's efficacy: no product is administered. So you have projects based on these medico-administrative databases, and plenty of projects based on medical records, registries, and so on.
00:10:55
Alexandre VAINCHTOCK: This is quite interesting, and I invite listeners to visit the Health Data Hub website, which has a dedicated section with high-quality data visualizations showing both the types of projects and the types of applicants and funders. In any case, what we can see, and this is a real change, is that more than a third of projects today use SNDS data in one way or another, which really shows a collective awareness among healthcare players of the value of these databases.
00:11:31
Lionel Reichardt: So a project developer who needs these medico-administrative data has to file a project with the Health Data Hub. What are the requirements for submitting a project?
00:11:42
Alexandre VAINCHTOCK: That depends on the applicant's profile. Healthcare establishments, which run many projects using these data, tend to do more fundamental research and, again, linkage with registries. Healthcare manufacturers, whether in pharmaceuticals, medical devices or biotechnology, carry out this type of project before a healthcare product is launched, to determine the target population, understand the patient pathway or estimate the cost of the disease. Or the other way around.
00:12:21
Alexandre VAINCHTOCK: After the launch, these studies feed into discussions with the health authorities on how the product is used in real life. Is it being used as intended? How often? At what doses? What are the patient profiles in real life? These data are widely used today for these purposes, because they let us observe a great many things in real life, on an exhaustive population. As for obligations, the law is very clear.
00:13:02
Alexandre VAINCHTOCK: It requires healthcare product manufacturers to work with a research laboratory or research consultancy that commits to standards of expertise, independence and confidentiality, unless the manufacturer can demonstrate that the way the data is processed makes it impossible to use it for prohibited purposes, such as purely commercial or promotional ones.
00:13:30
Alexandre VAINCHTOCK: In practice, over the past three years, we've seen that manufacturers as a whole prefer to work with research consultancies and laboratories that are experts in the data and that offer a certain guarantee of quality and independence for the studies and results subsequently provided to health authorities, caregivers or even patients.
00:13:53
Lionel Reichardt: Is it within the reach of any start-up to file a project and request the use of these health data?
00:13:58
Alexandre VAINCHTOCK: I think that's the philosophy we're all pursuing, the Health Data Hub in particular: this is no longer reserved for experts. I encourage entrepreneurs who would like to work on data, first of all, not to hesitate to contact the Health Data Hub. I think I used the word "counter" earlier, and I don't mean it pejoratively at all: today, there's a team of extremely competent people there to answer questions from people who don't know much about the subject.
00:14:27
Alexandre VAINCHTOCK: Then there's the question of how to actually set up a study. Here, I think it's preferable for these entrepreneurs and innovators to call on experts in the field, whether research consultancies or research laboratories. The entrepreneurs will no doubt know any database they have produced themselves, but data management is a profession in its own right.
00:14:54
Alexandre VAINCHTOCK: What's quite interesting is that we often say, with the engineers on the team, that today an engineer alone doesn't achieve much. It's really the trio of the data manager, who dissects the data, the clinician, who medicalizes it and gives it its full medical meaning, and finally the data scientist, who exploits it. I remain convinced that this trio is what will enable us to develop tomorrow's projects. Anyone who wants to work on data today needs to work hand in hand with others.
00:15:23
Lionel Reichardt: To conclude, what advice would you give to a project owner who would like to work with healthcare data?
00:15:28
Alexandre VAINCHTOCK: I think I answered that just now: once again, contact the experts. Today, all the experts and consultancies are listed on the Health Data Hub website, and the Health Data Hub is there to support project owners. Looking in the rear-view mirror, a lot has been achieved in three years, that is, since the INDS was set up following the 2016 healthcare modernization law.
00:16:20
Alexandre VAINCHTOCK: Of course, not everything is perfect; there's still room for improvement in data access times. But drawing on my nearly 20 years of experience, the progress has been enormous, and I think France really has nothing to be ashamed of compared with its European neighbors or beyond. We all need to capitalize on this, because France really can be the leader in health data in Europe and worldwide.
00:16:20
Lionel Reichardt: Alexandre Vainchtock, thank you for your testimony. You may have questions about contacting the Health Data Hub to use health data. Some answers now with Emmanuel Bacry, research director at CNRS and scientific director of the Health Data Hub, the platform that implements the major strategic orientations for the national health data system set by the State, in particular by the Ministry of Solidarity and Health. Emmanuel Bacry, hello. Could you introduce the Health Data Hub to us?
00:16:52
Emmanuel BACRY: In France, we have some very, very fine health databases that can be used for excellent research projects, but which, in my opinion, remain very under-exploited because of three main problems. First, they are scattered across a huge number of centers. Second, their governance structures are very heterogeneous and sometimes extremely complex, so it can be very difficult for a researcher or a company to access them. And third, they are not very interoperable. But what does interoperable mean?
00:17:27
Emmanuel BACRY: It means that the databases can talk to each other. In the age of Big Data, it's good to have very large databases, of course. But there's something that enriches them infinitely more than simply making them bigger: linking them with data from extremely heterogeneous, diverse sources. Those are the three problems that leave our databases under-exploited, and the Health Data Hub is there to address them.
00:17:58
Emmanuel BACRY: How does this work in practice? We're going to publish a catalog of large databases, updated as we go along, that will be available for public-interest research on the Health Data Hub platform. Of course, the first of these databases, I would say, is the historical SNDS database.
00:18:34
Emmanuel BACRY: That is, the medico-administrative database that corresponds to Carte Vitale reimbursements and hospital pathways. But many other databases will also be quality-assured, brought into the Hub and made interoperable, notably with the SNDS. When I talk about interoperability, I mean that they will all be in one place and able to be linked. Any company or research group will be able to submit a project to carry out public-interest research on one or more of the catalog databases, which can be linked together. They will submit their project protocol to the Health Data Hub, which acts as a one-stop shop.
00:19:17
Emmanuel BACRY: The one-stop shop forwards this protocol to a committee called the CSRS, which assesses the project's scientific and ethical quality. If the CSRS gives a favorable opinion, the project is sent to the CNIL, which gives the final opinion. If the CNIL's opinion is favorable, the Health Data Hub sets up a secure space on the platform where only the data required for the project are deposited. Project sponsors can access this secure space remotely to carry out their research.
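The sequence of steps described here can be sketched as a small state machine. This is a simplified, hypothetical model for illustration: the stage names are illustrative labels rather than official terminology, and intermediate outcomes such as a request to modify the protocol are left out.

```python
from enum import Enum, auto


class Stage(Enum):
    """Illustrative labels for the access pathway, not official terminology."""
    SUBMITTED = auto()          # protocol filed with the Health Data Hub one-stop shop
    CSRS_REVIEW = auto()        # scientific and ethical review by the CSRS
    CNIL_REVIEW = auto()        # final opinion from the CNIL
    SECURE_SPACE_OPEN = auto()  # project-specific secure space with only the required data
    BLOCKED = auto()            # an unfavorable opinion stops the project


TERMINAL = {Stage.SECURE_SPACE_OPEN, Stage.BLOCKED}


def next_stage(stage: Stage, favorable: bool = True) -> Stage:
    """Advance one step; `favorable` is the outcome of the current review."""
    if stage is Stage.SUBMITTED:
        return Stage.CSRS_REVIEW
    if stage is Stage.CSRS_REVIEW:
        return Stage.CNIL_REVIEW if favorable else Stage.BLOCKED
    if stage is Stage.CNIL_REVIEW:
        return Stage.SECURE_SPACE_OPEN if favorable else Stage.BLOCKED
    raise ValueError(f"{stage.name} is a terminal stage")


stage = Stage.SUBMITTED
while stage not in TERMINAL:
    stage = next_stage(stage)  # assume every review is favorable
print(stage.name)  # SECURE_SPACE_OPEN
```

Modeling the pathway this way makes the ordering explicit: the CNIL is only consulted after a favorable CSRS opinion, and data reach the secure space only once both reviews are favorable.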
00:20:04
Lionel Reichardt: What are the lead times for these various stages?
00:20:08
Emmanuel BACRY: Timeframes have already been reduced considerably compared with recent years and will, I hope, decrease further. The CSRS meets every month, so unless too many projects are submitted, you normally get its response within a month. The response may be unfavorable, in which case the project is blocked. It may be favorable with reservations, with comments from CSRS members asking you to modify the project.
00:20:40
Emmanuel BACRY: In that case, you have one month to respond to these remarks and resubmit the project. Or the project may receive a favorable opinion right away, in which case it is sent to the CNIL. The CNIL then has two months to respond, and this period can be renewed once for a further two months, up to a maximum of four months. If you have no response after four months, it's an implicit yes from the CNIL.
00:21:08
Emmanuel BACRY: So after four months without a response, the CNIL's opinion is considered implicitly favorable, and you can go ahead with your project. Then, of course, there's the time needed to make the data available in the secure space: opening the space and checking that your access arrangements comply with the security guidelines. This may take a little time, depending on your workload and theirs, on the project owner, and on how easy the data is to mobilize. Catalog data should normally be easy to mobilize, because it's already in the Health Data Hub.
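The CNIL timing rule described in this exchange (two months, renewable once for two more, with silence after four months counting as a favorable opinion) can be turned into a small date calculation. This is a simplified sketch under stated assumptions: it counts whole calendar months from a hypothetical referral date, clamps to the last day of shorter months, treats the deadline day itself as the point of implicit acceptance, and ignores the legal fine print on how deadlines are computed. The function names are illustrative.

```python
import calendar
from datetime import date
from typing import Optional

INITIAL_MONTHS = 2    # CNIL's initial period, as stated in the episode
EXTENSION_MONTHS = 2  # one possible renewal, for a maximum of four months


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def cnil_deadline(referral: date, extended: bool) -> date:
    """Date from which silence counts as an implicit favorable opinion (simplified)."""
    months = INITIAL_MONTHS + (EXTENSION_MONTHS if extended else 0)
    return add_months(referral, months)


def cnil_status(referral: date, today: date, extended: bool,
                decision: Optional[str] = None) -> str:
    """An explicit decision wins; otherwise silence past the deadline is a yes."""
    if decision is not None:
        return decision
    if today >= cnil_deadline(referral, extended):
        return "implicitly favorable"
    return "pending"


referral = date(2024, 3, 15)  # hypothetical date the project reaches the CNIL
print(cnil_deadline(referral, extended=True))  # 2024-07-15
print(cnil_status(referral, date(2024, 6, 1), extended=True))  # pending
```

The CSRS month and the time needed to open the secure space come on top of this window, so a project plan would add them separately.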
00:21:47
Lionel Reichardt: What advice would you give to a project leader who wants to process healthcare data for research?
00:21:51
Emmanuel BACRY: First tip: make sure you understand what data you need to carry out your research. You need to be familiar with the data you're requesting access to, rather than just dreaming and saying, "Ah well, this database looks good, I'll use it and I think it will let me answer my question." A fairly in-depth study of the data may be necessary; otherwise, the CSRS won't approve your project anyway.
00:22:15
Emmanuel BACRY: You have to show that you're familiar with the databases you're requesting access to, and that they will genuinely answer the questions you're asking. This is very important. Then, of course, you have to write your access request protocol properly, and to do this, you can draw on all the forms available on the Health Data Hub website, which explain every stage of this governance process right up to actual data access.
00:22:40
Emmanuel BACRY: I'd advise you to read up on all these steps if you need further information. And, of course, the Hub's one-stop shop is available to discuss them with you, including on the discussion forums. Don't overlook the fact that once you have CNIL approval, you'll need to designate a data controller, who will bear criminal liability for any malpractice in the course of your project.
00:23:15
Emmanuel BACRY: This is important. In particular, you need to provide a data security audit of the computers that, as part of your project, will access the Health Data Hub's secure platform. Some people underestimate this step, but we can help you with that too: we can provide the names of auditors who can support you in this phase if you're not able to do the audit yourself. It really is very important.
00:23:43
Emmanuel BACRY: But my advice above all is to identify precisely which databases you need to answer your questions, and what scope of those databases you need. If you need information from the historical SNDS, the medico-administrative database, don't say in your protocol that you want the whole SNDS. You have to show that you know the SNDS well, and say: I'm going to need so many people, people tagged with such-and-such a signal, with such-and-such a variable, and so on.
00:24:14
Emmanuel BACRY: And I need these variables, that table, and so on. If you show a very good knowledge of the databases you want to access, you increase your chances of a favorable opinion from the CSRS. We provide a lot of documentation on the historical SNDS, and for the other databases in the catalog you'll also find plenty of metadata to help you understand exactly what information they contain and guide you in writing your protocol.
00:24:42
Emmanuel BACRY: Of course, the data made available in the secure spaces that project developers access is not directly identifying. It's pseudonymized data: you won't have the person's name or social security number. The Health Data Hub is strictly forbidden from hosting directly identifying data on its platform. That, then, is the process from project submission through validation to actually running the project on the Health Data Hub platform.
00:25:14
Emmanuel BACRY: For such a project, you're going to need people with very different specialties: people who really understand how the protocol should be written, people to support you with the security audit, and people who understand what it means to be a data controller. There's a whole range of human resources to plan for and not overlook. We've seen many projects slowed down considerably, or even abandoned altogether, for lack of the right human resources.
00:25:52
Lionel Reichardt: Our episode is coming to an end. We thank our two guests for their availability, and you for listening. Don't hesitate to subscribe to the podcast on your listening platforms. We look forward to seeing you very soon for a new episode of "One Hundred Days to Success".
00:26:10
Voiceover: The people shaping e-health today and tomorrow are on the G_NIUS podcast, and all the solutions to help you succeed are on gnius.esante.gouv.fr
Description
With Alexandre VAINCHTOCK (HEVA) and Emmanuel BACRY (CNRS / Health Data Hub).
With the testimony of Alexandre VAINCHTOCK, pharmacist and co-founder of HEVA, a healthcare research and consulting company. He is also a qualified member of the General Assembly of the "Plateforme des données de santé" public interest group, better known as the Health Data Hub.
We also welcome Emmanuel BACRY, Research Director at CNRS and Scientific Director of the Health Data Hub.