#28 - Innovating in healthcare with Data Challenges
Innovation
Episode duration 00:14
With Dr Frédéric Staroz, anatomopathologist and President of the Conseil National Professionnel des Pathologistes.
00:00:00
G_NIUS: 100 Days to Success. This is the podcast from G_NIUS, the Guichet national de l'innovation et des usages en e-santé. Join Lionel Reichardt to meet e-health innovators and key experts who can help your projects succeed.
00:00:20
Lionel Reichardt: Hello everyone. You're listening to 100 Days to Success, the podcast aimed at innovators and entrepreneurs in digital healthcare, but also at anyone curious about this field. This podcast is produced by G_NIUS, the Guichet national de l'innovation et des usages en e-santé. For this episode devoted to the Data Challenge and its place in healthcare innovation, I'm delighted to welcome Dr. Frédéric Staroz, anatomopathologist associated with Ouest Pathologie, member of the Société Française de Pathologie and head of the VisioMel project. Dr. Frédéric Staroz, hello.
00:01:04
Dr Frédéric Staroz: Hello Lionel.
00:01:05
Lionel Reichardt: First of all, Dr. Staroz, could you quickly remind us what anatomopathology is and what place data has in this specialty?
00:01:14
Dr Frédéric Staroz: Anatomo-cyto-pathology is a medical specialty that is also known as anatomopathology, pathology or, more colloquially in French, "anapath". The aim of this specialty is to make medical diagnoses based on biopsy or surgical specimens. It's also well known for cervical cancer screening, notably through the examination of cervical smears. We play a major role in oncology and in all types of screening: breast cancer, colon cancer, but also inflammatory pathologies. Beyond this diagnostic role, we are increasingly involved in defining prognostic criteria, to determine how serious a disease is for a given diagnosis, and also in what we call theranostics, in other words the ability to predict the sensitivity of a given disease to a given treatment, either through fairly conventional techniques or through somatic genetic techniques.
00:02:12
Lionel Reichardt: In 2020, the French Society of Pathology organized its first Data Challenge with the Health Data Hub. Can you tell us what a Data Challenge is? What did this first Data Challenge with the Health Data Hub consist of? It brought together, I believe, over 500 participants from all over the world?
00:02:28
Dr Frédéric Staroz: Exactly. It was an initiative taken at the time by the Health Data Hub, in cooperation with the French Society of Pathology, to test the feasibility of a Data Challenge of this type in medicine. The interest for our specialty was to be able to contribute to the development of artificial intelligence algorithms. A Data Challenge is a competition organized for data scientists. The principle is to build up a bank of data. In this case, for my specialty, it's image data: digitized histological images, from which the data scientists have to answer a medical question that forms the basis of the competition. In the end, the best algorithms, i.e. those with the greatest ability to correctly answer the medical question posed, win the prize, with three winners.
00:03:30
Dr Frédéric Staroz: For this first Data Challenge, we were able to work with an American company, DrivenData, which specializes in organizing this type of competition, either in the medical field or in other fields. The advantage of working with this company is that it has a great deal of experience in this area, and therefore a very extensive network of competitors, as well as the ability to reach out to academics, entrepreneurs, industrialists and even independent data scientists. As a result, we were able to reach a huge number of people, basically the entire international data scientist community, which is of course very interesting. The aim is really to help produce artificial intelligence algorithms capable of helping to resolve a medical question, whether diagnostic or otherwise. There is an obvious medical interest, and also an interest for competitors in having access to this medical database, which is controlled, high-quality data of a kind that is generally quite difficult to obtain.
00:04:35
Lionel Reichardt: If I'm not mistaken, you've collected about 5,000 biopsy slides and you were telling me that a biopsy slide is a gigabyte of data. Is that right?
00:04:46
Dr Frédéric Staroz: Yes. Because initially, these are slides like the ones we used to make when we were in high school or college, with samples that are generally quite small, but they are then digitized at 400 times magnification. To give you an idea, a sample of just one centimeter by one centimeter, when scanned at 400 times magnification, is equivalent to scanning an image four meters by four meters. Obviously, that's a huge amount of data, especially as our samples are not in black and white, but in color. Compared with the digitization of radiology images, which are often in black and white, we also add color. That's a lot of data. In the Data Challenge we're currently working on for melanoma, we'll be averaging around one gigabyte per slide. Some go up to four gigabytes, so that's a huge amount of data for a single sample.
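The orders of magnitude quoted in this answer can be checked with a short calculation. The following is only a minimal sketch in Python: the 400 times magnification and the one centimeter sample come from the episode, while the scan resolution (0.25 micrometers per pixel) and the color depth (3 bytes per pixel, uncompressed) are assumptions chosen for illustration and are not stated by the speaker.

```python
# Back-of-the-envelope check of the figures quoted in the answer above.

MAGNIFICATION = 400        # from the episode
SAMPLE_SIDE_CM = 1.0       # from the episode

# Assumptions for illustration only (not stated in the episode):
MICRONS_PER_PIXEL = 0.25   # assumed scan resolution
BYTES_PER_PIXEL = 3        # assumed 8-bit RGB colour, uncompressed


def magnified_side_m(side_cm: float, magnification: int) -> float:
    """Side length, in meters, of the sample once magnified."""
    return side_cm * magnification / 100.0


def uncompressed_size_gb(side_cm: float,
                         microns_per_pixel: float,
                         bytes_per_pixel: int) -> float:
    """Raw size, in gigabytes, of a square scan of the sample."""
    pixels_per_side = side_cm * 10_000 / microns_per_pixel  # 1 cm = 10,000 µm
    return pixels_per_side ** 2 * bytes_per_pixel / 1e9


if __name__ == "__main__":
    side = magnified_side_m(SAMPLE_SIDE_CM, MAGNIFICATION)
    size = uncompressed_size_gb(SAMPLE_SIDE_CM, MICRONS_PER_PIXEL, BYTES_PER_PIXEL)
    print(f"Equivalent image: {side:.0f} m x {side:.0f} m")
    print(f"Uncompressed colour scan: about {size:.1f} GB")
```

Under these assumptions, the raw scan comes to a few gigabytes, which is the same order of magnitude as the one to four gigabytes per slide mentioned in the episode. Actual file sizes depend on the scanner, the tissue area and any compression applied.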
00:05:53
Lionel Reichardt: We'll talk about the VisioMel project later. With this first Data Challenge, what have you learned? What are the benefits for the various parties involved? The Health Data Hub, doctors, patients, manufacturers, data scientists?
00:06:07
Dr Frédéric Staroz: I'll start with the Health Data Hub. For a start, they've shown that it's a format that interests both the medical community and the community of entrepreneurs and data scientists, since participation was high. As far as the medical community is concerned, and pathologists in particular, we're quite happy because we are a learned society. In other words, our aim is to contribute to medical innovation and to help develop algorithms that will then help us in our day-to-day practice, knowing that we have a shortage of pathologists and that the work of pathologists is becoming increasingly complex. We're looking forward to the arrival of artificial intelligence, which will help us meet the challenges we'll be facing in the years ahead.
00:06:58
Dr Frédéric Staroz: Of course, for entrepreneurs, it also gives them access to medical data that is really very complicated to obtain under the GDPR (RGPD in French) and the rules protecting health data. This is high-quality medical data, with a benefit for the patient further down the line. There are possible industrial developments. Incidentally, the winner of this Data Challenge is a commercial company. It's an entrepreneur who had already developed a number of algorithms in the field of pathology and who should be able, with the algorithm developed for the challenge, to develop or potentially market an AI tool in the field of cervical cancer, which was the subject of this first Data Challenge.
00:07:43
Lionel Reichardt: I think there have been several dozen algorithms developed. The principle of a Data Challenge is often to keep these algorithms open source. Isn't that right?
00:07:51
Dr Frédéric Staroz: Indeed, that's the idea. On the one hand, the data gathered by the medical community will remain open source. These are academic projects. On the other hand, we make this high-quality database available free of charge to entrepreneurs. In return, we ask that the algorithm developed should also be open source, in the knowledge that subsequent developments or improvements could be kept "more secret", with a slightly more commercial aim. This is indeed the spirit of these Data Challenges. For the first one, we left the choice to the entrepreneurs, or at least to the winners. Normally, in the Data Challenges, the three winners receive a prize. In this first Data Challenge, winners who refused the prize could keep their algorithms and not release them as open source. For the next Data Challenge, we're going to systematically ask for these algorithms to be open source, as this seems logical enough to us: we're making the effort and contributing to making high-quality data available free of charge. The idea is to encourage emulation and the dissemination of knowledge for the whole community, both medical and scientific, bearing in mind that subsequent commercial developments are obviously possible.
00:09:14
Lionel Reichardt: As you mentioned, a second Data Challenge has just been launched for 2022-2023. It's called VisioMel. What are you expecting from it, and what's it going to be like?
00:09:27
Dr Frédéric Staroz: This Data Challenge is a bit special because, unlike the first one, it brings together several learned societies and several specialties. This was a request from the Health Data Hub (HDH) for this second Data Challenge session. VisioMel is a consortium, i.e. an association of associations that brings together the French Society of Pathology, which was already involved in the first Data Challenge, the French Society of Dermatology, which is the major learned society of dermatologists, and the National Professional Council of Pathologists, which is an association of medical pathologists. The aim of this Data Challenge is to provide data scientists with both clinical and histological data on melanomas diagnosed at a localized stage. Melanomas are skin cancers whose prognosis can be quite poor, especially if they progress to metastasis.
00:10:35
Dr Frédéric Staroz: Knowing, when a melanoma is diagnosed at a localized stage, whether the disease will recur or relapse with metastases is important, because if this prediction is reliable, we could potentially offer these patients so-called neoadjuvant treatments, in other words treatments that help prevent recurrence. The aim of this Data Challenge is to see if an algorithm is capable of predicting this recurrence more effectively than current methods, with the potential aim of offering these patients preventive treatment. This is of major medical interest. If it actually works, it's something that can really help save lives.
00:11:22
Lionel Reichardt: I invite any listeners who might be interested in this Data Challenge to type VisioMel into their favorite search engine. They'll find everything they need on the websites of the learned societies or of the Health Data Hub to take part in and contribute to this Data Challenge and to find out about the various dates. In conclusion, Dr. Staroz, what advice can you give to entrepreneurs and healthcare innovators who would like to take part in a Data Challenge?
00:11:49
Dr Frédéric Staroz: In a way, the answer is contained in the question. We can only advise them to participate. Here, the VisioMel consortium brings together both clinical data and histological imaging data from 3,000 patients, which are going to be extremely secure and totally anonymized, with of course fully compliant and ethical management of patient data, including checking that patients do not object. This is extremely valuable material, which can be fully exploited, with a medical issue of real interest on top of it. I think that for an entrepreneur, having this type of data available free of charge is of very clear interest, insofar as it is difficult to predict in advance whether or not an algorithm will be able to answer the question posed. Ultimately, this enables data scientists, whether they work on their own, in universities, in start-ups or in more established companies, to find out at a lower cost whether artificial intelligence is indeed capable of answering this type of question. If the answer is positive, this will enable them to continue working either with VisioMel or with other partners to develop an algorithm that will have obvious commercial benefits and an equally obvious medical interest.
00:13:23
Lionel Reichardt: Doctor Frédéric Staroz, thank you very much. Our episode is coming to an end. Thank you for listening to us. We thank our guest again for his availability. Don't hesitate to subscribe to the podcast on the listening platforms. We look forward to seeing you soon for a new episode of 100 Days to Success.
00:13:44
G_NIUS: Those who are making e-health today and tomorrow are on the G_NIUS podcast, and all the solutions to succeed are on gnius.esante.gouv.fr
Data Challenges are data science competitions that aim to address specific medical problems by analyzing the data made available.
What are the benefits of these Data Challenges for the healthcare ecosystem and entrepreneurs? How can healthcare innovators take part in these events? Answers with Frédéric Staroz, who also presents the Data Challenge organized by the VisioMel consortium around melanoma prognostic factors.