Precision medicine requires big data. To improve treatment for people with cancer or to understand rare diseases, scientists, clinicians and AI technologies all need access to larger health research datasets that span diverse populations and a wide range of conditions. For AI, more data means a better understanding of diseases, which will lead to more accurate diagnosis and treatment. Yet each hospital sees only a relatively small number of people with a given disease, and even across a province we have access to only a small portion of the data available globally. To create the large-scale datasets needed to advance precision medicine, sharing data across the country and around the world is essential.
The Canadian Distributed Infrastructure for Genomics (CanDIG), featured recently in a special issue of Cell Genomics dedicated to data sharing, is Canada’s solution for enabling data sharing across the country and for connecting our data to datasets around the world. Led from the University Health Network (UHN) in Toronto, with sites at McGill University in Montreal and the BC Michael Smith Genome Sciences Center, CanDIG is a collaboration of computer scientists, AI specialists, clinicians and geneticists working together to enable the studies needed to address the health challenges facing Canadians.
CanDIG is a pilot project of the Global Alliance for Genomics and Health (GA4GH), an international effort that sets standards for genomic and health data with the aim of improving interoperability across the genomic landscape worldwide. The organization served as the focal point of this month’s special issue of Cell Genomics for its work on global genomics and health data sharing. Canada has been a leader in GA4GH, hosting its headquarters, leading several project workflows and implementing many GA4GH standards. CanDIG, as one of the pilot projects, not only implemented the GA4GH standards but also helped inform and create many of them. CanDIG is already helping scientists nationally access large-scale genomic data that was previously siloed in individual provinces or hospitals, and it is starting to link Canada’s genomic datasets to those around the world through collaborations such as the EU/Africa/Canada CINECA project.
The CanDIG platform was developed in response to provincial health care and privacy legislation: it creates a federation of datasets, which simplifies sharing across provincial borders. CanDIG is also a key component of the upcoming Digital Health and Discovery Platform (DHDP), a $200 million effort funded in part by the Canadian government, which will support the sharing of genomic data from the Terry Fox Marathon of Hope Cancer Network. Making this data available to researchers is critical to unlocking its discovery potential and enabling better cancer treatments, because even the smartest researchers and the most powerful machine learning techniques can do nothing with data that they cannot find, access or use.
“In institutions like UHN, we are building increasingly sophisticated data resources containing health data from many different sources. The next step is to help researchers turn that data into new knowledge by making it findable, available, and usable in a consistent, organized, and secure way, and by allowing it to be combined with similar datasets in other hospitals. CanDIG is an important step in enabling researchers across Canada to access the wealth of data collected and generated across the country.”
Dr Michael Brudno, CanDIG Principal Investigator, UHN Chief Data Scientist and Professor of Computer Science at the University of Toronto.
“By participating in the GA4GH community and in international projects such as the EU/Africa/Canada CINECA project, CanDIG is beginning to link Canadian genomics efforts with those around the world. As health data types get richer and volumes grow, we need to make sure our datasets are findable and useful; Canada is a world leader in this area.”
Dr Guillaume Bourque, CanDIG Co-Lead, Professor of Molecular Genetics at McGill University and Director of the Canadian Center for Computational Genomics (C3G).
“Access to whole genome data has been essential in understanding the spectrum of mutations that occur in cancer. CanDIG and Terry Fox’s Digital Health and Discovery Platform will help the data collected by the Marathon of Hope Cancer Centers Network be studied by as many approved researchers as possible.”
Dr Steve Jones, CanDIG Co-Lead, Chief Bioinformatics Officer and Co-Director at the BC Michael Smith Genome Sciences Center.