This entry was inspired by a Twitter conversation.
It started from a post about a paper by @cookdeegan and @MishaAngrist on "Distributing the future: The weak justifications for keeping human genomic databases secret and the challenges and opportunities in reverse engineering them". Nice paper BTW.
With a somewhat related question, @drgitlin asked if the HeLa genome should not have been moved to closed access. Angrist and Cook-Deegan don't specifically address the HeLa controversy in their paper, instead focusing on the example of BRCA1/2 breast cancer predisposition mutations and the recent Myriad Supreme Court decision. However, they generally argue that "data must be pooled to be useful, and open, public databases must house clinically relevant information". They further argue that data-hoarding, weak infrastructure, and poor sharing practices will slow the development of the tools necessary for analysis of genomic variants. I tend to agree and espoused a similar philosophy in a recent opinion piece on the need for a knowledge commons to facilitate collaborative contributions and open discussion of clinical decision-making based on genomic events in cancer.
Taking a bit of a sideways leap, we can ask whether the recently published HeLa genome should be open or kept behind the dbGaP firewall where it currently resides. A recent paper from dbGaP reported that a surprisingly low number of investigators (~2000) have obtained data access over the last six years. Anecdotal evidence here and on Twitter certainly suggests that many find leaping the dbGaP firewall a tedious process. So, this is a legitimate question. As an aside, dbGaP may be getting the message, as evidenced by their recent move to create a convenient collection of dbGaP studies available for General Research Use that can be accessed with just one application.
However, as of now, accessing the HeLa genome data (at least for NIH-funded researchers) requires application to dbGaP for permission, approval by a committee that includes at least two members of the Lacks family, acknowledgement of the Lacks family in any resulting publications, and agreement to deposit all future genome data in dbGaP. @dgmacarthur argues that the Lacks family has every right to be informed about uses of the cell line derived from their ancestor and that providing a short summary of your intended use is not too great a burden for researchers. Playing the devil's advocate, and somewhat tongue-in-cheek, I asked whether researchers using the Cancer Cell Line Encyclopedia (CCLE) should have to write 1000 summaries to 1000 families. @dgmacarthur and others argued that the HeLa/Lacks situation is a special case because she/they were not consented and not kept anonymous. The CCLE is a project that profiled the genomes and drug response of ~1000 cell lines.
Is it true that the HeLa situation is unique compared to these 1000 cell lines? I would argue that HeLa is only unique in that it is the only cancer cell line to have had a popular science book written about it and thus received a great deal of media attention. If one were to dig into other cell lines commonly used in the CCLE and in thousands of labs around the world, I suspect one could find similar situations. For example, MCF-7 cells, probably the most popular breast cancer cell line, are not anonymous. They are known to have been developed in 1970 from Frances Mallon. She is said to have consented to the procedure, but this almost certainly did not include the possibility of genome sequencing and would not have met the standard for consent typically required today. What if relatives of Frances Mallon become concerned about MCF-7 genomic data published in online repositories? Other cell lines such as BT-20, the first breast cancer cell line established, are even harder to clarify. BT-20 was established in 1958. I cannot find the full text of the original paper online or any mention of the consent status. Were the 11-year-old girl who provided Saos-2 cells or the 14-year-old boy who provided JURKAT cells (or their parents) consented back in the 70s when these were developed?
In fact, the tissues from which cell lines are derived (e.g., resected tumor specimens) have been (and still are) generally considered discarded tissue, and the patient has no special ownership of them or commercial rights to their derivative products. Given that long-prevailing view, I think it likely that many/most cell lines developed in previous decades came from patients who probably formally consented to their procedures and maybe even general research but not to their tissue being used to develop cell lines and some day be sequenced. Determining the details of consent for these lines is no small task, requiring extensive sleuthing. Unless someone writes a book about each cell line (as was done for HeLa) we will probably never know the full story and the details will remain buried in lab notebooks of the pre-digital era. Where consent and the use of these cell lines seems to become a particularly sticky issue is when genomic sequence enters the picture. A search of GEO/SRA will reveal that a large number of these cell lines have indeed been sequenced at the exome, genome, and transcriptome level, including our own release of the transcriptomes (n=56) and exomes (n=75) of breast cancer cell lines. CCLE makes a much larger number of similar data sets available through CGHub (without dbGaP restrictions?).
Thus, it is possible that increasing numbers of individuals or families could emerge with questions about how their tissues have been used and concerns about the privacy of their genetic material. Are we prepared to form a committee for each of them? Maybe we will need to, as a public relations exercise to maintain the public's trust in the scientific community. Do we impose restrictions on these legacy lines that would impede the scientific progress that they have undoubtedly contributed to? I submit that it is simply not reasonable to hold the products of science from the 50s, 60s, and 70s to the same ethical standards as today. These lines should in effect be "grandfathered in". Going forward, with what we know now, patients should certainly be fully consented and in as forward-thinking a manner as possible. This includes a realistic explanation of the chances that, even with our best efforts and intentions, we will be able to maintain their genetic privacy. I think sample consent language provided by the NHGRI actually does a reasonably good job at this and I highly applaud their efforts.
The crux of the problem is that expecting to keep your genetic code private is at some point going to be about as feasible as expecting your fingerprints, license plate or facial features to remain private. Witness the collection of DNA fingerprints in national DNA databases (9 million individuals profiled in the USA), vehicle locations (>700 million license plates scanned in the USA), fingerprints (104 million in the USA), and many other identifying features. Collecting a few cells, sequencing them and linking them to you as a person will become trivially easy. Almost as easy as tracking your online activities. But with even fewer technical safeguards. Unless you live in a bubble, you leave a trail of cells behind you, everywhere you go. The cat is unfortunately out of the bag. I submit that our only practical solution is to debate as a society what we consider to be abuse of such data and prevent such abuses by strengthening and expanding legislation like GINA.
The ability to sequence and interpret our genomes is such a fundamental discovery and its implications for human society so significant that it may even require constitutional amendments (as the scary possibilities of GATTACA move from scifi to technically feasible). The benefits of strict policies governing sequence data protection in the research setting should be balanced against the costs of the slowed research into medical problems that they produce. Especially when these policies may offer only the illusion of protection against abuse of genetic information. The sooner we recognize this reality the sooner we can start addressing this challenge in a real way.