Viral sequence in GDC data
5.1 years ago

Hi,

I was reading the GDC analysis protocol and stopped at this section:

....... In order to increase the accuracy and efficacy of alignment the GDC has added multiple decoy sequences to the GRCh38 reference genome (GCA_000786075.2). Sequences from a variety of viruses were also included to provide information on the presence of oncoviruses.......

From: https://gdc.nci.nih.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization/genomic-data-alignment

Does anyone know where I can find information about samples containing traces of these viruses?

Thanks

virus GDC • 1.6k views
5.0 years ago
mitch ▴ 10

You probably worked this out already, but for others who may find this post:

The viral alignments are included in the BAMs in the GDC Data Portal (but likely not in the Legacy Archive?).

The viruses included are listed here: https://gdc.cancer.gov/files/public/file/GRCh83.d1.vd1_virus_decoy.txt
More detail on the decoys and viral sequences is here: https://gdc.cancer.gov/download-gdc-reference-files

To access the BAM files you'll need to log in or use an authentication token, because the BAMs are controlled-access. I'm not sure whether quantifications of the viral sequences are available in the public datasets.
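
If you do get access to a BAM, one rough way to see whether it contains viral reads is to run `samtools idxstats` on it and pull out the rows for the viral contigs. Below is a minimal sketch of that idea, not an official GDC method. It assumes the usual four tab-separated idxstats columns (contig name, contig length, mapped reads, unmapped reads). The function name `viral_read_counts`, the contig names and the read counts in the example are all illustrative assumptions; take the real contig names from the virus/decoy list linked above or from your BAM header.

```python
from typing import Dict, Iterable, Set


def viral_read_counts(idxstats_lines: Iterable[str],
                      viral_contigs: Set[str]) -> Dict[str, int]:
    """Return mapped-read counts for the given viral contigs.

    idxstats_lines: lines of `samtools idxstats` output
                    (name <TAB> length <TAB> mapped <TAB> unmapped).
    viral_contigs:  contig names to report (assumed to match the BAM header).
    """
    counts: Dict[str, int] = {}
    for line in idxstats_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        if name in viral_contigs:
            counts[name] = int(mapped)
    return counts


# Made-up idxstats output, for illustration only (not real GDC data).
example_idxstats = [
    "chr1\t248956422\t1000000\t0",
    "chrEBV\t171823\t42\t0",
    "HPV16\t7906\t1500\t0",
    "*\t0\t0\t1234",
]

# Hypothetical contig names; check them against your own BAM header.
hits = viral_read_counts(example_idxstats, {"chrEBV", "HPV16", "HBV"})
for contig, n_mapped in sorted(hits.items()):
    print(f"{contig}\t{n_mapped}")
```

Contigs that are absent from the idxstats output are simply left out of the result. A nonzero count is only a hint: low counts can come from mis-mapped or low-complexity reads, so treat it as a starting point rather than evidence of infection.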
