Question: Extracting VCF of all variants in TCGA exomes from all cancer types
j.lunger18 wrote (10 months ago):

Hi. Ideally, I would like to get a single VCF file covering all the exome sequences that TCGA has, across all cancer types. Even better, I would restrict this to a certain region of the genome. Is there any way to do this? I currently have gdc-client downloaded and working on the command line, but I can only seem to find UUIDs for individual cancer types.

Tags: tcga, vcf

If you are using R, consider the GenomicDataCommons package, which helps with finding and downloading datasets of interest. However, Kevin is correct that the MAFs are probably what you want. Note that the MAFs are filtered from the original variant files, which are available only after obtaining dbGaP access.

Comment written 10 months ago by Sean Davis
Kevin Blighe wrote (10 months ago):

TCGA VCF files are not available as open access. Only MAF (Mutation Annotation Format) files are open, and these can be downloaded from the GDC (Genomic Data Commons) Data Portal.
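Since the question mentions only finding UUIDs one cancer type at a time, here is a minimal Python sketch of how the open-access MAF files for a whole program might be listed in a single query against the GDC API. Treat it as an illustration under assumptions rather than a tested recipe: the endpoint URL and the field names (`cases.project.program.name`, `data_format`, `access`) are my assumptions about the GDC files endpoint and are not taken from this thread, and the helper names are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumed GDC files endpoint (not stated in the thread).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_maf_filter(program="TCGA"):
    """Build a filter for open-access MAF files across every project in a program.

    The field names are assumptions about the GDC data model.
    """
    def is_in(field, values):
        return {"op": "in", "content": {"field": field, "value": values}}

    return {
        "op": "and",
        "content": [
            is_in("cases.project.program.name", [program]),
            is_in("data_format", ["MAF"]),
            is_in("access", ["open"]),
        ],
    }


def build_query_url(filters, size=1000):
    """Encode the filter and the requested fields as a GET query string."""
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.project.project_id",
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_file_ids(url):
    """Network call: return the file UUIDs listed in the response.

    Assumes the response body has the shape {"data": {"hits": [...]}}.
    """
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return [hit["file_id"] for hit in payload["data"]["hits"]]
```

Calling `fetch_file_ids(build_query_url(build_maf_filter()))` should then return UUIDs spanning all cancer types, which can be handed to `gdc-client download`. If the program has more matching files than `size`, the query would need to be paged.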

You can search online for tools that convert MAF to VCF, if that is definitely what you need.

If you keep everything as MAF, which is essentially a tab-delimited format, then you can simply use shell commands to merge everything together (taking care to keep only one copy of the header line). If you convert the data to multiple VCFs, then you can use BCFtools to merge them.
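As an alternative to shell commands, the MAF merge can be sketched in a few lines of Python, with an optional filter for the "certain region" mentioned in the question. This is only a sketch under assumptions: it relies on the usual MAF column names `Chromosome`, `Start_Position` and `End_Position`, it takes the output columns from the first row it keeps, and the function names are hypothetical. It does not convert anything to VCF.

```python
import io


def read_maf(handle):
    """Yield one dict per variant row, skipping '#' comment lines and blank lines."""
    header = None
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if header is None:
            header = fields  # first non-comment line holds the column names
            continue
        yield dict(zip(header, fields))


def in_region(row, chrom, start, end):
    """True if the variant overlaps chrom:start-end (1-based, inclusive)."""
    return (
        row["Chromosome"] == chrom
        and int(row["Start_Position"]) <= end
        and int(row["End_Position"]) >= start
    )


def merge_mafs(handles, out, region=None):
    """Concatenate rows from several MAF handles, writing the header once.

    Columns come from the first kept row; columns missing from later
    files are written as empty strings. Returns the number of rows written.
    If no row is kept, nothing (not even a header) is written.
    """
    columns = None
    written = 0
    for handle in handles:
        for row in read_maf(handle):
            if region is not None and not in_region(row, *region):
                continue
            if columns is None:
                columns = list(row)
                out.write("\t".join(columns) + "\n")
            out.write("\t".join(row.get(col, "") for col in columns) + "\n")
            written += 1
    return written


# Tiny in-memory demonstration with made-up rows (not real TCGA data).
demo = "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\nGENE1\tchr1\t100\t100\n"
merged = io.StringIO()
merge_mafs([io.StringIO(demo)], merged, region=("chr1", 50, 150))
```

With real downloads, the handles would be the opened (and, if needed, decompressed) MAF files, and the chromosome name in `region` has to match the naming used in those files, for example `chr17` versus `17`.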

Kevin
