Question

Publicly available somatic variant calls for kidney cancer using WGS

9.0 years ago

tralynca ▴ 50

Good day,

Does anyone know where I can find published somatic mutation calls for kidney cancer generated by whole genome sequencing and NOT whole exome sequencing? I need them for the non-coding portion of the genome. Preferably not TCGA, because their data are controlled access and the somatic variants are mixed with germline mutations.

Thank you in advance,

Tracey

somatic-variants WGS kidney-cancer • 3.0k views

updated 14 months ago by Ram 43k • written 9.0 years ago by tralynca ▴ 50

Just to clarify for readers down the road, the TCGA somatic variants are not controlled-access. The BAM files, of course, are controlled-access, as will be the case for pretty much all human data. ALL studies using NGS will have somatic variants that are "contaminated" with germline variants, unfortunately; the extent will vary, of course, based on technical details.

updated 14 months ago by Ram 43k • written 9.0 years ago by Sean Davis 26k

Hi Sean,

Maybe I misunderstood, but the Data Levels and Data Types tab shows that the mutation files (whole genome and whole exome data), which are VCF and MAF files (Level 2 data), are Controlled Access data (https://tcga-data.nci.nih.gov/tcga/tcgaDataType.jsp).

updated 14 months ago by Ram 43k • written 9.0 years ago by tralynca ▴ 50

You did ask about whole genome somatic variants. The exome somatic variants are available as somatic MAF files (but the whole genome somatic variants are not). That said, it is relatively straightforward to get access to the controlled-access data, so that really shouldn't stop your analysis.

updated 14 months ago by Ram 43k • written 9.0 years ago by Sean Davis 26k

Thanks for the feedback Sean. My supervisor is processing the request for the data. I was just hoping there was something else out there.

updated 14 months ago by Ram 43k • written 9.0 years ago by tralynca ▴ 50

Ram · Answer 1 · 2015-05-04

9.0 years ago

Manu Prestat 4.1k

Hi Tracey, sorry for being very pessimistic. I think it would be difficult (if not impossible), because the recommended depth of coverage (DC) is around 500x to call variants at low allele frequencies, which is often the case for somatic mutations. It is therefore very unlikely that a dataset of whole genomes sequenced at this depth exists today for this kind of tumor sample. Let's assume 1000x on average in order to expect 500x DC over most of the genome (which surely underestimates the sequencing effort needed):

The amount of sequence you need is:
1000x × 3.4x10^9 bp = 3.4x10^12 bp = 3400 Gb
and the instruments deliver (for instance):
MiSeq output ~ 15 Gb max
HiSeq 4000 output ~ 1500 Gb max

=> 227 MiSeq runs / sample (3400 / 15, rounded up)
=> 3 HiSeq 4000 runs / sample (3400 / 1500, rounded up)

And that is for a single sample. To have enough variant calls to say that something is specific to kidney cancer, you would need a set of several samples (roughly at least 15, i.e. 45 HiSeq 4000 runs), which I can hardly imagine.
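
The back-of-the-envelope calculation above can be reproduced in a few lines of Python. This is only a sketch using the numbers quoted in this answer (a 3.4x10^9 bp genome, 1000x mean depth, 15 Gb and 1500 Gb maximum output per run, 15 samples). The helper name `runs_needed` is made up for illustration, and run counts are rounded up to whole runs.

```python
import math

# Figures taken from the answer above; they are estimates, not instrument specs.
GENOME_SIZE_BP = 3_400_000_000   # genome size used in the post
TARGET_MEAN_DEPTH = 1000         # mean depth assumed in the post
MISEQ_GB_PER_RUN = 15            # approximate maximum output quoted in the post
HISEQ4000_GB_PER_RUN = 1500      # approximate maximum output quoted in the post
N_SAMPLES = 15                   # rough cohort size suggested in the post


def runs_needed(total_gb, output_gb_per_run):
    """Number of whole sequencing runs needed to produce total_gb of data."""
    return math.ceil(total_gb / output_gb_per_run)


# Total sequence per sample, in gigabases.
total_gb = GENOME_SIZE_BP * TARGET_MEAN_DEPTH / 1e9

miseq_runs = runs_needed(total_gb, MISEQ_GB_PER_RUN)
hiseq_runs = runs_needed(total_gb, HISEQ4000_GB_PER_RUN)
cohort_hiseq_runs = N_SAMPLES * hiseq_runs

print(f"{total_gb:.0f} Gb per sample")
print(f"{miseq_runs} MiSeq runs per sample")
print(f"{hiseq_runs} HiSeq 4000 runs per sample")
print(f"{cohort_hiseq_runs} HiSeq 4000 runs for {N_SAMPLES} samples")
```

Changing `TARGET_MEAN_DEPTH` to 30 or 60 shows how much smaller the effort is at the depths typically used for WGS.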

updated 14 months ago by Ram 43k • written 9.0 years ago by Manu Prestat 4.1k

Thank you for your response, Manu. Is that supposed to be 1000X or 100X? Most articles state that 30-60X is a sufficient depth of coverage for WGS data.

written 9.0 years ago by tralynca ▴ 50

I think Manu is just pointing out that, while 30-60x is what is typically done, detecting low allele frequency variants requires a much higher depth than that. Studies using 30-60x for somatic variant calling are very likely underpowered to detect low-frequency somatic variants.
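
To put rough numbers on this, here is a minimal sketch under a deliberately simple model: the number of variant-supporting reads at a site is treated as Binomial(depth, allele frequency), and a variant counts as "detectable" when at least `min_alt_reads` reads support it. The threshold of 5 reads and the 5% allele frequency are illustrative assumptions, not values from any particular caller, and the model ignores sequencing error, mapping bias, and uneven coverage.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) when reads ~ Binomial(depth, vaf)."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare typical WGS depths with the much higher depth discussed above,
# for a hypothetical variant present in 5% of reads.
for depth in (30, 60, 500):
    p = detection_probability(depth, vaf=0.05, min_alt_reads=5)
    print(f"depth {depth:>3}x: P(detect) = {p:.3f}")
```

Under these assumptions the detection probability at 30x is small and at 500x is close to 1, which is the gap the reply describes.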

updated 14 months ago by Ram 43k • written 9.0 years ago by Sean Davis 26k

Makes sense. Thank you again Manu and Sean.

written 9.0 years ago by tralynca ▴ 50

Ram · Answer 2 · 2015-05-04

9.0 years ago

Julian Gehring ▴ 20

The ICGC has two whole-genome sequencing studies for renal cancer and renal cell cancer:

The data repository contains the somatic variant calls (SNVs and indels, called simple somatic mutations by ICGC) for the two studies. Note that the studies may have used different processing and variant calling pipelines. In general, the calls are saved as tab-delimited files, with additional meta-information on calling and genomic annotation. If you are only interested in non-coding variants, you can filter for variants with the respective attributes (e.g. those in intergenic regions).
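
As a rough illustration of that filtering step, the sketch below reads a tab-delimited file and keeps rows whose annotation falls in a set of non-coding categories. The column name `consequence_type`, the category labels, and the two example rows are all assumptions made up for this example; check the header and annotation vocabulary of the file you actually download before relying on them.

```python
import csv
import io

# Hypothetical annotation labels treated as non-coding; adjust to the real file.
NON_CODING = {
    "intergenic_region",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
}


def filter_non_coding(tsv_handle, column="consequence_type", keep=NON_CODING):
    """Return the rows of a tab-delimited file whose `column` value is in `keep`."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [row for row in reader if row[column] in keep]


# Made-up two-row example standing in for a downloaded variant file.
example = (
    "chromosome\tchromosome_start\tconsequence_type\n"
    "1\t12345\tintergenic_region\n"
    "2\t67890\tmissense_variant\n"
)

rows = filter_non_coding(io.StringIO(example))
for row in rows:
    print(row["chromosome"], row["chromosome_start"], row["consequence_type"])
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` object.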

Whether these two studies are enough depends very much on the question you want to address, but they should hopefully provide a good basis for your analysis.

updated 14 months ago by Ram 43k • written 9.0 years ago by Julian Gehring ▴ 20

Hi Julian,

I had been meaning to get back to you and thank you for your suggestion. I ended up using the ICGC data for my project.

Tracey

written 8.8 years ago by tralynca ▴ 50

I'm having a look at it now. Thank you Julian.

written 9.0 years ago by tralynca ▴ 50