Somatic allele frequency from TCGA in non-coding DNA
3
0
8.7 years ago
sacha ★ 2.4k

I am looking for data containing somatic allele frequencies obtained from many tumors.

For example, something like:

chromosome pos ref alt AF

On the TCGA website, it seems they only provide data per sample. Any idea where I can find this data?

EDIT:

I want to get data for non-coding DNA, from any location in the genome.

vcf somatic tcga allele • 3.7k views
3
8.7 years ago
Amitm ★ 2.3k

hi,

Do you mean the mutation frequency of gene X across all samples studied in cancer X? This is available in cBioPortal. Just select your cancer of interest and enter your gene name, or alternatively click the Summary tab to get the mutation frequency of the top mutated genes.

If you are looking for the mutated allele frequency, i.e. how many reads supported the ref. and alt. alleles in a particular sample, this too is available from cBioPortal. When you enter your gene X and get a multi-tabbed page, go to the "Mutation" tab and choose the appropriate column from the "Show/ Hide Col" menu.

Here is an e.g. gene in a cancer type -

There are APIs available to access this data programmatically (though I have not used them). But if your query size is small, the website is good enough.
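For the per-sample case, the allele frequency follows directly from the ref. and alt. read counts mentioned above. Here is a minimal sketch of that arithmetic. The record keys (`ref_reads`, `alt_reads`) and the sample names are placeholders I made up, not actual cBioPortal column or API field names, so map them to whatever your export uses.

```python
def variant_allele_fraction(ref_count, alt_count):
    """Return alt / (ref + alt), or None when no reads cover the site."""
    depth = ref_count + alt_count
    if depth == 0:
        return None
    return alt_count / depth


# Hypothetical per-sample records; the key names are placeholders,
# not guaranteed cBioPortal column or API field names.
records = [
    {"sample": "S1", "ref_reads": 30, "alt_reads": 10},
    {"sample": "S2", "ref_reads": 45, "alt_reads": 5},
    {"sample": "S3", "ref_reads": 0, "alt_reads": 0},
]

vaf_by_sample = {
    r["sample"]: variant_allele_fraction(r["ref_reads"], r["alt_reads"])
    for r in records
}
print(vaf_by_sample)
```

Sites with zero coverage return `None` rather than 0, so they are not mistaken for confirmed reference calls.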

0

Thanks!
It seems to be exactly what I need!

0

And is there a way to download all the data for the complete human genome?

It seems I can only download data for defined genes. What about non-coding DNA?

0

OK, cBioPortal is great! But I want the same data for non-coding DNA. That is, I give a location and get the AF in the same way as for coding genes.

0

I have posted an answer in case you want somatic mutations that affect non-protein-coding regions of the transcriptome.

0
8.7 years ago

The MAF files from TCGA only contain coding mutations. If you're looking for the rest of the genome, you'll need to get the VCFs, which are, in most cases, protected data (for which you'll have to apply for access). That's because they're less stringently filtered and may contain germline mutations, which makes patient privacy a concern.

Once you get access, you'll find lots of non-coding mutations from the wingspans of the exomes, as well as some WGS cases. This is not universally true, but for AML, you can find all validated somatic mutations (regardless of coding status) in the supplementary tables from the publication. That's available here: https://tcga-data.nci.nih.gov/docs/publications/laml_2013/
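Once the per-sample VCFs are in hand, a table like the one in the question (chromosome, pos, ref, alt, AF) could be built by counting how many tumors carry each variant. The following is only a sketch under several assumptions: one VCF per tumor, every non-filtered record treated as present in that tumor (the genotype columns are ignored), and AF defined as the fraction of samples carrying the variant rather than a read-level allele fraction. The function names and toy records are made up for illustration.

```python
from collections import Counter


def pass_variants(vcf_lines):
    """Yield (chrom, pos, ref, alt) for each PASS record in VCF text lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, filt = fields[:7]
        if filt != "PASS":
            continue  # skip calls that were reported but filtered out
        for alt in alts.split(","):
            yield chrom, int(pos), ref, alt


def cohort_frequencies(samples):
    """Map each variant to the fraction of samples that carry it.

    `samples` is a dict of sample name -> iterable of VCF lines.
    """
    carriers = Counter()
    for vcf_lines in samples.values():
        # set() so a variant counts at most once per sample
        carriers.update(set(pass_variants(vcf_lines)))
    n = len(samples)
    return {variant: count / n for variant, count in carriers.items()}


# Toy, made-up records for illustration only
samples = {
    "tumor_A": ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
                "1\t1000\t.\tA\tG\t.\tPASS\t.\n",
                "2\t500\t.\tC\tT\t.\tPASS\t.\n"],
    "tumor_B": ["1\t1000\t.\tA\tG\t.\tPASS\t.\n",
                "2\t500\t.\tC\tT\t.\tLowQual\t.\n"],
}

print("chromosome\tpos\tref\talt\tAF")
for (chrom, pos, ref, alt), af in sorted(cohort_frequencies(samples).items()):
    print(f"{chrom}\t{pos}\t{ref}\t{alt}\t{af:.2f}")
```

Keeping only `PASS` records is a deliberate choice here, since the files can include calls that failed filtering. For real files you would also want to check which caller and which sample column each record refers to.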

0

Thanks,

I already have an authorized key. Let me look! Thanks.

0
8.7 years ago
Amitm ★ 2.3k

hi Sacha,

You can download all somatic mutations from the TCGA portal, as pointed out by Chris Miller.

I would like to contradict what Chris said, though. The TCGA MAFs, as far as I understand, contain all somatic mutations, and that would include any non-coding somatic mutations as well. What you won't get without a licence are the germline calls.

As an example, here is a screenshot of the types of mutations present in the MAF file for melanoma (SKCM) samples:

So, **ALL** somatic mutations, including non-coding ones. Though I have not worked with the AML cohort, I presume that this (both coding and non-coding somatic mutations being present) is the situation for most of the MAFs available from TCGA, if not all.
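To check this for any given MAF rather than rely on one cohort, you could tally the `Variant_Classification` column and see whether classes such as `Intron`, `3'UTR`, `5'Flank`, `IGR` or `RNA` show up. A rough sketch, assuming a tab-delimited MAF with a `Variant_Classification` header and optional `#` comment lines at the top. The excerpt below is made up, not real TCGA data.

```python
import csv
import io
from collections import Counter


def classification_counts(maf_handle):
    """Count Variant_Classification values in a tab-delimited MAF."""
    # MAF files may start with '#' comment lines before the header row
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(row["Variant_Classification"] for row in reader)


# Tiny made-up MAF excerpt for illustration; real files have many more columns
example_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tVariant_Classification\n"
    "GENE1\t7\t100\tMissense_Mutation\n"
    "GENE2\t9\t200\tSilent\n"
    "GENE3\t1\t300\tIntron\n"
    "GENE1\t7\t150\tMissense_Mutation\n"
)

counts = classification_counts(example_maf)
for name, n in counts.most_common():
    print(f"{name}\t{n}")
```

For a real file, pass an open file handle instead of the `io.StringIO` object. Note that seeing `Intron` or UTR classes only shows that calls near the captured exons were kept, not that the whole genome was covered.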

0

hi Sacha and Chris,

I think I may be mistaken. Sacha, if you are looking for mutations in non-transcribed regions (non-coding DNA) in the TCGA data, then you will have to look into the WGS data from TCGA. Most samples, though, have only their exome sequenced, and many of those that have WGS were actually sequenced low-pass for copy number calling. There could be samples in TCGA that have been whole-genome sequenced deeply enough to call somatic mutations. Sorry, I am not aware of any such samples in TCGA apart from what Chris suggested above in the supplementary data.

0

In your screenshot, it seems you only have coding gene regions (exon + intron). I think those data come from exome sequencing and not from whole genome sequencing. Or maybe I am wrong, so please attach a MAF file which contains non-coding DNA (like lncRNA).

1

hi,

You are right. Most data, as I already mentioned, are from exome sequencing, and the WGS data are mostly low-pass. That said, my experience is with SKCM samples, so I cannot be sure of the status of other cancer cohorts hosted on TCGA. To clarify again, the MAF that you get (from WES) would contain any somatic mutation that fell within the coordinates of the target capture method used in the library prep for the WES samples.

SKCM, at present on TCGA, doesn't have mutation calls from WGS, only from WES.

1

That may be true in some places, but I promise that for many cancer types, only protein-coding mutations were reported in the MAF files, and non-coding mutations that fell within the wingspan of the exome probes were excluded from them. The VCFs should provide a more comprehensive list, but are often kind of a mess, due to the strange requirements placed upon the centers by the consortium (reporting all calls, even those filtered out, using separate columns for multiple callers, etc.).

0

Thanks Chris for the insight. I have only looked into the SKCM MAF, and it had everything somatic (at least I hope so, as I see silent and RNA mutations as well).

It's a pity, though, that there are licensing requirements and, on top of that, strict rules that must be complied with for the server that would hold the licensed data.

1

Silent and RNA mutations can both be considered coding, depending on your precise definition, which is why they're included. It's not licensing that's the issue, it's genetic privacy. Given enough of the germline mutations that slip through the somatic filters, people could be identifiable, and that raises all sorts of potential issues. I'm generally a proponent of wide sharing of genetic data, but these people (and their families) have not consented to having their germline mutations shared, in most cases.
