Question: Somatic allele frequency from TCGA in non-coding DNA
0
sacha (France) wrote 2.9 years ago:

I am looking for data containing somatic allele frequencies obtained from many tumors.

For example, something like:
chromosome pos ref alt AF

On the TCGA website, it seems they only provide data per sample. Any idea where I can find this data?

#EDIT

I want data for non-coding DNA, from any location in the genome.

allele tcga somatic vcf • 1.4k views
modified 2.9 years ago by Amitm • written 2.9 years ago by sacha
3
Amitm (UK) wrote 2.9 years ago:

hi,

Do you mean the mutation frequency of a gene X across all samples studied in a given cancer? This is available in cBioPortal. Just select your cancer of interest and enter your gene name, or alternatively click the Summary tab to get the mutation frequency of the top mutated genes.

If you are looking for the mutant allele frequency, i.e. how many reads supported the ref. and alt. alleles in a particular sample, this too is available from cBioPortal. When you enter your gene X and get a multi-tabbed page, go to the "Mutation" tab and choose the appropriate column from the "Show/ Hide Col" menu.
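For reference, the per-sample allele frequency described here is simply the share of reads that support the alt. allele. A minimal sketch, assuming you already have the two read counts for a mutation (in MAF-style tables they are commonly named `t_ref_count` and `t_alt_count`, but check the column names in your own file; the numbers below are made up):

```python
def variant_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


# Hypothetical read counts for one mutation in one tumor sample
print(variant_allele_fraction(ref_reads=60, alt_reads=20))  # 0.25
```

Note that this is a per-sample value; it says nothing about how often the mutation recurs across tumors.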

Here is an e.g. gene in a cancer type -

There are APIs available to access this data programmatically (though I have not used them). But if your query size is small, the website is good enough.

written 2.9 years ago by Amitm

Thanks!
It seems to be exactly what I need!

written 2.9 years ago by sacha

And is there a way to download all the data for the complete human genome?

It seems I can only download data for defined genes. What about non-coding DNA?

modified 2.9 years ago • written 2.9 years ago by sacha

OK, cBioPortal is great! But I want the same data for non-coding DNA! That means I give a location and get the AF, in the same way as for coding genes.

written 2.9 years ago by sacha

I have posted an answer in case you want (somatic) mutations that affect non-protein-coding regions of the transcriptome.

written 2.9 years ago by Amitm
0
Chris Miller (Washington University in St. Louis, MO) wrote 2.9 years ago:

The MAF files from TCGA only contain coding mutations. If you're looking for the rest of the genome, you'll need to get the VCFs, which are, in most cases, protected data (for which you'll have to apply for access). That's because they're less stringently filtered and may contain germline mutations, which makes patient privacy a concern.

Once you get access, you'll find lots of non-coding mutations from the wingspans of the exomes, as well as some WGS cases. This is not universally true, but for AML, you can find all validated somatic mutations (regardless of coding status) in the supplementary tables from the publication. That's available here: https://tcga-data.nci.nih.gov/docs/publications/laml_2013/
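Once per-sample calls are in hand, the "chromosome pos ref alt AF" table asked for in the question can be built by counting how many tumors carry each variant. A rough sketch, assuming "AF" means the fraction of tumors with the variant and that the calls have already been parsed into tuples; the function name, sample IDs, and coordinates are all made up for illustration:

```python
from collections import defaultdict


def cohort_frequencies(calls_by_sample):
    """Map (chrom, pos, ref, alt) -> fraction of samples carrying that variant.

    calls_by_sample: dict of sample ID -> iterable of (chrom, pos, ref, alt).
    """
    carriers = defaultdict(set)
    for sample, calls in calls_by_sample.items():
        for variant in calls:
            carriers[variant].add(sample)  # a set, so each sample counts once
    n_samples = len(calls_by_sample)
    return {v: len(s) / n_samples for v, s in carriers.items()}


# Toy, made-up calls for three tumors
calls = {
    "tumor_A": [("chr1", 12345, "G", "A")],
    "tumor_B": [("chr1", 12345, "G", "A"), ("chr2", 67890, "C", "T")],
    "tumor_C": [],
}
for (chrom, pos, ref, alt), freq in sorted(cohort_frequencies(calls).items()):
    print(chrom, pos, ref, alt, round(freq, 3))
```

Samples with no calls still count in the denominator, so every sequenced tumor should be passed in, not only the ones with mutations.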

written 2.9 years ago by Chris Miller

Thanks,

I already have an authorized key. Let me look! Thanks

written 2.9 years ago by sacha
0
Amitm (UK) wrote 2.9 years ago:

hi Sacha,

You can download all Somatic mutations from the TCGA portal as pointed out by Chris Miller.

I would like to contradict what Chris said, though. The TCGA MAFs, as far as I understand, contain all somatic mutations, and that would include any non-coding somatic mutations as well. What you won't get without a licence are the germline calls.

As an example, here is a screenshot of the types of mutations present in the MAF file for melanoma (SKCM) samples -

So, **ALL** somatic, including non-coding as well. Though I have not worked with the AML cohort, I presume that this (coding as well as non-coding somatic mut. present) must be the situation for most of the MAFs available from TCGA, if not all.
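One way to check which mutation types a given MAF actually contains is to tally its `Variant_Classification` column. A small sketch, assuming a tab-delimited MAF with that column and optional `#` comment lines at the top; the function name and the toy rows are invented for illustration:

```python
import csv
import io
from collections import Counter


def count_classifications(maf_handle):
    """Count the Variant_Classification values in a tab-delimited MAF."""
    # MAF files may begin with '#' comment lines before the header row
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(row["Variant_Classification"] for row in reader)


# Tiny made-up MAF excerpt (only two of the real columns are shown)
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\n"
    "GENE1\tMissense_Mutation\n"
    "GENE2\tSilent\n"
    "GENE3\tIntron\n"
    "GENE4\tMissense_Mutation\n"
)
print(count_classifications(toy_maf))
```

With a real file, pass `open("your_file.maf")` instead of the `io.StringIO` object.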

written 2.9 years ago by Amitm

hi Sacha and Chris,

I think I may be mistaken. Sacha, if you are looking for mutations in non-transcribed regions (non-coding DNA) in the TCGA data, then you will have to look at the WGS data from TCGA. Most samples, though, have only had their exome sequenced, and many of those that do have WGS were actually sequenced at low pass for calling copy number. There could be samples in TCGA that have been whole-genome sequenced deeply enough to call somatic mutations. Sorry, I am not aware of any such samples in TCGA apart from what Chris has suggested above in the supplementary data.

written 2.9 years ago by Amitm

In your screenshot, it seems you only have coding gene regions (exon + intron). I think those data come from exome sequencing and not from whole-genome sequencing. Or maybe I am wrong, so please attach a MAF file which contains non-coding DNA (like lncRNA).

written 2.9 years ago by sacha
1

hi,

You are right. Most of the data, as I already mentioned, are from exome sequencing, and the WGS data are mostly low pass. That said, my experience is with SKCM samples, so I cannot be sure of the status of other cancer cohorts hosted on TCGA. To clarify again, the MAF that you get (from WES) would contain any somatic mutation that fell within the coordinates of the target capture method used in the library prep of the WES samples.

SKCM, at present on TCGA, doesn't have mut. calls from WGS. Only WES.
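To make the point about capture coordinates concrete: a WES-derived MAF can only report a position if it lies inside one of the capture target intervals. A minimal sketch of that check, assuming the targets come from a BED-style file (0-based start, end exclusive) and positions are 1-based as in MAF/VCF; the function name and intervals are made up:

```python
def in_capture_targets(chrom, pos, targets):
    """True if a 1-based position lies in any target interval.

    targets: list of (chrom, start, end) in BED convention
    (0-based start, end exclusive).
    """
    return any(c == chrom and start < pos <= end for c, start, end in targets)


# Made-up capture intervals
targets = [("chr1", 1000, 1200), ("chr2", 500, 800)]
print(in_capture_targets("chr1", 1100, targets))  # inside the first interval
print(in_capture_targets("chr1", 5000, targets))  # outside every interval
```

A linear scan like this is fine for a handful of lookups; for a whole exome kit an interval index would be the more sensible choice.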

modified 2.9 years ago • written 2.9 years ago by Amitm
1

That may be true in some places, but I promise that for many cancer types, only protein-coding mutations were reported in the MAF files, and non-coding mutations that fell within the wingspan of the exome probes were excluded from the MAF files. The VCFs should provide a more comprehensive list, but are often kind of a mess, due to the strange requirements placed upon the centers by the consortium (reporting all calls, even those filtered out, using separate columns for multiple callers, etc.).
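Since filtered-out calls are reported too, a first pass over such a VCF would usually keep only records whose FILTER column is `PASS`. A bare-bones sketch using the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER); it does not split multi-allelic ALT fields or handle the per-caller sample columns mentioned above, and the toy lines are invented:

```python
def passing_records(vcf_lines):
    """Yield (chrom, pos, ref, alt) for records whose FILTER is PASS."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information, and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        if flt == "PASS":
            yield chrom, int(pos), ref, alt


# Made-up VCF lines: one passing call and one filtered call
toy_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t.\tlow_quality\t.\n",
]
print(list(passing_records(toy_vcf)))
```

For anything beyond a quick look, a dedicated VCF parser is likely a safer choice than splitting lines by hand.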

written 2.9 years ago by Chris Miller

Thanks Chris for the insight. I have only looked into the SKCM MAF, and it had everything somatic (at least I hope so, as I see silent & RNA mutations as well).

It's a pity, though, that there are licensing requirements and, on top of that, strict rules that the server holding the licensed data must comply with.

written 2.9 years ago by Amitm
1

Silent and RNA mutations can both be considered coding, depending on your precise definition, which is why they're included. It's not licensing that's the issue, it's genetic privacy. Given enough of the germline mutations that slip through the somatic filters, people could be identifiable, and that raises all sorts of potential issues. I'm generally a proponent of wide sharing of genetic data, but these people (and their families) have not consented to having their germline mutations shared, in most cases.

written 2.9 years ago by Chris Miller