Question: Extracting mutation data from TCGA
16 months ago, leonel.cardozo20 wrote:

Hello everyone

I should start by saying I'm a complete undergrad beginner in Bioinformatics, just recently learning key basic concepts of programming and data structure. I'm trying to extract data from TCGA in order to compare mutational status in two genes across a cohort of cancer patients. Our hypothesis in the lab is that there will be a correlation between our genes of interest, i.e., mutations in both genes would occur together to promote tumorigenesis and a more aggressive phenotype.

We have reached a frustrating halt. I really want to get through this. We basically want to extract mutation data for two genes in the same cohort. I'm trying to work with Firebrowse and FirebrowseR, which seem like the most promising tools for this, but to no avail. Could someone point me in the right direction? Any advice will be very appreciated. Thank you!

16 months ago, Kevin Blighe wrote:

Hey, well, what can I say - I understand your frustration working with this data. Firstly, I should point out that Firebrowse and similar websites are third parties that have re-processed TCGA data and made it available to the public. The primary source of the open-access TCGA data is the Genomic Data Commons (GDC), specifically its Data Portal.

From the GDC, you can download what are called Mutation Annotation Format files (MAF, not to be confused with minor allele frequency). These contain somatic mutation calls. On the portal, you can configure a search for all MAF files, for all cancers, for mutations that were called with SomaticSniper. You can quite easily obtain these and look up your genes of interest.
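As a rough sketch of that lookup, here is one way to do it in Python with only the standard library. It assumes the MAF has already been downloaded and decompressed, that it is tab-delimited with '#' comment lines at the top, and that it uses the usual Hugo_Symbol and Tumor_Sample_Barcode columns. The function names, the gene symbols and the file name in the usage note are placeholders of mine, not anything prescribed by the GDC.

```python
import csv
from collections import defaultdict


def samples_by_gene(maf_lines, genes):
    """Map each gene of interest to the set of tumour samples mutated in it.

    maf_lines: any iterable of text lines from a MAF file (e.g. an open file).
    genes: gene symbols, matched against the Hugo_Symbol column.
    """
    wanted = set(genes)
    # Skip the '#' comment lines that precede the header row.
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    mutated = defaultdict(set)
    for row in reader:
        if row["Hugo_Symbol"] in wanted:
            mutated[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    # Genes with no mutation calls still get an (empty) entry.
    return {gene: mutated.get(gene, set()) for gene in wanted}


def overlap_counts(mutated, gene_a, gene_b):
    """Return (both, a_only, b_only) sample counts for two genes."""
    a, b = mutated[gene_a], mutated[gene_b]
    return len(a & b), len(a - b), len(b - a)
```

Usage would look something like `with open("cohort.maf") as fh: hits = samples_by_gene(fh, ["GENE1", "GENE2"])`, followed by `overlap_counts(hits, "GENE1", "GENE2")`. Samples mutated in neither gene do not appear in these counts, so you would need the full sample list of the cohort to complete a 2x2 table for a co-occurrence test.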

You can download the files individually or else download a file manifest and use the Data Transfer Tool, which is usually used for a large number of files.
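If you go the manifest route, a small Python helper can sanity-check the manifest before you start a large download. This is only a sketch: it assumes the manifest is a tab-delimited file with a header row that includes a `size` column (in bytes), and that the Data Transfer Tool is installed as `gdc-client` and accepts `download -m <manifest> -d <directory>`. Check both assumptions against your own manifest and the tool's help output. The function names are mine.

```python
import csv


def summarise_manifest(manifest_lines):
    """Return (number_of_files, total_size_in_bytes) for a GDC manifest.

    manifest_lines: any iterable of text lines (e.g. an open file).
    Assumes a header row with a 'size' column.
    """
    reader = csv.DictReader(manifest_lines, delimiter="\t")
    n_files = 0
    total_bytes = 0
    for row in reader:
        n_files += 1
        total_bytes += int(row["size"])
    return n_files, total_bytes


def download_command(manifest_path, out_dir):
    """Build the argument list for a manifest download (assumed CLI flags)."""
    return ["gdc-client", "download", "-m", manifest_path, "-d", out_dir]
```

You could then run the download with `subprocess.run(download_command("manifest.txt", "maf_files"), check=True)`, where both paths are placeholders.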

Note that the MAF files may have a column called FILTER, which may contain the value 'panel_of_normals'. A somatic mutation carrying this flag was also found in a panel of normal (non-tumour) samples, so it is unlikely to be a genuine somatic event, and you can eliminate it.
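A minimal Python sketch of that filtering step follows. It assumes each MAF row has already been parsed into a dict (for example with `csv.DictReader`), and that when FILTER holds several flags they are separated by semicolons. That separator is my assumption, so check it against your own files. The function name is hypothetical.

```python
def drop_panel_of_normals(rows, flag="panel_of_normals"):
    """Keep only the MAF rows whose FILTER column does not carry `flag`.

    rows: iterable of dicts, one per MAF record.
    Rows without a FILTER value are kept unchanged.
    """
    kept = []
    for row in rows:
        # Assumed format: 'PASS' or a ';'-separated list of flags.
        flags = (row.get("FILTER") or "").split(";")
        if flag not in flags:
            kept.append(row)
    return kept
```

Apply it to the parsed rows before looking up your genes, so that flagged calls never enter the per-gene sample sets.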

