Question: Extracting mutation data from TCGA
6 months ago, leonel.cardozo20 wrote:

Hello everyone

I should start by saying that I'm an undergraduate and a complete beginner in bioinformatics; I have only recently started learning the basic concepts of programming and data structures. I'm trying to extract data from TCGA to compare the mutational status of two genes across a cohort of cancer patients. Our lab's hypothesis is that mutations in our two genes of interest are correlated, i.e., that mutations in both genes occur together to promote tumorigenesis and a more aggressive phenotype.

We have reached a frustrating halt, and I really want to get through this. We basically want to extract mutation data for two genes in the same cohort. I have been trying to work with FireBrowse and FirebrowseR, which seem like the most promising tools for this, but to no avail. Could someone point me in the right direction? Any advice would be greatly appreciated. Thank you!

6 months ago, Kevin Blighe wrote:

Hey, I understand your frustration working with this data. Firstly, I should point out that FireBrowse and similar websites are third parties that have re-processed TCGA data and made it available to the public. The primary source of the open-access TCGA data is the Genomic Data Commons (GDC), specifically its data portal: https://portal.gdc.cancer.gov/repository

From the GDC, you can download Mutation Annotation Format files (MAF, not to be confused with minor allele frequency). These contain somatic mutation calls. HERE is a configured search for you for all MAF files for all cancers for mutations that were called with SomaticSniper. You can quite easily obtain these and look up your genes of interest.
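
As a rough illustration of "looking up your genes of interest", here is a minimal Python sketch that counts how many samples carry mutations in one, both, or neither of two genes. It assumes the standard MAF column names `Hugo_Symbol` and `Tumor_Sample_Barcode`; the helper names (`read_maf`, `co_mutation_table`), the gene pair, and the tiny in-memory example are made up for illustration.

```python
import csv
import io


def read_maf(handle):
    """Yield one dict per mutation row from an open MAF text handle."""
    # MAF files are tab-delimited; lines starting with '#' are comments.
    lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(lines, delimiter="\t")


def co_mutation_table(rows, gene_a, gene_b):
    """Count samples mutated in both genes, only one of them, or neither."""
    samples = set()
    mutated = {gene_a: set(), gene_b: set()}
    for row in rows:
        barcode = row["Tumor_Sample_Barcode"]
        samples.add(barcode)
        gene = row["Hugo_Symbol"]
        if gene in mutated:
            mutated[gene].add(barcode)
    a, b = mutated[gene_a], mutated[gene_b]
    return {
        "both": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Only samples with at least one row in the MAF can be counted here.
        "neither": len(samples - a - b),
    }


# Made-up data standing in for a real MAF file.
example = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tFILTER\n"
    "TP53\tS1\tPASS\n"
    "KRAS\tS1\tPASS\n"
    "TP53\tS2\tPASS\n"
    "EGFR\tS3\tPASS\n"
)
table = co_mutation_table(read_maf(io.StringIO(example)), "TP53", "KRAS")
print(table)
```

For a real gzip-compressed file you could pass `gzip.open(path, "rt")` instead of the `io.StringIO` object. The resulting 2x2 counts are the kind of input a test of association (for example, Fisher's exact test) would need, bearing in mind that a sample with no called mutations at all will not appear in the MAF.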

You can download the files individually, or download a file manifest and use the GDC Data Transfer Tool, which is generally used for large numbers of files.
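
For the manifest route, a small Python sketch like the one below may help. It assumes that the Data Transfer Tool is installed as `gdc-client` on your PATH, that the manifest is a tab-delimited file with an `id` column, and that `-m` and `-d` select the manifest and the output directory; please check these against the GDC documentation for your version. The file names and helper names are hypothetical.

```python
import csv
import io
import shlex


def manifest_file_ids(handle):
    """Return the file UUIDs listed in a tab-delimited GDC manifest."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["id"] for row in reader]


def gdc_client_command(manifest_path, out_dir):
    """Build the download command as an argument list (assumed flags)."""
    return ["gdc-client", "download", "-m", manifest_path, "-d", out_dir]


# Made-up manifest content standing in for a downloaded manifest file.
example_manifest = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "uuid-1\tcohort_a.maf.gz\tabc\t10\treleased\n"
    "uuid-2\tcohort_b.maf.gz\tdef\t20\treleased\n"
)
ids = manifest_file_ids(io.StringIO(example_manifest))
cmd = gdc_client_command("gdc_manifest.txt", "maf_downloads")
print(len(ids), "files listed in the manifest")
print(shlex.join(cmd))
```

To actually start the download, the list could be passed to `subprocess.run(cmd, check=True)`.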

Note that, in the MAF files, there may be a column called FILTER which may contain a value called 'panel_of_normals'. If a somatic mutation has this filter flag, then it means that it was found in a cohort of healthy individuals not connected to the TCGA project. Thus, you can eliminate these.
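
A minimal Python sketch of that filtering step is below. It assumes that each mutation has already been read into a dict keyed by column name, and that the `FILTER` column may hold several flags separated by semicolons; the function name, the rows, and the `other_flag` value are made up for illustration.

```python
def drop_panel_of_normals(rows, flag="panel_of_normals"):
    """Keep only mutation rows whose FILTER value does not include `flag`."""
    kept = []
    for row in rows:
        # Assumption: multiple filter flags are joined with ';'.
        flags = (row.get("FILTER") or "").split(";")
        if flag not in flags:
            kept.append(row)
    return kept


# Made-up rows standing in for parsed MAF records.
rows = [
    {"Hugo_Symbol": "TP53", "FILTER": "PASS"},
    {"Hugo_Symbol": "KRAS", "FILTER": "panel_of_normals"},
    {"Hugo_Symbol": "EGFR", "FILTER": "other_flag;panel_of_normals"},
    {"Hugo_Symbol": "PTEN"},  # no FILTER column present in this row
]
kept = drop_panel_of_normals(rows)
print([row["Hugo_Symbol"] for row in kept])
```

Rows without a `FILTER` value are kept here; whether that is the right choice depends on how your MAF files were produced.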

Kevin
