How to identify non-cancerous files in the TCGA database and how to generate an SFS
4.4 years ago

I am trying to generate a Site Frequency Spectrum (SFS) from the TCGA database, but I am running into trouble: according to my algorithm, every single allele in the cancerous tissue matches the corresponding allele in the non-cancerous tissue (I am only looking at homozygous sites). I suspect this is caused by an error in my method.
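For reference, the SFS itself is a histogram: with n sampled chromosomes, entry i counts the sites at which the derived allele is observed i times (the folded SFS uses minor-allele counts instead, so no ancestral state is needed). A minimal Python sketch, assuming per-site derived-allele counts have already been extracted from the VCF; the function names are hypothetical, not part of any TCGA tool:

```python
def unfolded_sfs(derived_counts, n):
    """Return the unfolded SFS as a list of length n + 1.

    derived_counts: number of derived alleles observed at each site.
    n: number of sampled chromosomes (2 x number of diploid individuals).
    Entry i of the result is the number of sites with i derived alleles.
    """
    sfs = [0] * (n + 1)
    for c in derived_counts:
        if not 0 <= c <= n:
            raise ValueError(f"count {c} outside 0..{n}")
        sfs[c] += 1
    return sfs


def fold(sfs):
    """Fold an unfolded SFS onto minor-allele counts 0..n//2."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        # A site with i derived alleles has min(i, n - i) minor alleles.
        folded[min(i, n - i)] += count
    return folded


# Usage with made-up counts for 4 chromosomes (2 diploid individuals).
print(unfolded_sfs([1, 2, 2, 3, 4, 1], 4))  # [0, 2, 2, 1, 1]
print(fold([0, 2, 2, 1, 1]))                # [1, 3, 2]
```

Note that with a single diploid case n is only 2, so the spectrum has very few classes.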

In my current implementation, I am assuming that the VCF files from the TCGA database correspond only to non-cancerous tissue. Here is the specific VCF file I am using. In particular, I am trying to generate a site frequency spectrum for case 001cef41-ff86-4d3f-a140-a647ac4b10a1 in the TCGA breast cancer database.

I worked on this project a while ago and have since forgotten why I thought all of the VCF files referred to non-cancerous tissue. Furthermore, I do not know how to verify whether this is the case.
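One way to check what a given VCF actually contains is to read its header. Per the VCF specification, the `#CHROM` line lists eight fixed columns, then `FORMAT`, then one column per sample, and the spec also allows optional `##SAMPLE=` meta-information lines. A file with two sample columns (for example labelled NORMAL and TUMOR) would suggest a tumor-normal somatic call set rather than a normal-only one; that interpretation, whether a given TCGA file carries `##SAMPLE=` lines, and the function name below are assumptions for illustration:

```python
import gzip


def vcf_sample_info(path):
    """Return (sample_column_names, sample_meta_lines) from a VCF header.

    Works on plain or gzip-compressed VCFs. Stops reading at the first
    data line, so large files are not scanned in full.
    """
    opener = gzip.open if path.endswith(".gz") else open
    sample_meta = []
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("##SAMPLE="):
                sample_meta.append(line.rstrip("\n"))
            elif line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Columns 0-7 are fixed, column 8 is FORMAT, the rest are samples.
                return columns[9:], sample_meta
            elif not line.startswith("#"):
                break
    raise ValueError(f"{path}: no #CHROM header line found")
```

Usage: `names, meta = vcf_sample_info("my_file.vcf.gz")`, where the path is a placeholder for the downloaded file.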

My questions are:

  1. Do the VCF files in this database correspond to non-cancerous tissue?
  2. How can I identify which files correspond to non-cancerous tissue? Currently I can only do this for the BAMs (Info for BAMs).
  3. If I am wrong about the VCF files being non-cancerous, how should I proceed to generate the site frequency spectrum?
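For files that can be linked to a TCGA sample or aliquot barcode, the sample type is encoded in the barcode itself: the first two digits of the fourth hyphen-separated field are the sample-type code, where 01 to 09 denote tumor, 10 to 19 normal, and 20 to 29 control samples (for example, 01 is primary solid tumor, 10 blood-derived normal, 11 solid tissue normal). A minimal sketch, assuming the barcode for each file has already been looked up from its metadata; the function name and example barcodes are hypothetical:

```python
def tcga_sample_category(barcode):
    """Classify a sample-level TCGA barcode as 'tumor', 'normal' or 'control'.

    Assumes the standard layout TCGA-<TSS>-<participant>-<sample+vial>-...,
    where the first two digits of the fourth field are the sample-type code.
    """
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = int(parts[3][:2])  # raises ValueError if not two digits
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    raise ValueError(f"unrecognised sample-type code {code:02d} in {barcode!r}")


# Usage with made-up barcodes.
print(tcga_sample_category("TCGA-AB-1234-01A-11D-A000-01"))  # tumor
print(tcga_sample_category("TCGA-AB-1234-11A"))              # normal
```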
TCGA SFS VCF
