How to identify files from non-cancerous tissue in the TCGA database and how to generate an SFS
6.8 years ago

I am trying to generate a site frequency spectrum (SFS) from the TCGA database. However, I am having some trouble: according to my algorithm, every single allele from the cancerous tissue matches that of the non-cancerous tissue (I am only looking at the homozygous portions of the data). I suspect this is caused by an error in my method.
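For concreteness, here is a minimal sketch of what I mean by an SFS, assuming the usual population-genetics definition: a histogram of how many segregating sites carry the alternate allele on exactly k chromosomes. The function names and the toy genotypes are hypothetical and are not taken from any TCGA file; a tumor-specific analysis might instead bin the variant allele frequencies of somatic mutations.

```python
from collections import Counter


def alt_allele_count(genotypes):
    """Count ALT alleles across diploid genotype strings such as '0/1' or '1|1'.

    Returns None if any genotype at the site has a missing allele ('.').
    """
    count = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            return None  # skip sites with missing calls
        count += sum(1 for allele in alleles if allele != "0")
    return count


def site_frequency_spectrum(sites, n_samples):
    """Return the unfolded SFS as a list indexed by ALT count 1 .. 2n-1.

    Sites where the ALT allele is absent or fixed are not segregating
    and are left out of the spectrum.
    """
    n_chrom = 2 * n_samples  # assumption: every sample is diploid
    counts = Counter()
    for genotypes in sites:
        k = alt_allele_count(genotypes)
        if k is not None and 0 < k < n_chrom:
            counts[k] += 1
    return [counts[k] for k in range(1, n_chrom)]


# Hypothetical toy data: four sites genotyped in three samples.
toy_sites = [
    ["0/0", "0/1", "0/0"],
    ["0/1", "1/1", "0/0"],
    ["0/0", "0/0", "0/1"],
    ["1/1", "1/1", "1/1"],  # fixed for ALT, so excluded
]
sfs = site_frequency_spectrum(toy_sites, n_samples=3)
print(sfs)
```

If every site compared between the two tissues looks like the last toy row (identical and fixed), the spectrum is empty, which is the symptom described above.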

In my current implementation, I am assuming that the VCF files from the TCGA database correspond only to non-cancerous tissue. Here is the specific VCF file I am using. In particular, I am trying to generate a site frequency spectrum for case 001cef41-ff86-4d3f-a140-a647ac4b10a1 in the TCGA breast cancer database.

I was working on this project a while ago and have since forgotten why I thought all of the VCF files referred to non-cancerous tissue. Furthermore, I do not know how to verify whether or not this is the case.

My questions are:

  1. Do the VCF files in this database correspond to non-cancerous tissue?
  2. How can I identify which files correspond to non-cancerous tissue? I am currently only able to do this for the BAMs. Info for BAMs
  3. If I am wrong about the VCF files corresponding to non-cancerous tissue, how should I proceed to generate the site frequency spectrum?
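One check that may help with question 2: TCGA aliquot barcodes encode the sample type in the first two digits of the fourth dash-separated field, where 01 to 09 denote tumor, 10 to 19 normal, and 20 to 29 control samples. The sketch below assumes the barcode for each file has already been looked up (the case ID above is a UUID, not a barcode); `tcga_sample_class` is a hypothetical helper and the barcodes are made up for illustration.

```python
def tcga_sample_class(barcode):
    """Classify a TCGA barcode as 'tumor', 'normal', 'control' or 'unknown'.

    Relies on the two-digit sample-type code that starts the fourth
    dash-separated field, e.g. the '01' in 'TCGA-XX-0000-01A-...'.
    """
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = int(fields[3][:2])  # raises ValueError if the code is not numeric
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


# Made-up barcodes, for illustration only.
examples = [
    "TCGA-XX-0000-01A-11D-0000-09",  # code 01: primary solid tumor
    "TCGA-XX-0000-10A-01D-0000-09",  # code 10: blood-derived normal
    "TCGA-XX-0000-11A-01D-0000-09",  # code 11: solid tissue normal
]
classes = [tcga_sample_class(b) for b in examples]
print(classes)
```

Applying the same check to the barcodes listed in a VCF header or in the file's metadata should show whether a given file describes tumor tissue, normal tissue, or a tumor/normal pair.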
TCGA SFS VCF • 1.3k views