Question: How to identify non-cancerous files in TCGA database and how to generate SFS
3.1 years ago, ampearson1729 wrote:

I am trying to generate a site frequency spectrum (SFS) from the TCGA database, but I am running into trouble: according to my algorithm, every single allele from the cancerous tissue matches that of the non-cancerous tissue (I am only looking at the homozygous portions of the data). I suspect this points to an error in my method.

In my current implementation, I am assuming that the VCF files from the TCGA database correspond only to non-cancerous tissue. Here is the specific VCF file I am using. In particular, I am trying to generate a site frequency spectrum for case 001cef41-ff86-4d3f-a140-a647ac4b10a1 in the TCGA breast cancer database.

I was working on this project a while ago and have since forgotten why I thought all of the VCF files referred to non-cancerous tissue. Furthermore, I do not know how to verify whether or not this is the case.
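One possible check, given as a minimal sketch rather than a confirmed answer: a VCF names its samples on the `#CHROM` header line (every column after `FORMAT`), and the VCF specification also allows optional `##SAMPLE=<...>` metadata lines. Reading both shows which samples a file claims to contain. The function name `vcf_sample_info` and the example header are hypothetical and are not taken from the TCGA file mentioned above; whether a particular TCGA VCF labels its columns this way is an assumption to verify against the file itself.

```python
def vcf_sample_info(lines):
    """Return (sample_columns, sample_meta) read from a VCF header.

    sample_columns: column names that follow FORMAT on the #CHROM line.
    sample_meta:    raw ##SAMPLE=<...> meta lines, if the file has any.
    """
    sample_columns = []
    sample_meta = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##SAMPLE="):
            sample_meta.append(line)
        elif line.startswith("#CHROM"):
            fields = line.split("\t")
            # The nine fixed columns are
            # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
            # everything after them is one column per sample.
            sample_columns = fields[9:]
            break  # the #CHROM line is the last header line
    return sample_columns, sample_meta


# Hypothetical header, for illustration only.
example_header = [
    "##fileformat=VCFv4.1\n",
    "##SAMPLE=<ID=NORMAL,Description=\"hypothetical normal sample\">\n",
    "##SAMPLE=<ID=TUMOR,Description=\"hypothetical tumor sample\">\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n",
]

columns, meta = vcf_sample_info(example_header)
print(columns)
print(len(meta))
```

For a real file, the same function can be given an open file handle (for a compressed file, one opened with `gzip.open(path, "rt")`), since it only iterates over lines and stops at the `#CHROM` line.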

My questions are:

  1. Do the VCF files in this database correspond to non-cancerous tissue?
  2. How can I identify which files correspond to non-cancerous tissue? I am currently only able to do this for the BAMs (Info for BAMs).
  3. If I am wrong about the VCF files coming from non-cancerous tissue, how should I proceed to generate the site frequency spectrum?
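To make the target of question 3 concrete, here is a minimal sketch of tabulating an unfolded site frequency spectrum from genotype calls. It is not the method used in this post. It assumes diploid `GT` strings, treats every non-reference allele as the counted allele, and skips any site with a missing call; the names `alt_allele_count` and `site_frequency_spectrum` and the example genotypes are hypothetical.

```python
from collections import Counter


def alt_allele_count(genotype):
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'.

    Returns None if any allele is missing ('.').
    """
    alleles = genotype.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def site_frequency_spectrum(sites, n_chromosomes):
    """Unfolded SFS as a list: sfs[k] is the number of sites at which
    the non-reference allele appears on exactly k chromosomes."""
    counts = Counter()
    for genotypes in sites:
        per_sample = [alt_allele_count(g) for g in genotypes]
        if None in per_sample:
            continue  # skip sites with any missing genotype
        counts[sum(per_sample)] += 1
    return [counts.get(k, 0) for k in range(n_chromosomes + 1)]


# Hypothetical genotypes: one inner list per site, one GT per sample.
sites = [
    ["0/0", "0/1"],  # 1 non-reference allele
    ["1/1", "1/1"],  # 4 non-reference alleles
    ["0/1", "./."],  # skipped: missing call
    ["0/0", "1/1"],  # 2 non-reference alleles
]

sfs = site_frequency_spectrum(sites, n_chromosomes=4)
print(sfs)  # [0, 1, 1, 0, 1]
```

With two diploid samples there are four chromosomes, so the spectrum has five bins (0 through 4). Whether the spectrum should instead be built from per-site variant allele fractions in a single tumor sample depends on the analysis and is left open here.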
sfs tcga vcf • 739 views

