I need to build a pool of normals (normal database) for my analysis. I read that only normal (blood) samples prepared in the same way as the tumor samples (same capture kit and sequencing technology) should be used. I do not have access to normal (blood) exome data enriched with the capture kit used for my tumor samples. Is there any alternative? How should I proceed?
Hi, this is covered in our FAQ section. Unfortunately, no: you need a few normal BAMs from the same capture kit for coverage normalization. Tools like CNVkit work reasonably well without them, but not well enough for allele-specific copy number inference. If this is a commercial capture kit, you might be able to find some normals in the public domain or get them from the vendor. But yes, like Kevin said, this is usually protected data.
What I have done in the past is use CNVkit to find tumors without major copy number events and then use those as normals. But this only works in some cancer types.
And of course, adjacent normals work too (for the unlikely case you have those but not blood).
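The CNVkit screening step mentioned above could be sketched roughly as follows. This is only an illustration, not part of any tool: it assumes each tumor already has a CNVkit segment file (`.cns`, tab-separated, with `chromosome`, `start`, `end` and `log2` columns), and the function names and both thresholds (`log2_threshold=0.2`, `max_altered=0.05`) are hypothetical values you would need to tune for your data. Sex chromosomes are skipped here because their log2 ratios depend on the sex of the sample.

```python
import csv
import io

# Assumption: these contig names cover the sex chromosomes in your reference.
SEX_CHROMS = {"X", "Y", "chrX", "chrY"}


def altered_fraction(cns_text, log2_threshold=0.2):
    """Fraction of segmented autosomal bases with |log2| above the threshold.

    cns_text is the content of a CNVkit .cns file (tab-separated with a
    header line). The threshold is an illustrative guess, not a CNVkit default.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    total = 0
    altered = 0
    for row in reader:
        if row["chromosome"] in SEX_CHROMS:
            continue
        length = int(row["end"]) - int(row["start"])
        total += length
        if abs(float(row["log2"])) > log2_threshold:
            altered += length
    return altered / total if total else 0.0


def pick_quiet_tumors(samples, max_altered=0.05, log2_threshold=0.2):
    """Return names of samples whose altered fraction is at most max_altered.

    samples maps a sample name to the text of its .cns file.
    """
    return sorted(
        name
        for name, cns_text in samples.items()
        if altered_fraction(cns_text, log2_threshold) <= max_altered
    )


# Tiny made-up segment tables, only to show the expected input shape.
HEADER = "\t".join(["chromosome", "start", "end", "gene", "log2"])
DEMO_QUIET = "\n".join([
    HEADER,
    "\t".join(["chr1", "0", "1000", "-", "0.01"]),
    "\t".join(["chr2", "0", "1000", "-", "-0.05"]),
])
DEMO_ALTERED = "\n".join([
    HEADER,
    "\t".join(["chr1", "0", "1000", "-", "0.9"]),
    "\t".join(["chr2", "0", "1000", "-", "0.0"]),
])

if __name__ == "__main__":
    demo = {"tumor_a": DEMO_QUIET, "tumor_b": DEMO_ALTERED}
    print(pick_quiet_tumors(demo))
```

With real data you would read each `.cns` file from disk and pass its text in. Keep in mind that a flat log2 profile does not rule out copy-neutral LOH or a low-purity sample, so treat this as a rough first filter and inspect the selected samples before using them as normals.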
Which cancer is it? TCGA has normal mutation data for all of the primary cancers; however, this data is protected - do you have approved access to the controlled TCGA data?
You have not clarified whether you require VCFs, BAMs, or FASTQs.
Hey Kevin, I need BAM files of exome data, and no, I do not have access to TCGA data. I have looked in SRA, but the BAM files there are under controlled access.
Hey, I'm able to download 1000 Genomes Phase III data. Is there a way to find out which capture kit was used to prepare that data?
I have another question. Is a normal BAM from a blood sample required for all kinds of analysis, such as coverage normalisation and variant calling? I ask because I need normal BAM files to calculate microsatellite instability (MSI). Is it okay to use the BAM files available from 1000 Genomes?