Question: best way to separate copy number variations from VCF files
soleimani_homa wrote:


I am interested in finding copy number variation in my samples. I have raw VCF files. I have looked at the previous questions, but I have not found one clear answer. Is there a walker in GATK to find CNVs (duplications or deletions) from raw VCF files?

WouterDeCoster wrote:

What about using grep? I'd use something like:

cat <(grep '^#' myvariants.vcf) <(grep '<DEL>\|<DUP>' myvariants.vcf) > cnvs.vcf

But I'm not sure what your VCF looks like. The first grep takes the header lines; the second grep searches for variant lines containing either <DEL> or <DUP> (the match is case-sensitive).
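A slightly stricter variant of the same idea, as a rough sketch: the grep above matches <DEL> or <DUP> anywhere on the line, whereas the awk call below only looks at the ALT column (field 5). It assumes the VCF is tab-delimited and that CNVs are written as symbolic ALT alleles, which may not be true for your files. The file names example.vcf and example_cnvs.vcf and the records in them are made up for illustration.

```shell
# Build a tiny example VCF (hypothetical records, for illustration only).
printf '##fileformat=VCFv4.2\n' > example.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> example.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\n' >> example.vcf
printf '1\t200\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900\n' >> example.vcf
printf '2\t300\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=800\n' >> example.vcf

# Keep header lines, plus records whose ALT column (field 5)
# contains the symbolic allele <DEL> or <DUP>.
awk -F'\t' '/^#/ || $5 ~ /<DEL>|<DUP>/' example.vcf > example_cnvs.vcf
```

On the example input this keeps both header lines and the two symbolic-allele records, and drops the SNV at position 100. If your VCF has no such ALT values, neither this nor the grep will find anything.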

Since my VCF files were produced by GATK, I would prefer to stay within GATK. Do you have any suggestions for separating the CNVs from the VCF file using GATK?

Please do not make the mistake of overcomplicating things. This is a simple pattern-extraction task. Even if you use a GATK filtering tool (I don't know whether one exists), it will do exactly the same thing, just wrapped in a GATK_filter_whatever.jar. The suggested solution is perfectly fine.

