Question: Comparing Allelic Frequency differences between vcf files
gravatar for nina.maryn
12 weeks ago by
nina.maryn10 wrote:


This is my first post and my first big bioinformatics project. I'm new to command line and the tools for analyzing vcf files, so please be kind!

I have two VCF files, each representing GATK filtered Indels for RNA-seq in 6 tissues from two divergent species, A & B. They are aligned to their closest domesticated relative, C, whose genome is sequenced. We want to ultimately create "pseudo-reference" genomes for A and B using their RNA-seq data to be able to sequence more closely related species to A & B and align them more accurately in the future. At the current step, we want to filter out ALT indels that have frequency differences <= 0.6 between A & B, so that we only keep differences that are fixed between the species.

How do I compare AF site by site between two VCF files? I want to (1) keep unique indels with an ALT AF >=0.6 in each species and (2) keep shared indels where the ALT AF difference between A & B is >= 0.6. I ultimately need to end up with two vcf files again, for A and B.

I first tried importing the vcf files into R using vcfR, creating data frames, and filtering with dyplr. This worked, but I couldn't figure out an easy way to recapitulate my vcf files for further processing. (vcf2tidy is a one way street apparently and vcfR won't create a new vcf object from a data frame.) Is there a way to create a new vcf file or use my filtered CHROM & POS information to filter a vcf file? Or is there some easier way (maybe using bcftools) to compare and filter my vcfs?

Thanks! Nina

sequencing rna-seq bcftools R vcf • 131 views
ADD COMMENTlink written 12 weeks ago by nina.maryn10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 981 users visited in the last hour