Comparing Allelic Frequency differences between vcf files
0
1
Entering edit mode
4.1 years ago
nina.maryn ▴ 30

Hello,

This is my first post and my first big bioinformatics project. I'm new to command line and the tools for analyzing vcf files, so please be kind!

I have two VCF files, each representing GATK filtered Indels for RNA-seq in 6 tissues from two divergent species, A & B. They are aligned to their closest domesticated relative, C, whose genome is sequenced. We want to ultimately create "pseudo-reference" genomes for A and B using their RNA-seq data to be able to sequence more closely related species to A & B and align them more accurately in the future. At the current step, we want to filter out ALT indels that have frequency differences <= 0.6 between A & B, so that we only keep differences that are fixed between the species.

How do I compare AF site by site between two VCF files? I want to (1) keep unique indels with an ALT AF >=0.6 in each species and (2) keep shared indels where the ALT AF difference between A & B is >= 0.6. I ultimately need to end up with two vcf files again, for A and B.

I first tried importing the vcf files into R using vcfR, creating data frames, and filtering with dyplr. This worked, but I couldn't figure out an easy way to recapitulate my vcf files for further processing. (vcf2tidy is a one way street apparently and vcfR won't create a new vcf object from a data frame.) Is there a way to create a new vcf file or use my filtered CHROM & POS information to filter a vcf file? Or is there some easier way (maybe using bcftools) to compare and filter my vcfs?

Thanks! Nina

RNA-Seq vcf bcftools R sequencing • 759 views
ADD COMMENT

Login before adding your answer.

Traffic: 2592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6