Representation of common SNPs in text files
4.5 years ago
evelyn ▴ 230

Hello All,

I have seven VCF files generated using different variant callers. The files are very large, approximately 50 GB each, because each one contains SNP information for 50 samples. I want to represent the SNPs that are common to all of these variant callers. Because of the file sizes, I am not able to use the UpSetR plotting package. Another goal is to use the files for DAPC in R, where we do not need SNP POS information. With this goal in mind, and to reduce the file sizes, I filtered out missing SNP information, redundant SNPs, and heterozygous SNPs using awk. The files became manageable in size, but they are no longer VCF; they are text files, as shown in the lines below:

Sample1 Sample2 Sample3-------Sample50
G G G------G 
A T A------C

Now I am able to use the files for DAPC, but I am not able to represent the SNPs common to all files using any plotting software. I would appreciate any suggestions on how to proceed. Thank you!

SNP • 638 views

Did you use standard tools such as bcftools (isec/merge), RTG Tools (vcfeval), or vcftools to intersect the VCFs? @evelyn
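For illustration, the kind of intersection those tools compute can be sketched in plain Python. This is only a rough sketch and rests on several assumptions: the original VCFs (with CHROM, POS, REF and ALT) are still available, since the reduced text files no longer carry the site identity needed to compare callers; a site is treated as "the same" when those four fields match exactly; and the function names `iter_site_keys` and `count_caller_combinations`, as well as the caller labels, are hypothetical. Each file is streamed line by line, but every distinct site is held in memory, so for very large call sets a dedicated tool such as bcftools isec is likely the better choice.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def iter_site_keys(path):
    """Yield (CHROM, POS, REF, ALT) for each VCF record, one line at a time."""
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.split("\t", 5)  # only the first five columns are needed
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            chrom, pos, _id, ref, alt = fields[:5]
            yield (chrom, int(pos), ref, alt.strip())


def count_caller_combinations(vcf_paths):
    """Count sites per exact combination of callers.

    vcf_paths: dict mapping a caller label to the path of its VCF.
    Returns a Counter keyed by a tuple of caller labels.
    """
    names = list(vcf_paths)
    membership = {}  # site key -> bitmask of callers that report it
    for bit, name in enumerate(names):
        for key in iter_site_keys(vcf_paths[name]):
            membership[key] = membership.get(key, 0) | (1 << bit)

    counts = Counter()
    for mask in membership.values():
        combo = tuple(n for i, n in enumerate(names) if (mask >> i) & 1)
        counts[combo] += 1
    return counts
```

The result is a small table with one count per combination of callers, which is far easier to plot than the full files. Assuming the UpSetR workflow mentioned in the question, the combinations could be joined with `&` and passed as a named vector to UpSetR's `fromExpression`, rather than loading the raw data.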


I did joint variant calling on all 50 sorted BAM files together, so I got a single VCF file containing all the information, with no need to merge individual VCF files.
