Truncate VCF files so they contain the same sites
4.5 years ago
Wilber0x ▴ 50

I have five VCF files, one for each of five different species. All were generated against the same reference sequence. Some VCF files are longer than others, because more sequence aligned to the reference in some species than in others. The number of sites per chromosome also differs between species.

These are short segments of two VCFs to explain my point:

ACmerged_contig_7648    573 .   A   .   52  .   DP=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=22  GT  0/0
*ACmerged_contig_7648   574 .   T   .   52  .   DP=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=22  GT  0/0
ACmerged_contig_9049    831 .   T   .   58  .   DP=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=28  GT  0/0*
ACmerged_contig_9049    832 .   A   .   58  .   DP=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=28  GT  0/0

ACmerged_contig_7648    669 .   C   .   29.5864 .   DP=1;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.   GT  ./.
*ACmerged_contig_7648   670 .   A   .   29.5864 .   DP=1;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.   GT  ./.
ACmerged_contig_9049    258 .   A   .   29.5864 .   DP=1;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.   GT  ./.*
ACmerged_contig_9049    259 .   T   .   52  .   DP=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=22  GT  0/0

As you can see, the lines in italics (delimited by asterisks above) show that alignments to the chromosomes stop and start at different sites in the different species. Is there software that will remove sites that are not present in all of my VCF files? That is, is there a way to trim my VCF files so that they only contain the sites found in every one of them?
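To make the request concrete, here is a minimal Python sketch of the kind of trimming I mean. It is only an illustration under several assumptions: the files are uncompressed, a "site" is identified by its CHROM and POS columns alone, and a record counts as present even when its genotype is `./.`. The function names `site_keys` and `trim_to_shared_sites` and the `.shared.vcf` output suffix are made up for this example. For bgzipped, indexed files, `bcftools isec` with `-n=5` looks like it is meant for this kind of intersection, but I have not confirmed it fits this case.

```python
from pathlib import Path


def site_keys(vcf_path):
    """Return the set of (CHROM, POS) pairs in one uncompressed VCF."""
    keys = set()
    with open(vcf_path) as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split(None, 2)[:2]
            keys.add((chrom, int(pos)))
    return keys


def trim_to_shared_sites(vcf_paths, out_dir):
    """Copy each VCF into out_dir, keeping only sites found in every input.

    Assumes vcf_paths is non-empty and that the input files have
    distinct base names. Returns the set of shared (CHROM, POS) pairs.
    """
    shared = set.intersection(*(site_keys(p) for p in vcf_paths))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in vcf_paths:
        out_path = out_dir / (Path(path).stem + ".shared.vcf")
        with open(path) as src, open(out_path, "w") as dst:
            for line in src:
                if line.startswith("#"):
                    dst.write(line)  # keep the header unchanged
                    continue
                if not line.strip():
                    continue
                chrom, pos = line.split(None, 2)[:2]
                if (chrom, int(pos)) in shared:
                    dst.write(line)
    return shared
```

With five hypothetical files `sp1.vcf` to `sp5.vcf`, calling `trim_to_shared_sites(["sp1.vcf", "sp2.vcf", "sp3.vcf", "sp4.vcf", "sp5.vcf"], "trimmed")` would write `trimmed/sp1.shared.vcf` and so on, each holding the same set of positions. This holds every site key in memory, so it may not scale to very large genomes.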

vcf SNP sequence genome • 967 views