I'm a beginner at working with SNP data. I want to merge three VCF files into one VCF file. I used the following command:
/home/LXH/biosoft/bcftools/bcftools-1.9/bcftools merge \
    /home/LXH/work/maize_RIL/results/4.SNP_VarDetect/319/319.filted.SNP.vcf.gz \
    /home/LXH/work/maize_RIL/results/4.SNP_VarDetect/478/478.filted.SNP.vcf.gz \
    /home/LXH/work/maize_RIL/365RIL/03vcf314/Zea_mays.314.vcf.gz > merge_all_vcf
It didn't work, and I got the following message:
[W::bcf_hdr_merge] Trying to combine "AC" tag definitions of different lengths
I think this is because the VCF headers differ slightly. In one kind of VCF file, the AC (allele count) line has Number=. and looks like this:
##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to so
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotyp
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
562 scaffold_28 233 . C T 999.00 . AC1=315;
In the other kind of VCF file, the AC line has Number=A and looks like this:
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotyp
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality
So I changed "Number=." to "Number=A", and then the three VCF files could be merged. However, when I used PLINK to convert the merged VCF file to a plink.ped file, I found that the first two samples (the resequencing data) were all zero from the 7th column to the end of the .ped file. All the remaining samples are GBS data.
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
319 319 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
478 478 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 0 0 0 0 T T G A G A C C C G C C G G
5 5 0 0 0 0 T T G G G G C C C G C C G G
6 6 0 0 0 0 T T G G G G 0 0 0 0 C C G G
7 7 0 0 0 0 T T G G G G T T C C C C G G
8 8 0 0 0 0 T T G A G A C T C G C C G G
9 9 0 0 0 0 T T G G G G C C C G C C G G
10 10 0 0 0 0 T T G G G G T T C C C C G A
So here are my questions. First, is this caused by an error in the merge step between the VCF files? Second, what is the difference between "Number=." and "Number=A", and is it safe to simply force the change? I hope somebody can help me figure this out, because it has troubled me a lot. Thank you!
What is the version (first line of the header) of those VCF files?
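In case it helps, here is a rough sketch of how to pull out that information. It prints the `##fileformat` line and the INFO/AC definition from a compressed VCF using only `gzip` and `awk`. The helper name `show_vcf_header_info` and the demo file are made up for illustration; point the function at your own three files instead.

```shell
#!/bin/sh
# Print the VCF version line and the INFO/AC definition of a gzip/bgzip VCF.
# show_vcf_header_info is a hypothetical helper name, not part of bcftools.
show_vcf_header_info() {
    # gzip -dc can also read bgzip-compressed files; stop at the #CHROM line
    gzip -dc "$1" | awk '
        /^##fileformat=/  { print }
        /^##INFO=<ID=AC,/ { print }
        /^#CHROM/         { exit }
    '
}

# Demo on a tiny made-up file (replace with the real paths)
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
} | gzip -c > "$demo_dir/demo.vcf.gz"

header_info=$(show_vcf_header_info "$demo_dir/demo.vcf.gz")
printf '%s\n' "$header_info"
```

Running it on each input file shows side by side which files declare AC with `Number=.` and which with `Number=A`.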
Can you re-call all the samples together into a multi-sample VCF file? I have also always had technical problems when attempting to merge VCFs with many different tools.
Can you possibly try some things:
Then, try the merge again.
I believe that's just a warning, not necessarily a problem.
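To see whether the zeros come from the merge or from the PLINK conversion, it may help to count missing genotype calls per sample directly in the merged VCF. Below is a minimal sketch with `awk`. It assumes the merged file is uncompressed (the default output of `bcftools merge`) and that GT is the first FORMAT subfield, as the VCF specification requires when GT is present. The helper name `count_missing_gt` and the demo records are made up for illustration.

```shell
#!/bin/sh
# Count missing genotype calls per sample in an uncompressed VCF.
# count_missing_gt is a hypothetical helper name; it assumes GT is the
# first FORMAT subfield.
count_missing_gt() {
    awk -F '\t' '
        /^##/ { next }                        # skip meta-information lines
        /^#CHROM/ {                           # sample names are columns 10+
            for (i = 10; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {
            sites++
            for (i = 10; i <= ncol; i++) {
                split($i, sub_fields, ":")    # GT is the first item
                if (sub_fields[1] ~ /\./) missing[i]++
            }
        }
        END {
            # one line per sample: name, missing calls, total sites
            for (i = 10; i <= ncol; i++)
                printf "%s\t%d\t%d\n", name[i], missing[i] + 0, sites
        }
    ' "$1"
}

# Demo on a tiny made-up VCF (replace with merge_all_vcf)
demo_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t319\t3\n'
    printf 'scaffold_28\t233\t.\tC\tT\t999\t.\t.\tGT\t./.\t1/1\n'
    printf 'scaffold_28\t301\t.\tG\tA\t999\t.\t.\tGT\t./.\t0/1\n'
} > "$demo_vcf"

missing_report=$(count_missing_gt "$demo_vcf")
printf '%s\n' "$missing_report"
```

If samples 319 and 478 are already missing at nearly every site in the merged VCF, the zeros in the .ped file are probably not a PLINK problem. One possible explanation is that the resequencing and GBS files share few positions, since `bcftools merge` fills in missing genotypes for samples that have no record at a site.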