Question: Merge vcf with bcftools error wrong reference
gravatar for chariko
4 weeks ago by
chariko40 wrote:

Hi all,

I have thousands of vcf which I want to merge into just one vcf. For that purpose, I am using vcftools.

bcftools merge --force -l list_files_to_merge -Oz -o ./Merged.vcf.gz

--force is used because in some vcf files some samples may be repeated. I do not care about that.

The command works but it stops when one reference does not match the reference with other samples. That will probably mean that this vcf used another reference genome that is ok.

The REF prefixes differ: T vs A (1,1) Failed to merge alleles at X:15517198 in file_5000.vcf.gz

I would like to have the possibility of omitting this vcf (file_5000.vcf.gz in the example) and continue with the merge command. Maybe in the log file it may be written which samples were omitted due to this error.

Is there a way for doing this?

ADD COMMENTlink modified 4 weeks ago by Pierre Lindenbaum131k • written 4 weeks ago by chariko40

why not removing file_5000.vcf.gz from list_files_to_merge ?

ADD REPLYlink written 4 weeks ago by Pierre Lindenbaum131k

Thanks for your answer

Yes, that is what I did in fact.

The problem is that I have to repeat the whole process again (takes time) and this is happening again with more vcfs. I do not know how many vcf files have this issue.

I may calculate the number of times this will happen. I know that the merging process merges reading line by line the "list_files_to_merge" and it stopped with files in position 2342, 2549, 8876.... knowing that the list has 100k vcfs that means this to happen up to 40 times...

Therefore if there is a way for avoiding this stop it could be great.

ADD REPLYlink written 4 weeks ago by chariko40
gravatar for Pierre Lindenbaum
4 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

if your vcfs have a dictionary ( lines in the header starting with ##contig=) , you can use my tool that will produce a tab delimted file with the path to the vcf and a checksum for the dictionary.

You'll get a signature for each distinct dictionary. Group the VCFs for each dictionary.

$ find /home/lindenb/src/jvarkit/src/test/resources -name "*.vcf.gz" -o -name "*.bam" -o -name "*.cram"

9a5c58c2c91e731135b27ed14974523a    .   85  3101976562  1=249250621;2=243199373;3=198022430;4=191154276;5=180915260;6=17111
9a5c58c2c91e731135b27ed14974523a    /home/lindenb/src/jvarkit/src/test/resources/ExAC.r1.sites.vep.vcf.gz
4677ece43eea2b029d0d33fe130ea6c7    .   86  3137454505  chr1=249250621;chr2=243199373;chr3=198022430;chr4=191154276;chr5=18
4677ece43eea2b029d0d33fe130ea6c7    /home/lindenb/src/jvarkit/src/test/resources/manta.B00I9CJ.vcf.gz
bd7e0928fc3c810e48fafc53a4222ed5    .   11  18490   RF01=3302;RF02=2687;RF03=2592;RF04=2362;RF05=1579;RF06=1356;RF07=1074;RF
ADD COMMENTlink written 4 weeks ago by Pierre Lindenbaum131k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1843 users visited in the last hour