bcftools merge duplicate names
20 months ago
evelyn

Hello everyone,

I wanted to merge VCF files using:

bcftools merge  --file-list sample_list.txt -O v -o merge.vcf

But it gives an error for sample16.vcf.gz:

Error: Duplicate sample names (sample16.vcf.gz), use --force-samples to proceed anyway.

I do not have any other VCF file with the same name in that directory, but I still ran:

bcftools merge --force-samples -m none --file-list sample_list.txt -O v -o merge1.vcf

Now it gives a weird name to that particular sample in the output file:

15:sample15.vcf.gz
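With --force-samples, bcftools merge resolves a clash by prepending the index of the input file and a colon to the conflicting sample name, so a prefixed name like the one above marks a sample name that was also present in another input file. A minimal sketch, assuming bcftools is on PATH, that previews the final sample names without writing the merged VCF, using --print-header (print only the merged header and exit); preview_merged_samples is a hypothetical helper name:

```shell
# Print the sample names bcftools merge would write, one per line,
# without performing the merge. Argument: the same file list given to --file-list.
preview_merged_samples() {
    bcftools merge --force-samples --print-header --file-list "$1" \
        | grep '^#CHROM' \
        | cut -f10- \
        | tr '\t' '\n'    # sample columns start at field 10 of the #CHROM line
}
```

Usage: preview_merged_samples sample_list.txt. Any name with a numeric prefix followed by a colon is one that bcftools had to rename.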

I am not sure whether it is extracting the right information from the sample16 VCF file. I compared this sample's column in the merged file with the individual VCF file, and they are not the same.
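A direct column-by-column comparison may not line up even when the merge is correct: bcftools merge writes the union of sites from all inputs, so the merged file has rows where this sample has no call and its genotype is shown as missing. A minimal sketch (compare_sample_gt is a hypothetical helper; bcftools and awk assumed on PATH) that compares genotypes only at CHROM/POS/REF/ALT keys present in the individual file. It assumes a single-sample input and a merge run with -m none, so alleles are not combined into new multiallelic records:

```shell
# Compare one sample's GT between its own VCF and the merged VCF at the
# sites found in the individual file. Prints "key individual_GT merged_GT"
# for each disagreement; no output means the genotypes agree at shared sites.
# Arguments: individual VCF, merged VCF, sample name as it appears in the merged file.
compare_sample_gt() {
    local single=$1 merged=$2 name=$3
    local fmt='%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n'
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    bcftools query -f "$fmt" "$single" > "$a"
    bcftools query -s "$name" -f "$fmt" "$merged" > "$b"
    awk -F'\t' '
        NR == FNR { gt[$1 ":" $2 ":" $3 ":" $4] = $5; next }
        {
            key = $1 ":" $2 ":" $3 ":" $4
            if ((key in gt) && gt[key] != $5) print key, gt[key], $5
        }' "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, compare_sample_gt sample16.vcf.gz merge1.vcf "15:sample15.vcf.gz", assuming that prefixed column is the one that came from sample16.vcf.gz (which is exactly what is in question here).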

I would appreciate any help figuring out this duplicate names problem. Thank you!


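The sample name is not derived from the file name. It is stored in the VCF header (the columns after FORMAT on the #CHROM line) and normally comes from the sample name used in the BAM file, i.e. the SM tag of its read group.

So I guess the 10th column of the #CHROM header line is the same in some of your VCF files.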

Check the output of:

 $ zgrep "^#CHROM" *.vcf.gz | cut -f10

bcftools query -l does the same thing :-)
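Building on bcftools query -l, a minimal sketch that reads the same list passed to --file-list and prints only the sample names occurring more than once across the listed files (bcftools assumed on PATH; list_duplicate_samples is a hypothetical helper name):

```shell
# Print every sample name that appears more than once across the VCFs
# listed in the given file (one path per line).
list_duplicate_samples() {
    local file_list=$1
    local f
    while IFS= read -r f; do
        [ -n "$f" ] || continue     # skip blank lines
        bcftools query -l "$f"      # one sample name per line
    done < "$file_list" | sort | uniq -d
}
```

Usage: list_duplicate_samples sample_list.txt. Empty output means no sample name is shared between files.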


I was quite sure bcftools can do this, but I was too lazy to look up the man page :P

Even if it is not always true, as a rule of thumb: if there is something you cannot do with your VCF file using bcftools, then you probably don't need it (or at least you should think twice about your problem).


Is sample_list.txt a list of unique file names? Does it contain exactly one file name per line? Can you show us the output of head sample_list.txt?

Where does the .bam suffix even come from? The files in the file list should be VCF files, not BAM files.
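Those questions can also be checked directly. A minimal coreutils-only sketch (check_file_list is a hypothetical helper) that reports entries listed more than once and paths that do not exist relative to the current directory:

```shell
# Sanity-check a file list for bcftools merge --file-list.
# Expects one path per line; prints repeated entries, then missing files.
check_file_list() {
    local file_list=$1
    local f
    echo "Repeated entries:"
    sort "$file_list" | uniq -d
    echo "Missing files:"
    while IFS= read -r f; do
        [ -n "$f" ] || continue       # ignore blank lines
        [ -e "$f" ] || echo "$f"      # path not found
    done < "$file_list"
}
```

Usage: check_file_list sample_list.txt, run from the directory the relative paths refer to.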


Yes, sample_list.txt contains one column with unique file names:

sample1.vcf.gz
sample2.vcf.gz
sample3.vcf.gz
sample4.vcf.gz
sample5.vcf.gz
sample6.vcf.gz
sample7.vcf.gz
sample8.vcf.gz
sample9.vcf.gz
sample10.vcf.gz
sample11.vcf.gz
sample12.vcf.gz
sample13.vcf.gz
sample14.vcf.gz
sample15.vcf.gz
sample16.vcf.gz
sample17.vcf.gz
sample18.vcf.gz
sample19.vcf.gz
sample20.vcf.gz

These vcf.gz files contain only SNPs; there are no other types of variants. Thanks for pointing that out. I have edited my question.


Please try this command:

for f in *.vcf.gz; do
    printf '%s\t%s\n' "$f" "$(bcftools query -l "$f")"
done

and paste the output here.
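If that output shows two files carrying the same sample name, one way to resolve it before merging is to rename the sample in one of them with bcftools reheader -s, which reads a file of new sample names, one per line, in header order. A minimal sketch for a single-sample VCF (rename_single_sample and any names passed to it are hypothetical); the result is indexed because bcftools merge expects indexed inputs:

```shell
# Give a single-sample VCF a new sample name and index the result.
# Arguments: input VCF, new sample name, output VCF.
rename_single_sample() {
    local in=$1 new_name=$2 out=$3
    local names status
    names=$(mktemp) || return 1
    printf '%s\n' "$new_name" > "$names"     # one new name per sample
    bcftools reheader -s "$names" -o "$out" "$in" &&
        bcftools index -t "$out"             # write a .tbi index
    status=$?
    rm -f "$names"
    return "$status"
}
```

The renamed file then replaces the original entry in sample_list.txt.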
