Question: bcftools merge duplicate names
evelyn30 wrote:

Hello everyone,

I wanted to merge VCF files using:

bcftools merge  --file-list sample_list.txt -O v -o merge.vcf

But it gives an error for sample16.vcf.gz:

Error: Duplicate sample names (sample16.vcf.gz), use --force-samples to proceed anyway.

I do not have any other VCF file with the same name in the same directory. I still went ahead and ran:

bcftools merge --force-samples -m none --file-list sample_list.txt -O v -o merge1.vcf

Now it gives a weird name to that particular sample in the output file:

15:sample15.vcf.gz

I am not sure whether it is extracting the right information from the sample16.vcf file. I compared this sample's column in the merged file with the column in the individual VCF file, and they are not the same.

I would appreciate any help in figuring out this problem with duplicate names. Thank you!

modified 4 weeks ago • written 4 weeks ago by evelyn30

The sample name is not derived from the filename. It is stored inside the VCF file, on the #CHROM header line, and is usually the sample name that was used in the BAM file the variants were called from.

So I guess the 10th column has the same header in at least two of your VCF files.

Check the output of:

 $ zgrep "^#CHROM" *.vcf.gz | cut -f10
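
A minimal sketch of the same check that keeps the file name next to each sample name and lists every sample column (10th onward), not only the first. The function name `list_vcf_samples` is made up for illustration; it assumes bash, `gzip` and `awk` are available and that each file has a standard tab-separated #CHROM header line.

```shell
# Hypothetical helper: print "file<TAB>sample" for every sample column
# found on the #CHROM header line of each (optionally gzipped) VCF.
list_vcf_samples() {
    local f
    for f in "$@"; do
        # gzip -cdf decompresses .gz input and passes plain text through unchanged
        gzip -cdf -- "$f" | awk -F'\t' -v file="$f" '
            /^#CHROM/ {
                # columns 1-9 are the fixed VCF fields; samples start at column 10
                for (i = 10; i <= NF; i++) print file "\t" $i
                exit
            }'
    done
}

# Usage (file names as in the question):
# list_vcf_samples sample*.vcf.gz
```

Two files that show the same name in the second column would explain the "Duplicate sample names" error, whatever the files themselves are called.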
written 4 weeks ago by finswimmer12k

bcftools query -l does the same thing :-)

written 4 weeks ago by RamRS24k

I was quite sure bcftools could do this, but I was too lazy to look up the man page :P

Even if it is not always true, as a rule of thumb: if there is something you cannot do with your VCF file using bcftools, then you probably don't need it (or at least you should rethink your problem).

written 4 weeks ago by finswimmer12k

Is sample_list.txt a list of unique file names? Does it contain exactly one column, with one file name per line? Can you show us the output of head sample_list.txt?

Where does the .bam suffix even come from? The files in the file list should be VCF files, not bam files.

modified 4 weeks ago • written 4 weeks ago by RamRS24k

Yes, sample_list.txt contains one column with unique file names:

sample1.vcf.gz
sample2.vcf.gz
sample3.vcf.gz
sample4.vcf.gz
sample5.vcf.gz
sample6.vcf.gz
sample7.vcf.gz
sample8.vcf.gz
sample9.vcf.gz
sample10.vcf.gz
sample11.vcf.gz
sample12.vcf.gz
sample13.vcf.gz
sample14.vcf.gz
sample15.vcf.gz
sample16.vcf.gz
sample17.vcf.gz
sample18.vcf.gz
sample19.vcf.gz
sample20.vcf.gz

These vcf.gz files contain only SNP information. There are no other types of variants. Thanks for pointing that out. I have edited my question.

written 4 weeks ago by evelyn30

Please try this command:

for f in *.vcf.gz; do
    echo -e "${f}\t$(bcftools query -l "$f")"
done

and paste the output here.
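
Once that listing is available, repeated sample names can be picked out mechanically. A minimal sketch, assuming single-sample VCF files and the two-column "file<TAB>sample" listing requested above piped in on standard input; the function name `find_duplicate_samples` is hypothetical.

```shell
# Hypothetical helper: read "file<TAB>sample" lines on stdin and print every
# sample name that appears on more than one line, followed by the files
# that carry it (comma-separated).
find_duplicate_samples() {
    awk -F'\t' '
        NF >= 2 {
            count[$2]++
            files[$2] = (files[$2] == "" ? $1 : files[$2] "," $1)
        }
        END {
            for (s in count)
                if (count[s] > 1) print s "\t" files[s]
        }' | sort
}

# Usage:
# for f in *.vcf.gz; do echo -e "${f}\t$(bcftools query -l "$f")"; done | find_duplicate_samples
```

If two files do turn out to share a sample name, one option is to rename the sample in one of them with bcftools reheader -s before merging, rather than relying on --force-samples.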

written 4 weeks ago by RamRS24k