bcftools error duplicate samples
3.6 years ago
abyousaf • 0

I am trying to merge VCF files across chromosomes 1-22 using bcftools v1.9. The command I am using is:

bcftools merge myfile1.vcf.gz myfile2.vcf.gz ... myfile22.vcf.gz -o myfile1_22.vcf.gz

However I get the following error: "Error: Duplicate sample names (1310229_1310229), use --force-samples to proceed anyway."

I'm afraid to use --force-samples because I don't understand how it will affect the merged VCF file, and I don't know how many duplicates there are. The data is from the UK Biobank and the VCF files are massive (1.3 TB in total across chromosomes).

Any suggestions to actually solve the error rather than use --force-samples?

NOTE: I am VERY VERY new to biostatistical analysis. I would greatly appreciate your advice, especially if it is structured for a beginner.

R SNP gene • 3.5k views

I checked the headers and found that the first sample is 1310229. I think that if I use --force-samples, it will prepend a prefix to every single sample name, but I don't know why. Any ideas?

3.6 years ago
Medhat 9.7k

No problems will happen.
Using --force-samples prepends the index of the file (its position on the command line) to each duplicated sample name, as happens to sample S3 in this example:

When merging file A.vcf.gz containing samples S1, S2 and S3 and file B.vcf.gz containing samples S3 and S4, the output file will contain five samples named S1, S2, S3, 2:S3 and S4.
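
To make the renaming rule concrete, here is a minimal Python sketch that models it on plain lists of sample names. It does not call bcftools or read any VCF; the function names merged_sample_names and duplicated_names are hypothetical helpers invented for this illustration, and the model assumes that a repeated name is simply prefixed with the 1-based position of its file.

```python
from collections import Counter


def merged_sample_names(files):
    """Model the --force-samples renaming on lists of sample names.

    `files` is a list of sample-name lists, in command-line order.
    A name already present in the output is prefixed with the 1-based
    index of the file it comes from, e.g. "2:S3".
    (Simplified model for illustration; not bcftools code.)
    """
    seen = set()
    merged = []
    for index, samples in enumerate(files, start=1):
        for name in samples:
            out = name if name not in seen else f"{index}:{name}"
            seen.add(out)
            merged.append(out)
    return merged


def duplicated_names(files):
    """Return the sample names that occur in more than one file."""
    counts = Counter(name for samples in files for name in set(samples))
    return sorted(name for name, n in counts.items() if n > 1)


a = ["S1", "S2", "S3"]  # samples in A.vcf.gz
b = ["S3", "S4"]        # samples in B.vcf.gz

print(merged_sample_names([a, b]))  # ['S1', 'S2', 'S3', '2:S3', 'S4']
print(duplicated_names([a, b]))     # ['S3']
```

Under this model, if every file lists the same samples, every name from the second file onward gets a prefix, which is the behaviour anticipated in the comment above. The real sample lists can be printed, one name per line, with bcftools query -l myfile1.vcf.gz and compared in the same way.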


Thank you very much Medhat! This was helpful
