6.2 years ago
marcus.hooker
I called SNPs with bcftools mpileup on 120 input BAM files. My command looked like this:
bcftools mpileup -d 8000 -f Reference.fasta -Ob Input.bam Input2.bam ...... Input120.bam
But my BCF file doesn't contain a sample ID for each sample; it shows only a single sample called "sample_ID":
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_ID
1A 20677 . AGGG AGG 5.04449 . INDEL;IDV=1;IMF=1;DP=1;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=37 GT:PL 1/1:33,3,0
1A 20719 . G C 8.13869 . DP=1;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=37 GT:PL 1/1:37,3,0
1A 20732 . G T 8.13869 . DP=1;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,1,0;MQ=37 GT:PL 1/1:37,3,0
What went wrong, and how should I fix this? I was expecting one column per sample, so that the header after FORMAT would read Input Input2 Input3 .... Input120.
The sample name is taken from the SM tag of the @RG lines in the BAM header, not from the file name. I guess all your files carry the same sample name.
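To see which sample names your files actually carry, a small sketch like the one below prints the SM value of every @RG line in a header. The function name list_sm is hypothetical, and it assumes a standard tab-separated SAM header such as the one printed by samtools view -H. Files that share the same SM value end up in the same sample column of the output.

```shell
#!/bin/sh
# Hypothetical helper: read a SAM header on stdin and print the SM value
# of each @RG line, one per line (the "SM:" prefix is stripped).
list_sm() {
  awk 'BEGIN { FS = "\t" }
    /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }'
}
```

For example: `for f in Input*.bam; do echo "$f"; samtools view -H "$f" | list_sm; done` (the Input*.bam glob is assumed from the file names above). If every file prints sample_ID, that explains the single column.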
Could you please show the output of
samtools view -H input.bam
for two files?
I see. The sample names in the BAM header just say "sample_ID". Is there a way to give them the name of the actual sample? Can I just provide bcftools mpileup with a list of sample names as well?
Not that I'm aware of. The much cleaner way is to fix the sample names in every BAM file, e.g. with
samtools reheader. If you have a list of file names and sample names, and show us what the complete header currently looks like, I might be able to help you with this.
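A minimal sketch of the reheader approach, assuming a hypothetical tab-separated file samples.tsv with one `file.bam<TAB>sample_name` line per BAM: rewrite the SM field on every @RG line with awk, then hand the edited header to samtools reheader. The function names and the .header.sam / .renamed.bam output names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a SAM header on stdin and replace the SM value
# on every @RG line with $1. All other lines pass through unchanged.
# Note: @RG lines that have no SM field are left as they are.
rename_sm() {
  awk -v sm="$1" 'BEGIN { FS = OFS = "\t" }
    /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) $i = "SM:" sm }
    { print }'
}

# Hypothetical driver: $1 is a TSV of "file.bam<TAB>sample_name" lines.
# For each BAM, write an edited header and a reheadered copy next to it.
reheader_all() {
  tab=$(printf '\t')
  while IFS="$tab" read -r bam sample; do
    [ -n "$bam" ] || continue            # skip empty lines
    hdr="${bam%.bam}.header.sam"
    out="${bam%.bam}.renamed.bam"
    samtools view -H "$bam" | rename_sm "$sample" > "$hdr" &&
      samtools reheader "$hdr" "$bam" > "$out"
  done < "$1"
}
```

Run it as `reheader_all samples.tsv`, index the new files with samtools index, and pass the *.renamed.bam files to bcftools mpileup. Since samtools reheader only rewrites the header, the reads themselves are copied unchanged.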