bcftools [mpileup] 4 samples in 28 input files
5.5 years ago

Hi,

I'm running bcftools to create a vcf. However, I get:

[mpileup] 4 samples in 28 input files

The three it counts are all BAM files sorted with samtools, but somehow it ignores the others, for which I removed duplicates with Picard and performed realignment with GATK.

Any ideas what the issue is?

Thanks, Stefan

snp • 3.5k views

Hi stefanprost.research, welcome to Biostars. Please post the command line and an excerpt of the output so that we have more details to work with. Please also consider reading How To Ask Good Questions On Technical And Scientific Forums. Cheers!


The command was either:

samtools mpileup  -s -Q 20 -A -C 50 -t DP,SP -d 5000 -E -ugf reference.fasta *.bam | bcftools call -c - > xx.vcf

or

bcftools mpileup  -q 20 -A -C 50 -d 5000 -E -O u -t DP -g 3 --threads 10 -f reference.fasta *.bam > xx.vcf

Here's an example output line:

contig_121389   110 .   G   .   29.1379 .   DP=51;MQ0F=0;AF1=0;AC1=0;DP4=0,48,0,0;MQ=48;FQ=-27.9742 GT:PL:DP:SP 0/0:0:48:0  0/0:0:0:0   0/0:0:0:0   0/0:0:0:0

It only shows stats for 4 of the 28 bam files I imported. I have run the first command successfully before, but my hard-disk crashed. I reused the same command as before, but with new versions of the tools involved and the reads remapped to the same reference.

Cheers and thanks, Stefan


Show us the header line of the VCF, the one starting with #CHROM. Tell us how you set the 28 distinct read groups in your 28 BAM files.


I used picard. I reran them individually to make sure they do have different IDs, but now I get 1 sample out of 21 files (I used a slightly smaller set).

java -jar picard.jar AddOrReplaceReadGroups I=file.marked_duplicates.realigned.bam O=file.marked_duplicates.realigned.RG.bam RGID=ind_19 RGPL=illumina RGLB=lib1 RGPU=unit1 RGSM=20

Header is this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  20

Previously it showed the three BAM files that had only been sorted.

Any ideas?

thanks!

Cheers, Stefan

5.5 years ago

samtools/bcftools recognize the different samples by the sample name given in the read group of each bam file. It is very likely that you have multiple files with the same sample name in the read group.

You can have a look at the read groups with:

$ samtools view -H sample.bam | grep '@RG'

The sample name is introduced by: SM:
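
To compare all files at once, the same check can be wrapped in a loop. This is only a sketch: the helper name rg_samples is made up for illustration, and it assumes samtools is on the PATH and the BAM files are in the current directory. Files that report the same SM value will be merged into a single sample column by mpileup.

```shell
# Print the SM: value of every @RG line in SAM header text read on stdin.
# (rg_samples is a hypothetical helper name, not part of samtools.)
rg_samples() {
    awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }'
}

# Report file name and sample name(s) for each BAM in the current directory.
# Skipped when samtools is not installed or no BAM files are present.
if command -v samtools >/dev/null 2>&1; then
    for f in *.bam; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$f" "$(samtools view -H "$f" | rg_samples | sort -u | paste -sd, -)"
    done
fi
```

If the second column shows the same name on many rows, those files share a sample name and need new read groups.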

fin swimmer


oh ok. I'll adjust that parameter!
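
One way the adjustment might look, as a sketch only: give every file its own RGSM, here derived from the file name. The helper name rg_command and the use of the base name as sample name are assumptions; the other values are copied from the Picard command posted earlier in the thread. The loop only prints the commands so they can be reviewed before running.

```shell
# Print (without running) an AddOrReplaceReadGroups command for one BAM file.
# Assumption: the file's base name is a suitable unique sample name.
# (rg_command is a hypothetical helper name.)
rg_command() {
    name=$(basename "$1" .bam)
    printf 'java -jar picard.jar AddOrReplaceReadGroups I=%s O=%s.RG.bam RGID=%s RGPL=illumina RGLB=lib1 RGPU=unit1 RGSM=%s\n' \
        "$1" "${1%.bam}" "$name" "$name"
}

# One command per BAM in the current directory. Note that output files
# ending in .RG.bam would also match *.bam on a second run.
for f in *.bam; do
    [ -e "$f" ] || continue
    rg_command "$f"
done
```

With distinct RGSM values in every file, the #CHROM header line should list one column per file.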


Hello stefanprost.research ,

If an answer/comment was helpful, you should upvote it; if the answer/comment resolved your question, you should mark it as accepted. You can accept more than one if they work.