I am trying to obtain a strict majority consensus from a bam file, i.e. Using only the reads aligned and NOT the reference used to align. So for example, in the case1, because the ref is completely covered by the reads, the consensus will have the same length than the reference, without gaps:
But for the 2nd case, I expect to have N's or a shorter consensus. In this case, using the reference to fill the gaps would be ok.
I have tried with samtools/bcftools and gatk but it is not working properly, i.e. it uses variations from the reads with low frequencies.
The samtools commands:
bwa mem -B 20 -A 3 -O 30 -E 3 -t 4 ref.fa R1.fastq R2.fastq > file.sam samtools view -bT ref.fa file.sam | samtools sort > file.bam samtools index file.bam samtools mpileup -uf ref.fa file.bam | bcftools call -mv -Oz -o output.vcf cat ref.fa | bcftools consensus output.vcf > output.fa
The gatk ones (from the bwa/samtools bam output):
java -jar picard.jar MarkDuplicates INPUT=file.bam O=file.dup.bam M=file.metrics gatk HaplotypeCaller --reference ref.fa --input file.dup.bam --output gatk.vcf gatk FastaAlternateReferenceMaker -R ref.fa --variant gatk.vcf -o gatk.fa