emtpy bcf file
1
0
Entering edit mode
7.2 years ago
Illinu ▴ 110

I am aligning genomic reads to a set of 6 genes all saved in one file in fasta format. The sam and bam files look ok but the mpileup step returns empty bcf files, with only the header until:

 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT sample1-thal.nodup.sorted.bam

This is the whole code used:

for f in /reads/*1P.fastq.gz
 do
  name=${f%%*/}
  sname=${name%%.*}
  bname=$(basename $sname)
  bwa mem thal $sname.1P.fastq.gz $sname.2P.fastq.gz > $bname-thal.sam
  samtools view -bt thal.fasta.fai $bname-thal.sam > $bname-thal.bam
  samtools sort $bname-thal.bam > $bname-thal.sorted.bam
  samtools rmdup $bname-thal.sorted.bam $bname-thal.nodup.bam
  samtools sort $bname-thal.nodup.bam > $bname-thal.nodup.sorted.bam
  samtools mpileup -uf thal.fasta $bname-thal.nodup.sorted.bam > $bname-thal.bcf
  bcftools call -mv -Ov $bname-thal.bcf > $bname-thal.vcf
done

I used this code with a different set of genes and it worked, what can be going wrong?

this is what I see with samtools tview

https://postimg.org/image/gli9yz5q7/

samtools alignment mpileup • 1.7k views
ADD COMMENT
0
Entering edit mode

two comments: * use defensive bash programming to test if something failed : http://www.davidpashley.com/articles/writing-robust-shell-scripts/ * use linux pipeline to avoid those numerous intermediate files.

ADD REPLY
0
Entering edit mode

Thanks @Pierre, I normally run the pipeline for some commands, some of those intermediate files I need for other tasks. But in the script above I am testing this way to see at what step it fails. So far up to the *.nodup.sorted.bam it works well, that's how I know that it is the bcf file that is not being processed. So the alignment is working, not the variant calling

ADD REPLY
0
Entering edit mode
7.2 years ago
Illinu ▴ 110

This was corrected at the mpileup step with -A option

  samtools mpileup -ug -A -f thal.fasta $bname-thal.nodup.sorted.bam > $bname-thal.bcf

-A, --count-orphans Do not skip anomalous read pairs in variant calling. Maybe this is because the sequences I used as reference are short and the pairs from the sequencing come from a large fragment thereofre living many mapping reads as orphans.

ADD COMMENT

Login before adding your answer.

Traffic: 2007 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6