Question: emtpy bcf file
gravatar for Illinu
9 days ago by
Illinu80 wrote:

I am aligning genomic reads to a set of 6 genes all saved in one file in fasta format. The sam and bam files look ok but the mpileup step returns empty bcf files, with only the header until:

 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT sample1-thal.nodup.sorted.bam

This is the whole code used:

for f in /reads/*1P.fastq.gz
  bname=$(basename $sname)
  bwa mem thal $sname.1P.fastq.gz $sname.2P.fastq.gz > $bname-thal.sam
  samtools view -bt thal.fasta.fai $bname-thal.sam > $bname-thal.bam
  samtools sort $bname-thal.bam > $bname-thal.sorted.bam
  samtools rmdup $bname-thal.sorted.bam $bname-thal.nodup.bam
  samtools sort $bname-thal.nodup.bam > $bname-thal.nodup.sorted.bam
  samtools mpileup -uf thal.fasta $bname-thal.nodup.sorted.bam > $bname-thal.bcf
  bcftools call -mv -Ov $bname-thal.bcf > $bname-thal.vcf

I used this code with a different set of genes and it worked, what can be going wrong?

this is what I see with samtools tview

mpileup samtools alignment • 64 views
ADD COMMENTlink modified 3 days ago • written 9 days ago by Illinu80

two comments: * use defensive bash programming to test if something failed : * use linux pipeline to avoid those numerous intermediate files.

ADD REPLYlink written 9 days ago by Pierre Lindenbaum89k

Thanks @Pierre, I normally run the pipeline for some commands, some of those intermediate files I need for other tasks. But in the script above I am testing this way to see at what step it fails. So far up to the *.nodup.sorted.bam it works well, that's how I know that it is the bcf file that is not being processed. So the alignment is working, not the variant calling

ADD REPLYlink modified 8 days ago • written 8 days ago by Illinu80
gravatar for Illinu
3 days ago by
Illinu80 wrote:

This was corrected at the mpileup step with -A option

  samtools mpileup -ug -A -f thal.fasta $bname-thal.nodup.sorted.bam > $bname-thal.bcf

-A, --count-orphans Do not skip anomalous read pairs in variant calling. Maybe this is because the sequences I used as reference are short and the pairs from the sequencing come from a large fragment thereofre living many mapping reads as orphans.

ADD COMMENTlink written 3 days ago by Illinu80
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 474 users visited in the last hour