Question: samtools mpileup reports far too few reads
0
gravatar for maik.engeholm
3.1 years ago by
maik.engeholm0 wrote:

Dear all,

all new to samtools, I would like to learn more about the mpileup command. As a test, I have used simulated reads on the first 1 K of the E. coli genome as described here: [http://biobits.org/samtools_primer.html][1]. In short:

~/bioinf$ wgsim -N1000 -S1 -r0 genomes/E_coli_1K.fa simulated_reads/sim_reads.fq /dev/null 
[wgsim] seed = 1 
[wgsim_core] calculating the total length of the reference sequence... 
[wgsim_core] 1 sequences, total length: 1000

~/bioinf$ bowtie2 -x genomes/b2_indices/E_coli_1K -U simulated_reads/sim_reads.fq -S alignments/E_coli_1K.sam
1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    30 (3.00%) aligned 0 times
    970 (97.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
97.00% overall alignment rate

(By the way, why do 30 reads not align?)

samtools view -b -S -o alignments/E_coli_1K.bam alignments/E_coli_1K.sam

I have checked that the .sam and .bam file contain 1000 alignments each as they should.

~/bioinf$ samtools view -c alignments/E_coli_1K.sam
1000

~/bioinf$ samtools view -c alignments/E_coli_1K.bam
1000

When I invoke samtools mpileup

~/bioinf$ samtools mpileup -f genomes/E_coli_1K.fa alignments/E_coli_1K_test.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
1:1000  40  G   1   ^K. 2
1:1000  41  T   1   .   2
1:1000  42  G   1   .   2
1:1000  43  G   1   .   2
1:1000  44  A   1   .   2
1:1000  45  T   1   .   2
1:1000  46  T   1   .   2
1:1000  47  A   1   .   2
1:1000  48  A   1   .   2
1:1000  49  A   1   .   2
1:1000  50  A   1   .   2
1:1000  51  A   1   C   2
1:1000  52  A   1   .   2
1:1000  53  A   1   .   2
1:1000  54  G   1   .   2
1:1000  55  A   1   .   2
1:1000  56  G   1   .   2
1:1000  57  T   1   .   2
1:1000  58  G   1   .   2
1:1000  59  T   1   .   2
1:1000  60  C   1   .   2
1:1000  61  T   1   .   2
1:1000  62  G   1   .   2
1:1000  63  A   1   .   2
1:1000  64  T   1   .   2
1:1000  65  A   1   .   2
1:1000  66  G   1   .   2
1:1000  67  C   1   .   2
1:1000  68  A   1   .   2
1:1000  69  G   1   .   2
1:1000  70  C   1   .   2
1:1000  71  T   1   .   2
1:1000  72  T   1   .   2
1:1000  73  C   1   .   2
1:1000  74  T   1   .   2
1:1000  75  G   1   .   2
1:1000  76  A   1   .   2
1:1000  77  A   1   .   2
1:1000  78  C   1   .   2
1:1000  79  T   1   .   2
1:1000  80  G   1   .   2
1:1000  81  G   1   .   2
1:1000  82  T   1   .   2
1:1000  83  T   1   .   2
1:1000  84  A   1   .   2
1:1000  85  C   1   .   2
1:1000  86  C   1   .   2
1:1000  87  T   1   .   2
1:1000  88  G   1   .   2
1:1000  89  C   1   .   2
1:1000  90  C   2   .^K.    22
1:1000  91  G   2   ..  22
1:1000  92  T   2   ..  22
1:1000  93  G   2   ..  22
1:1000  94  A   2   ..  22
1:1000  95  G   2   ..  22
1:1000  96  T   2   ..  22
1:1000  97  A   2   ..  22
1:1000  98  A   2   ..  22
1:1000  99  A   2   ..  22
1:1000  100 T   2   ..  22
1:1000  101 T   2   ..  22
1:1000  102 A   2   ..  22
1:1000  103 A   2   ..  22
1:1000  104 A   2   ..  22
1:1000  105 A   2   ..  22
1:1000  106 T   2   ..  22
1:1000  107 T   2   ..  22
1:1000  108 T   2   ..  22
1:1000  109 T   2   .$. 22
1:1000  110 A   1   .   2
1:1000  111 T   1   .   2
1:1000  112 T   1   .   2
1:1000  113 G   1   .   2
1:1000  114 A   1   .   2
1:1000  115 C   1   .   2
1:1000  116 T   1   .   2
1:1000  117 T   1   .   2
1:1000  118 A   1   .   2
1:1000  119 G   1   .   2
1:1000  120 G   1   .   2
1:1000  121 T   1   .   2
1:1000  122 C   1   .   2
1:1000  123 A   1   .   2
1:1000  124 C   1   .   2
1:1000  125 T   1   .   2
1:1000  126 A   1   .   2
1:1000  127 A   1   .   2
1:1000  128 A   1   .   2
1:1000  129 T   1   .   2
1:1000  130 A   1   .   2
1:1000  131 C   1   .   2
1:1000  132 T   1   .   2
1:1000  133 T   1   .   2
1:1000  134 T   1   .   2
1:1000  135 A   1   .   2
1:1000  136 A   1   .   2
1:1000  137 C   1   .   2
1:1000  138 C   1   .   2
1:1000  139 A   1   .   2
1:1000  140 A   1   .   2
1:1000  141 T   1   .   2
1:1000  142 A   1   .   2
1:1000  143 T   1   .   2
1:1000  144 A   1   .   2
1:1000  145 G   1   .   2
1:1000  146 G   1   .   2
1:1000  147 C   1   .   2
1:1000  148 A   1   .   2
1:1000  149 T   1   .   2
1:1000  150 A   1   .   2
1:1000  151 G   1   .   2
1:1000  152 C   1   .   2
1:1000  153 G   1   .   2
1:1000  154 C   1   .   2
1:1000  155 A   1   .   2
1:1000  156 C   1   .   2
1:1000  157 A   1   .   2
1:1000  158 G   1   .   2
1:1000  159 A   1   .$  2
[bam_pileup_core] the input is not sorted (reads out of order)

Only positions 40-159 seem to be covered by reads and each one only once or twice at most. So what happened with the rest of my reads (some of which align to positions beyond 159 as you can check in the sam/bam file). Also, with 1000 reads in total, the coverage at the above positions should be a lot higher (and less uniform) than reported. I have tried to play around with -d and -B, all to no avail. What am I doing wrong?

Your help is much appreciated,

mce

alignment • 1.2k views
ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by maik.engeholm0

[bam_pileup_core] the input is not sorted (reads out of order)

Try sorting your BAM file.

Edit: OP figured it out on his/her own.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by harold.smith.tarheel4.5k
0
gravatar for maik.engeholm
3.1 years ago by
maik.engeholm0 wrote:

Sorry, forgot to sort and index the .bam file. When I did this, all worked fine.

ADD COMMENTlink written 3.1 years ago by maik.engeholm0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2259 users visited in the last hour