Question: Bowtie2: aligned read count in the summary does not match the read count in the SAM file
Asked 4.4 years ago by bassanio:

bowtie2 -f -N 1 -p 8 -x REF.fa -1 sample01_1.fasta -2 sample01_2.fasta -S sample01.sam --al sample01_al.fasta --al-conc sample01_alcon.fasta

9260783 reads; of these:
  9260783 (100.00%) were paired; of these:
    9260311 (99.99%) aligned concordantly 0 times
    472 (0.01%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    9260311 pairs aligned concordantly 0 times; of these:
      43 (0.00%) aligned discordantly 1 time
    9260268 pairs aligned 0 times concordantly or discordantly; of these:
      18520536 mates make up the pairs; of these:
        18520180 (100.00%) aligned 0 times
        356 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.01% overall alignment rate


Number of reads in sample01_alcon.1.fasta: 472
Number of reads in sample01_alcon.2.fasta: 472
Number of reads in sample01_al.fasta: 0

To find the number of reads matching a particular reference in the SAM file, I ran the following:

perl -ne ' if (/^\@SQ/) { @F = split(/\t|:/, $_); print $F[2]."\n" } '  sample01.sam > agi_list.txt
perl -ne ' chomp($_); print $_."\t".`grep -c "\t$_" sample01.sam ` ' agi_list.txt >1a.counts
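
One thing worth checking in the counting step above: `grep -c` counts every line that contains a tab followed by the reference name, whether or not that record is itself aligned. The SAM specification recommends that an unaligned read whose mate did align carry its mate's reference name and position, so such records may be included in the count. Below is a minimal Python sketch (not the original poster's method) that counts only records whose FLAG lacks the unmapped bit 0x4. The function name `count_aligned_per_reference` is made up for illustration, and skipping secondary and supplementary records is an assumption about what should count as a read.

```python
import sys
from collections import Counter

# SAM FLAG bits used below (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_aligned_per_reference(sam_lines):
    """Count aligned primary records per reference name (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, @PG
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # An unaligned mate may still carry its mate's reference name,
        # so the FLAG decides whether the record counts, not column 3.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for ref, n in sorted(count_aligned_per_reference(handle).items()):
            print(f"{ref}\t{n}")
```

Saved under a name of your choice, it would be run with the SAM file as its only argument, for example `sample01.sam`, and prints one reference name and count per line.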

I have a couple of questions:

1) In theory I should get (2 * concordantly aligned pairs) + (2 * discordantly aligned pairs) = (2*472) + (2*43) = 1030 reads, right? Instead I got 1742, which equals (2*472) + (2*43) + (2*356).

2) I could not find the 43 discordantly aligned pairs or the 356 singly aligned mates in my aligned-reads files. Why is that, and how can I find them?

3) What I need to do is calculate RPKM and also retrieve those sequences.
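
One possible reading of the 1742 figure: the summary implies 944 + 86 + 356 = 1386 aligned records, and if each of the 356 singly aligned mates has an unaligned partner that carries the same reference name in the SAM file, a plain text match would find 1386 + 356 = 1742 lines. To pull the categories apart, one option is to read them from the SAM file itself rather than from the `--al`/`--al-conc` outputs. Bowtie 2 annotates each record with a `YT:Z` tag (`CP` for a concordant pair, `DP` for a discordant pair, `UP` for a mate of a pair that did not align as a pair, `UU` for an unpaired read). The sketch below assumes an uncompressed SAM file with that tag present. The helper names `names_by_category` and `rpkm` are hypothetical, and `rpkm` uses the common definition 10^9 * C / (N * L), leaving the choice of N (all reads or only aligned reads) to the analyst.

```python
from collections import defaultdict

UNMAPPED = 0x4  # SAM FLAG bit: this record is not aligned


def names_by_category(sam_lines):
    """Group read names of aligned records by their Bowtie 2 YT:Z value."""
    groups = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and header lines
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & UNMAPPED:
            continue  # incomplete record, or the record itself is unaligned
        # Optional tags start at column 12; "NA" marks a missing YT tag
        pair_type = next(
            (f[5:] for f in fields[11:] if f.startswith("YT:Z:")), "NA"
        )
        groups[pair_type].append(fields[0])
    return dict(groups)


def rpkm(read_count, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million reads: 1e9 * C / (N * L)."""
    return read_count * 1e9 / (gene_length_bp * total_reads)
```

If the SAM file agrees with the summary, the `DP` group would hold the mates of the 43 discordant pairs and the `UP` group the 356 singly aligned mates; their names could then be used to fetch the sequences from the input FASTA files.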


I think the more important question is: why do 99.99% of your reads not align at all? Until that is fixed, I wouldn't bother with any further analyses.

Reply written 4.4 years ago by Michael Dondrup

That is because I am currently looking at a single gene out of the whole data set.

Reply written 4.4 years ago by bassanio

That alone shouldn't cause this.

Reply written 4.4 years ago by Devon Ryan

The input files are metagenome sequences in FASTA format.

Reply written 4.4 years ago by bassanio