Question: Bowtie2: aligned read count in the alignment summary does not match the read count in the SAM file
3.9 years ago, vinumanikandan wrote:

bowtie2 -f -N 1 -p 8 -x REF.fa -1 sample01_1.fasta -2 sample01_2.fasta -S sample01.sam --al sample01_al.fasta --al-conc sample01_alcon.fasta


9260783 reads; of these:
  9260783 (100.00%) were paired; of these:
    9260311 (99.99%) aligned concordantly 0 times
    472 (0.01%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    9260311 pairs aligned concordantly 0 times; of these:
      43 (0.00%) aligned discordantly 1 time
    ----
    9260268 pairs aligned 0 times concordantly or discordantly; of these:
      18520536 mates make up the pairs; of these:
        18520180 (100.00%) aligned 0 times
        356 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.01% overall alignment rate
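As a sanity check on the summary above, the last line can be reproduced from the other numbers. This is a minimal sketch, assuming the overall rate is the number of aligned mates (two per concordant or discordant pair, plus mates that aligned on their own) divided by the total number of mates. The function name is made up for illustration.

```python
def overall_alignment_rate(total_pairs, concordant_pairs, discordant_pairs, single_mates):
    """Fraction of all mates (two per pair) that aligned in any way.

    Assumption: every concordant or discordant pair contributes two aligned
    mates, and single_mates counts mates that aligned on their own.
    """
    aligned_mates = 2 * (concordant_pairs + discordant_pairs) + single_mates
    return aligned_mates / (2 * total_pairs)


# Numbers taken from the Bowtie2 summary in this post.
rate = overall_alignment_rate(
    total_pairs=9260783,
    concordant_pairs=472 + 0,   # exactly 1 time + >1 times
    discordant_pairs=43,
    single_mates=356 + 0,       # exactly 1 time + >1 times
)
print(f"{100 * rate:.2f}% overall alignment rate")
```

Under that assumption the summary accounts for 2*(472+43)+356 = 1386 aligned mates out of 18521566.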


Result:

Number of reads in sample01_alcon.1.fasta: 472
Number of reads in sample01_alcon.2.fasta: 472
Number of reads in sample01_al.fasta: 0


To find the number of reads matching a particular reference in the SAM file, I ran the following:

perl -ne ' if (/^\@SQ/) { @F = split(/\t|:/, $_); print $F[2]."\n" } '  sample01.sam > agi_list.txt
perl -ne ' chomp($_); print $_."\t".`grep -c "\t$_" sample01.sam ` ' agi_list.txt >1a.counts
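A grep on the reference name counts every SAM line that carries that name, whether or not the read on that line actually aligned. The SAM convention is that an unaligned mate whose partner aligned is written with its partner's reference name and position, so such lines may be counted too. The sketch below is one possible alternative, not a drop-in fix: it reads the FLAG column, skips records with the unmapped bit (0x4), and groups the rest by reference and by Bowtie2's `YT:Z` tag (`CP` concordant pair, `DP` discordant pair, `UP` pair that did not align as a pair, `UU` unpaired). The function name is hypothetical.

```python
from collections import Counter


def count_aligned_by_reference(sam_lines):
    """Count aligned records per (reference name, YT:Z pair category).

    Header lines, unmapped records (FLAG 0x4) and secondary/supplementary
    records (FLAG 0x100 / 0x800) are skipped, so each aligned read is
    counted once.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read itself is unaligned
            continue
        if flag & (0x100 | 0x800):        # secondary or supplementary record
            continue
        pair_type = "NA"                  # used when no YT:Z tag is present
        for tag in fields[11:]:
            if tag.startswith("YT:Z:"):
                pair_type = tag[5:]
                break
        counts[(fields[2], pair_type)] += 1
    return counts
```

Calling `count_aligned_by_reference(open("sample01.sam"))` and printing the items would give one count per reference and pair category. Whether the totals then agree with the Bowtie2 summary would need to be checked on the real file.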

I have a couple of questions:

1) In theory I should get (2*aligned concordantly)+(2*aligned discordantly) = (2*472)+(2*43) = 1030, right? Instead I got a different number, 1742, which equals (2*472)+(2*43)+(2*356).

2) I could not find the 43 discordantly aligned pairs or the 356 individually aligned mates in my aligned-read output files. Why is that, and how can I find them?

3) What I ultimately need to do is calculate RPKM and also retrieve those aligned sequences.
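For the RPKM part, the usual definition is reads mapped to the gene, per kilobase of gene length, per million mapped reads. A minimal sketch follows. The function and parameter names are chosen for illustration, and which total to use as the denominator (reads mapped to the whole reference set, or all reads in the library) is an assumption to settle for the actual data set.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, 1000 reads on a 1000 bp gene with one million mapped reads in total gives an RPKM of 1000. The gene length for the reference in this post is not given, so it would have to be read from the `LN:` field of the matching `@SQ` header line.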


I think the more important question is, why don't 99.99% of your reads align at all? Before this is fixed, I wouldn't bother about any further analyses.

Reply written 3.9 years ago by Michael Dondrup

That is because I am currently looking at a single gene out of a whole data set.

Reply written 3.9 years ago by vinumanikandan

That alone shouldn't cause this.

Reply written 3.9 years ago by Devon Ryan

The input files contain metagenome sequences in FASTA format.

Reply written 3.9 years ago by vinumanikandan