Question: (Closed) Bowtie2: aligned read count in the summary does not match the SAM read count
bassanio20 wrote:
bowtie2 -f -N 1 -p 8 -x REF.fa -1 sample01_1.fasta -2 sample01_2.fasta -S sample01.sam --al sample01_al.fasta --al-conc sample01_alcon.fasta

9260783 reads; of these:
  9260783 (100.00%) were paired; of these:
    9260311 (99.99%) aligned concordantly 0 times
    472 (0.01%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    9260311 pairs aligned concordantly 0 times; of these:
      43 (0.00%) aligned discordantly 1 time
    ----
    9260268 pairs aligned 0 times concordantly or discordantly; of these:
      18520536 mates make up the pairs; of these:
        18520180 (100.00%) aligned 0 times
        356 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.01% overall alignment rate

Result:

Number of reads in sample01_alcon.1.fasta: 472
Number of reads in sample01_alcon.2.fasta: 472
Number of reads in sample01_al.fasta: 0

To find the number of reads matching a particular reference in the SAM file, I ran the following:

perl -ne ' if (/^\@SQ/) { @F = split(/\t|:/, $_); print $F[2]."\n" } '  sample01.sam > agi_list.txt
perl -ne ' chomp($_); print $_."\t".`grep -c "\t$_" sample01.sam ` ' agi_list.txt >1a.counts
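
The same per-reference count can also be taken from the SAM columns rather than by pattern matching. The following is a minimal sketch, not a tested pipeline: it reads FLAG (column 2) and RNAME (column 3) and skips any record whose unmapped bit (0x4) is set. The function name, the helper, and the demo records are made up for illustration. One assumption worth checking against your own file: the SAM specification recommends that an unaligned read whose mate aligned be given the mate's RNAME and POS, so a plain grep on the reference name may count those unaligned records too.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit: this read itself did not align


def count_mapped_per_reference(sam_lines):
    """Count aligned records per reference using FLAG and RNAME only."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname = int(fields[1]), fields[2]
        if flag & UNMAPPED or rname == "*":
            continue  # read is unaligned, even if RNAME was copied from its mate
        counts[rname] += 1
    return counts


def _record(qname, flag, rname):
    """Hypothetical helper: build a minimal 11-column SAM record for the demo."""
    pos = "0" if rname == "*" else "1"
    return "\t".join([qname, str(flag), rname, pos, "0", "*", "*", "0", "0", "ACGT", "*"])


# Made-up records: one concordant pair, one pair with a single aligned mate,
# and one pair where neither mate aligned.
DEMO_SAM = [
    "@SQ\tSN:geneA\tLN:1000",
    _record("r1", 99, "geneA"),   # paired, proper pair, first mate
    _record("r1", 147, "geneA"),  # paired, proper pair, second mate
    _record("r2", 73, "geneA"),   # paired, mate unmapped, first mate (aligned)
    _record("r2", 133, "geneA"),  # paired, read unmapped, RNAME copied from mate
    _record("r3", 77, "*"),       # neither mate aligned
    _record("r3", 141, "*"),
]

if __name__ == "__main__":
    print(dict(count_mapped_per_reference(DEMO_SAM)))
```

For a real file, the function can be given an open file handle, for example `with open("sample01.sam") as fh: counts = count_mapped_per_reference(fh)`.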

I have a couple of questions:

  1. In theory I should get (2*concordant pairs)+(2*discordant pairs) = (2*472)+(2*43) = 1030, right? But I got a different number, 1742, which equals (2*472)+(2*43)+(2*356).
  2. I could not find the 43 discordantly aligned pairs or the 356 singly aligned mates in my aligned-reads files. Why is that, and how can I find them?
  3. What I need to do is calculate RPKM and also retrieve those sequences.
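
Questions 2 and 3 can be approached from the SAM file itself. The sketch below is an illustration under stated assumptions, not an answer from the thread: it buckets records by FLAG bits (0x1 paired, 0x2 aligned as a proper pair, 0x4 read unmapped, 0x8 mate unmapped) and applies the standard RPKM formula, reads * 10^9 / (gene length in bp * total mapped reads). The function names, bucket labels, and example numbers are hypothetical; in particular, the post gives no gene length. The `paired_not_concordant` bucket would also catch mates that align independently without forming a discordant pair. The total of 1742 in question 1 is consistent with the count including both the 356 singly aligned mates and their unaligned partners, but that has not been verified against the file.

```python
def classify_flag(flag):
    """Bucket a SAM record by its FLAG bits (labels are illustrative)."""
    if flag & 0x4:
        return "unaligned"              # this read did not align
    if not flag & 0x1:
        return "unpaired"               # read was not part of a pair
    if flag & 0x2:
        return "concordant"             # aligned as a proper pair
    if flag & 0x8:
        return "mate_unaligned"         # this mate aligned, its partner did not
    return "paired_not_concordant"      # both mates aligned, but not as a proper pair


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    for flag in (99, 147, 73, 133, 97):
        print(flag, classify_flag(flag))
    # Made-up numbers: 10 reads on a 1000 bp gene, 1,000,000 mapped reads in total.
    print(rpkm(10, 1000, 1_000_000))
```

Which total to use as the denominator is a judgment call when the reference is a single gene: the usual definition uses all mapped reads in the library, which a single-gene alignment does not provide.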
modified 5 months ago by _r_am (32k) • written 5.7 years ago by bassanio (20)

I think the more important question is, why don't 99.99% of your reads align at all? Before this is fixed, I wouldn't bother about any further analyses.

written 5.7 years ago by Michael Dondrup (48k)

That is because I am currently looking at a single gene out of the whole data set.

written 5.7 years ago by bassanio (20)

That alone shouldn't cause this.

written 5.7 years ago by Devon Ryan (98k)

The input files are metagenome sequences in FASTA format.

written 5.7 years ago by bassanio (20)

Hello bassanio!

We believe that this post does not fit the main topic of this site.

No follow up from OP in over 5 years.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 5 months ago by _r_am (32k)
The thread is closed. No new answers may be added.