Question: Paired bowtie2: 91% overall alignment but 0.01% concordant - should I be worried?
0
gravatar for Bilal Akil
3.9 years ago by
Bilal Akil30
Sydney, Australia
Bilal Akil30 wrote:

I just ran my first successful (I think) DNA alignment operation with `bowtie2` on some paired end data (namely SRR390728) and these were the results I got:

7178576 reads; of these:
  7178576 (100.00%) were paired; of these:
    7177909 (99.99%) aligned concordantly 0 times
    225 (0.00%) aligned concordantly exactly 1 time
    442 (0.01%) aligned concordantly >1 times
    ----
    7177909 pairs aligned concordantly 0 times; of these:
      1624832 (22.64%) aligned discordantly 1 time
    ----
    5553077 pairs aligned 0 times concordantly or discordantly; of these:
      11106154 mates make up the pairs; of these:
        1231919 (11.09%) aligned 0 times
        1884213 (16.97%) aligned exactly 1 time
        7990022 (71.94%) aligned >1 times
91.42% overall alignment rate

So my first impression was "WOOOOO!!! AWESOME", but on further inspection I saw:

7177909 (99.99%) aligned concordantly 0 times

And so my second impression was "hmmmm.......", and now here I am. From the Bowtie 2 manual I found that:

- "A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align "concordantly". If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren't in the expected relative orientation, or aren't within the expected distance range, or both), the pair is said to align "discordantly"."

Now I'm confused about how this information applies to my data. Should I interpret this as a problem with my dataset? Is my alignment reliable? What else should this lead me to consider?

alignment bowtie2 • 3.2k views
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Bilal Akil30
5

This is an RNAseq dataset. Presumably you meant to align with hisat or tophat. 

ADD REPLYlink written 3.9 years ago by Devon Ryan89k

Would that be the likely explanation behind this issue?

ADD REPLYlink written 3.9 years ago by Bilal Akil30

That'll at least explain a large chunk of the problem. Did you trim this data at all?
 

ADD REPLYlink written 3.9 years ago by Devon Ryan89k

I didn't trim it - don't really know what anything is at this point, but I let it run with the `--local` option overnight as I understand that is more lenient about the start and end of reads (which I assume is what was in subject when you asked about trimming) and now we've got 0.14% concordant alignments :P Not so much better.

I'm going to try and run it through Tophat now.

Is "trimming" something that you recommend should always be done? I'm still trying to figure out a kind of standard procedure so it'd help to know what other people recommend :) 

ADD REPLYlink written 3.9 years ago by Bilal Akil30
2
gravatar for Bilal Akil
3.9 years ago by
Bilal Akil30
Sydney, Australia
Bilal Akil30 wrote:

Devon Ryan nailed it in his comment to the main question:

- "This is an RNAseq dataset. Presumably you meant to align with hisat or tophat."

I followed his advice and tried aligning it with Tophat and these were the results:

Left reads:
          Input     :   7178576
           Mapped   :   6553268 (91.3% of input)
            of these:   2950549 (45.0%) have multiple alignments (966184 have >20)
Right reads:
          Input     :   7178576
           Mapped   :   6448797 (89.8% of input)
            of these:   2912329 (45.2%) have multiple alignments (966184 have >20)
90.6% overall read mapping rate.

Aligned pairs:   6114755
     of these:   2809064 (45.9%) have multiple alignments
                  253417 ( 4.1%) are discordant alignments
81.7% concordant pair alignment rate.

Although I'm not certain whether that's a good result or not, it sure must be better than 0.01% concordant aligns.

Thanks Devon Ryan.

ADD COMMENTlink written 3.9 years ago by Bilal Akil30

Odd though that almost 1 million left and 1 million right reads aligned >20 times..

ADD REPLYlink written 3.9 years ago by Bilal Akil30

That's much better. The multiple alignment rate does seem rather high, but perhaps that's innate to the experiment (I haven't looked up the details of that sample).

ADD REPLYlink written 3.9 years ago by Devon Ryan89k
0
gravatar for Ian
3.9 years ago by
Ian5.4k
University of Manchester, UK
Ian5.4k wrote:

That is worrying.  It might be worth checking whether the R1 and R2 read files are similarly sorted.  It reads like the mapper has been able to map the reads, but not make sense of the pairing.  Also, what was the insert size for the experiment.  By default bowtie2 only allows up to 500bp been pairs.

ADD COMMENTlink written 3.9 years ago by Ian5.4k

This is what the beginning of the R1 and R2 files look like:

==> SRR390728_1.fastq <==
@SRR390728.1 1 length=36
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGC
+SRR390728.1 1 length=36
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142
@SRR390728.2 2 length=36
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGT
+SRR390728.2 2 length=36
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4&&5&&
@SRR390728.3 3 length=36
CCAGCCTGGCCAACAGAGTGTTACCCCGTTTTTACT
+SRR390728.3 3 length=36
-;;;8;;;;;;;,*;;';-4,44;,:&,1,4'./&1
@SRR390728.4 4 length=36
ATAAAATCAGGGGTGTTGGAGATGGGATGCCTATTT
+SRR390728.4 4 length=36
1;;;;;;,;;4;3;38;8%&,,;)*;1;;,)/%4+,
@SRR390728.5 5 length=36
TTAAGAAATTTTTGCTCAAACCATGCCCTAAAGGGT
+SRR390728.5 5 length=36
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;9445552

==> SRR390728_2.fastq <==
@SRR390728.1 1 length=36
GATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=36
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96&&&&(
@SRR390728.2 2 length=36
TCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=36
;;;;;;;;;;;;;;;;;;;;;<9;<;;;;;464262
@SRR390728.3 3 length=36
TATTTATTATTATTATTTTGAGACAGAGCATTGGTC
+SRR390728.3 3 length=36
9;;;;;;669;;99;;;;;-;3;2;0;+;7442&2/
@SRR390728.4 4 length=36
CTGCACACCTTGGCCTCCCAAATTGCTGGGATTACA
+SRR390728.4 4 length=36
;1;;);;;;;;;4;(;1;;;;24;;;;41-444//0
@SRR390728.5 5 length=36
TCTGTAATAAATAGGGCTGGGAAAACTGGCAAGCCA
+SRR390728.5 5 length=36
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;446662

Is this... normal looking? :P I don't think any of the reads come close to 500bp in length.

ADD REPLYlink written 3.9 years ago by Bilal Akil30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 845 users visited in the last hour