Question

prep_reads.info vs. align_summary.txt

8.5 years ago

Xin ▴ 70

Hi

I used TopHat to map my reads against the corresponding reference genome.

When I look inside prep_reads.info, I see:

left_min_read_len=90
left_max_read_len=90
left_reads_in=24995053
left_reads_out=24994132
right_min_read_len=90
right_max_read_len=90
right_reads_in=24995053
right_reads_out=24994422

Then when I open align_summary.txt, I see:

Left reads:
               Input:  24995053
              Mapped:  22715900 (90.9% of input)
            of these:   2106892 ( 9.3%) have multiple alignments (89 have >20)
Right reads:
               Input:  24995053
              Mapped:  22310498 (89.3% of input)
            of these:   2088630 ( 9.4%) have multiple alignments (148 have >20)
90.1% overall read alignment rate.

Aligned pairs:  21074559
     of these:   1469415 ( 7.0%) have multiple alignments
          and:    107380 ( 0.5%) are discordant alignments
83.9% concordant pair alignment rate.

In align_summary.txt, I understand that the difference between the "Input" and "Mapped" numbers is because some reads did not map to the reference genome. OK.

But in prep_reads.info, I do not know why the _reads_out numbers differ from the _reads_in numbers. If that difference were due to unmapped reads, why is it not equal to the difference between the Input and Mapped numbers in align_summary.txt?

       prep_reads.info        align_summary.txt
left   24995053-24994132=921  24995053-22715900=2279153
right  24995053-24994422=631  24995053-22310498=2684555

RNA-Seq TopHat • 2.0k views

Updated 20 months ago by Ram 43k • written 8.5 years ago by Xin ▴ 70

Ram · Answer 1 · 2015-10-26

8.5 years ago

Devon Ryan 104k

The difference is due to filtering for things such as read length. Some reads are too short, so they're excluded. This occurs before any mapping takes place.
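To make the accounting concrete, here is a minimal Python sketch using only the numbers posted in the question. It assumes prep_reads.info is plain `key=value` lines as shown above, and that the mapped counts are copied by hand from align_summary.txt. The function names (`parse_prep_reads_info`, `read_accounting`) are made up for illustration and are not part of TopHat.

```python
def parse_prep_reads_info(text):
    """Parse key=value lines (the layout shown in the question) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip anything that is not key=value
        try:
            values[key.strip()] = int(value)
        except ValueError:
            continue  # assumption: only integer fields matter here
    return values


def read_accounting(info, mapped, side):
    """Split the reads that did not map into 'filtered early' and 'failed to align'."""
    reads_in = info[f"{side}_reads_in"]
    reads_out = info[f"{side}_reads_out"]
    return {
        "filtered_before_mapping": reads_in - reads_out,
        "passed_filter_but_unmapped": reads_out - mapped,
        "total_not_mapped": reads_in - mapped,
    }


# Values copied from the question.
PREP_READS_INFO = """\
left_min_read_len=90
left_max_read_len=90
left_reads_in=24995053
left_reads_out=24994132
right_min_read_len=90
right_max_read_len=90
right_reads_in=24995053
right_reads_out=24994422
"""
MAPPED = {"left": 22715900, "right": 22310498}  # "Mapped" lines of align_summary.txt

info = parse_prep_reads_info(PREP_READS_INFO)
for side in ("left", "right"):
    print(side, read_accounting(info, MAPPED[side], side))
```

The first number per side is the small pre-mapping loss (921 and 631 reads), which is a different quantity from the roughly 2.3 to 2.7 million reads that went into the aligner and did not map.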

I see, I did not know that. I thought short reads could only be eliminated with Trimmomatic (MINLEN). I did not know that mapping tools also eliminate some reads.

Like always, thank you Devon

Updated 20 months ago by Ram 43k • written 8.5 years ago by Xin ▴ 70

Well, "things such as read length": it filters on other criteria too. Since your input reads are all 90 bases, length cannot be the reason here, so one of those "other things" is what's causing reads to get dropped in your case.

8.5 years ago by Devon Ryan 104k