Question

How do unique mappers become concordant pairs aligned >1 times?

3.8 years ago

bertb ▴ 20

Hello,

I prepared .sam files from PE sequencing results as follows:

hisat2 -p 8 --rg-id=UWN_t3 --rg SM:UWN_t3 --rg LB:UWN_t3 --rg PL:ILLUMINA --rg PU:CE9PNANXX.8 -x $RNA_REF_INDEX --dta --rna-strandness RF -1 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_1.fastq.gz -2 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_2.fastq.gz -S ./UWN_t3.sam
#output
43288187 reads; of these:
  43288187 (100.00%) were paired; of these:
    9023789 (20.85%) aligned concordantly 0 times
    16704076 (38.59%) aligned concordantly exactly 1 time
    17560322 (40.57%) aligned concordantly >1 times
    ----
    9023789 pairs aligned concordantly 0 times; of these:
      3361606 (37.25%) aligned discordantly 1 time
    ----
    5662183 pairs aligned 0 times concordantly or discordantly; of these:
      11324366 mates make up the pairs; of these:
        9894957 (87.38%) aligned 0 times
        731747 (6.46%) aligned exactly 1 time
        697662 (6.16%) aligned >1 times
88.57% overall alignment rate

I then remembered that my organism (yeast) does not have any introns longer than 2500 bp, so I added the option --max-intronlen 2500 to the command and got the following output:

hisat2 -p 8 --rg-id=UWN_t4 --rg SM:UWN_t4 --rg LB:UWN_t4 --rg PL:ILLUMINA --rg PU:CE9PNANXX.8 --max-intronlen 2500 -x $RNA_REF_INDEX --dta --rna-strandness RF -1 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_1.fastq.gz -2 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_2.fastq.gz -S ./UWN_t4.sam
#output
43288187 reads; of these:
  43288187 (100.00%) were paired; of these:
    20057896 (46.34%) aligned concordantly 0 times
    5856233 (13.53%) aligned concordantly exactly 1 time
    17374058 (40.14%) aligned concordantly >1 times
    ----
    20057896 pairs aligned concordantly 0 times; of these:
      3360282 (16.75%) aligned discordantly 1 time
    ----
    16697614 pairs aligned 0 times concordantly or discordantly; of these:
      33395228 mates make up the pairs; of these:
        9894750 (29.63%) aligned 0 times
        725032 (2.17%) aligned exactly 1 time
        22775446 (68.20%) aligned >1 times
88.57% overall alignment rate

The main change I notice is that the 'aligned concordantly exactly 1 time' category dropped by ~25 percentage points (38.59% to 13.53%). Those pairs moved into the 'aligned concordantly 0 times' category and, within it, mostly into the mates that 'aligned >1 times' (6.16% to 68.20% of those mates).
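To make the shift concrete, the two summaries can be compared count by count. The following is a minimal sketch in Python; the counts are copied from the two outputs above, while the dictionary keys and the `deltas` helper are names chosen here for illustration only.

```python
# Counts copied from the two HISAT2 summaries above.
# The key names are illustrative labels, not HISAT2 terminology.
TOTAL_PAIRS = 43288187

default_run = {
    "concordant_0": 9023789,
    "concordant_1": 16704076,
    "concordant_gt1": 17560322,
    "mates_0": 9894957,
    "mates_1": 731747,
    "mates_gt1": 697662,
}
max_intron_run = {
    "concordant_0": 20057896,
    "concordant_1": 5856233,
    "concordant_gt1": 17374058,
    "mates_0": 9894750,
    "mates_1": 725032,
    "mates_gt1": 22775446,
}


def deltas(before, after):
    """Return after - before for every category."""
    return {key: after[key] - before[key] for key in before}


d = deltas(default_run, max_intron_run)

# Pairs that left the 'concordantly exactly 1 time' category.
lost_unique_pairs = -d["concordant_1"]
# Mates that appeared in the mate-level 'aligned >1 times' category.
gained_multi_mates = d["mates_gt1"]

for key, value in d.items():
    print(f"{key:>15}: {value:+d}")
print(f"unique concordant pairs lost: {lost_unique_pairs} "
      f"({100 * lost_unique_pairs / TOTAL_PAIRS:.2f}% of all pairs)")
print(f"multi-mapping mates gained:   {gained_multi_mates} "
      f"(~{gained_multi_mates // 2} pairs' worth of mates)")
```

The number of mates gained in the 'aligned >1 times' category is close to twice the number of pairs lost from 'concordantly exactly 1 time', which is what prompts the question below.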

My question: I understand how reducing the maximum intron length could push pairs into the 'aligned concordantly 0 times' category, but I don't understand why the majority of their mates are then reported as 'aligned >1 times', given that those pairs aligned 'exactly 1 time' before the change.
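One way to look into this would be to find the individual mates that were unique in the first SAM file and are multi-mapped in the second. Below is a rough sketch in Python. It assumes that the `NH:i` tag in each record gives the number of reported locations for that read or pair, and it only looks at primary alignments. The function names and the example records are made up for illustration; they are not taken from UWN_t3.sam or UWN_t4.sam.

```python
def primary_nh(sam_lines):
    """Map (read name, mate number) -> NH value of the primary alignment."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        mate = 1 if flag & 0x40 else 2    # 0x40 = first mate in the pair
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                hits[(fields[0], mate)] = int(tag[5:])
                break
    return hits


def unique_to_multi(before, after):
    """Mates with NH == 1 in `before` and NH > 1 in `after`."""
    return sorted(
        key for key, nh in before.items()
        if nh == 1 and after.get(key, 0) > 1
    )


# Synthetic records for illustration only (not real data).
run_a = [
    "@HD\tVN:1.0\n",
    "r1\t99\tchrI\t100\t60\t50M\t=\t300\t250\t*\t*\tNH:i:1\n",
    "r1\t147\tchrI\t300\t60\t50M\t=\t100\t-250\t*\t*\tNH:i:1\n",
]
run_b = [
    "@HD\tVN:1.0\n",
    "r1\t65\tchrI\t100\t1\t50M\tchrI\t300\t0\t*\t*\tNH:i:3\n",
    "r1\t321\tchrII\t900\t1\t50M\tchrI\t300\t0\t*\t*\tNH:i:3\n",
    "r1\t129\tchrI\t300\t60\t50M\tchrI\t100\t0\t*\t*\tNH:i:1\n",
]

changed = unique_to_multi(primary_nh(run_a), primary_nh(run_b))
print(changed)
```

With real data, the lists would be replaced by open file handles for the two SAM files, and the alignments of a few of the reported read names could then be inspected by hand.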

Thanks in advance to anybody who can help!

RNA-Seq alignment • 740 views
