Hello,
I prepared .sam files from PE sequencing results as follows:
hisat2 -p 8 --rg-id=UWN_t3 --rg SM:UWN_t3 --rg LB:UWN_t3 --rg PL:ILLUMINA --rg PU:CE9PNANXX.8 -x $RNA_REF_INDEX --dta --rna-strandness RF -1 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_1.fastq.gz -2 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_2.fastq.gz -S ./UWN_t3.sam
#output
43288187 reads; of these:
43288187 (100.00%) were paired; of these:
9023789 (20.85%) aligned concordantly 0 times
16704076 (38.59%) aligned concordantly exactly 1 time
17560322 (40.57%) aligned concordantly >1 times
----
9023789 pairs aligned concordantly 0 times; of these:
3361606 (37.25%) aligned discordantly 1 time
----
5662183 pairs aligned 0 times concordantly or discordantly; of these:
11324366 mates make up the pairs; of these:
9894957 (87.38%) aligned 0 times
731747 (6.46%) aligned exactly 1 time
697662 (6.16%) aligned >1 times
88.57% overall alignment rate
At that point, I remembered that my organism (yeast) does not have any introns larger than 2500 bp, so I added the option --max-intronlen 2500 to the command and got the following output:
hisat2 -p 8 --rg-id=UWN_t4 --rg SM:UWN_t4 --rg LB:UWN_t4 --rg PL:ILLUMINA --rg PU:CE9PNANXX.8 --max-intronlen 2500 -x $RNA_REF_INDEX --dta --rna-strandness RF -1 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_1.fastq.gz -2 $RNA_DATA_DIR/trimmed/UW_N_Mix.trimmed_2.fastq.gz -S ./UWN_t4.sam
43288187 reads; of these:
43288187 (100.00%) were paired; of these:
20057896 (46.34%) aligned concordantly 0 times
5856233 (13.53%) aligned concordantly exactly 1 time
17374058 (40.14%) aligned concordantly >1 times
----
20057896 pairs aligned concordantly 0 times; of these:
3360282 (16.75%) aligned discordantly 1 time
----
16697614 pairs aligned 0 times concordantly or discordantly; of these:
33395228 mates make up the pairs; of these:
9894750 (29.63%) aligned 0 times
725032 (2.17%) aligned exactly 1 time
22775446 (68.20%) aligned >1 times
88.57% overall alignment rate
The main change I notice is that the 'aligned concordantly exactly 1 time' category has dropped by ~25 percentage points (38.59% to 13.53%), and those pairs have moved to the 'aligned concordantly 0 times' category and, within that, to the 'aligned >1 times' category.
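To make the bookkeeping explicit, here is a small sketch in Python that subtracts the counts in the two summaries above. The numbers are copied from the HISAT2 output; the variable names are my own, and halving the mate count to get "pairs' worth" is only an approximation, since it assumes both mates of a pair moved together.

```python
# Counts copied from the two HISAT2 summaries (default run vs. --max-intronlen 2500).
default_run = {
    "concordant_0": 9023789,     # pairs aligned concordantly 0 times
    "concordant_1": 16704076,    # pairs aligned concordantly exactly 1 time
    "mates_multi": 697662,       # unpaired mates aligned >1 times
}
max_intron_run = {
    "concordant_0": 20057896,
    "concordant_1": 5856233,
    "mates_multi": 22775446,
}

# Pairs that left the 'concordantly exactly 1 time' category.
lost_unique_pairs = default_run["concordant_1"] - max_intron_run["concordant_1"]

# Pairs that entered the 'concordantly 0 times' category.
gained_unaligned_pairs = max_intron_run["concordant_0"] - default_run["concordant_0"]

# Individual mates that entered the 'aligned >1 times' category.
gained_multi_mates = max_intron_run["mates_multi"] - default_run["mates_multi"]

# Assumption: two mates per pair, so halve the mate count to compare with pairs.
gained_multi_pairs_equiv = gained_multi_mates // 2

print("pairs leaving 'concordantly exactly 1 time':", lost_unique_pairs)
print("pairs entering 'concordantly 0 times':      ", gained_unaligned_pairs)
print("mates entering 'aligned >1 times':          ", gained_multi_mates)
print("  as approximate pairs:                     ", gained_multi_pairs_equiv)
```

All three differences come out at roughly 11 million pairs, which is why it looks to me as though the same pairs went from concordant and unique to mates that align more than once.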
My question is this: I understand how reducing the maximum intron length would move reads into the 'aligned concordantly 0 times' category, but I don't understand how the majority of those mates now align '>1 times', since they aligned 'exactly 1 time' before the intron length limit was applied.
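One way I could check this read by read would be to compare the two SAM files directly. The following is a minimal sketch, not something I have run on the real data. It assumes that HISAT2 writes an NH:i tag with the number of reported alignments, that SAM flag bit 0x2 marks a concordant (proper) pair, and that read names are identical in both files. The function names and the two synthetic records are hypothetical and only there to illustrate the idea.

```python
def summarize_sam(lines):
    """Map read name -> (seen in a proper pair, largest NH value seen)."""
    summary = {}
    for line in lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                    # read is unmapped
            nh = 0
        else:
            nh = 1                        # assumption: a missing NH tag means one hit
            for tag in fields[11:]:
                if tag.startswith("NH:i:"):
                    nh = int(tag[5:])
                    break
        proper = bool(flag & 0x2)         # aligned as a proper (concordant) pair
        old_proper, old_nh = summary.get(name, (False, 0))
        summary[name] = (old_proper or proper, max(old_nh, nh))
    return summary


def changed_reads(before, after):
    """Reads that were concordant and unique before, but are not any more."""
    changed = {}
    for name, (proper, nh) in before.items():
        if proper and nh == 1 and name in after:
            after_proper, after_nh = after[name]
            if not (after_proper and after_nh == 1):
                changed[name] = (after_proper, after_nh)
    return changed


# Synthetic records (hypothetical, not taken from my data).
before_lines = [
    "readA\t99\tchrI\t100\t60\t50M\t=\t300\t250\t*\t*\tNH:i:1",
    "readA\t147\tchrI\t300\t60\t50M\t=\t100\t-250\t*\t*\tNH:i:1",
]
after_lines = [
    "readA\t65\tchrI\t100\t1\t50M\tchrII\t900\t0\t*\t*\tNH:i:3",
    "readA\t129\tchrII\t900\t1\t50M\tchrI\t100\t0\t*\t*\tNH:i:3",
]

changed = changed_reads(summarize_sam(before_lines), summarize_sam(after_lines))
print(changed)
```

On the real files the same functions could be fed open file handles for UWN_t3.sam and UWN_t4.sam, though with roughly 43 million pairs the dictionaries would be large, so a subsample of reads would probably be more practical.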
Thanks in advance to anybody who can help!