Evaluation of HISAT2 Alignment Result
4.6 years ago
modarzi ▴ 160

Hi, I ran HISAT2 on one of my samples (an RNA-seq FASTQ file), but I received a lot of warnings. Here is the output of the run:

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because length (1) <= # seed mismatches (0)

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because it was < 2 characters long

46943435 (100.00%) were unpaired; of these:

9396529 (20.02%) aligned 0 times

12967404 (27.62%) aligned exactly 1 time

24579502 (52.36%) aligned >1 times


79.98% overall alignment rate.

I don't know how to interpret this result, and I would like to know whether it is good or not. I used hg38_tran for the index. Would the result change if I used hg19 as the reference index? My second problem is that I don't know which regions of the genome the reads in this sample come from. I would appreciate it if you could share your comments.

Best Regards,

RNA-Seq alignment HISAT2 • 7.8k views
 79.98% overall alignment rate.


This is the overall mapping (alignment) rate of your RNA reads against the genome you used. It is not a bad result, but it can be improved. You can use the latest genome. The alignment rate depends on several things:

1. Quality of the RNA reads (contamination from other organisms, such as bacteria and viruses, and low-quality bases).
2. Strand type of the RNA-seq library: first-stranded, second-stranded, etc.
3. Contamination of the RNA reads with other RNA types, such as rRNA. If rRNA is present in your mRNA library, it can lower the alignment rate.

I guess you can use samtools to find the genomic coordinates of the RNA reads, or you can use one of the visualization tools to see where the reads mapped in the genome.
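As a rough illustration of the "where did my reads map" question, here is a minimal sketch that tallies primary alignments per reference sequence straight from SAM text. It assumes an uncompressed SAM file (HISAT2's default output); the function name and the toy records are made up for the example, and only the standard SAM columns and flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) are relied on.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
            continue
        counts[fields[2]] += 1
    return counts


def _toy_record(qname, flag, rname):
    """Build a toy 11-column SAM line (hypothetical data, illustration only)."""
    return "\t".join([qname, str(flag), rname, "1", "60", "4M",
                      "*", "0", "0", "ACGT", "IIII"])


example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    _toy_record("r1", 0, "chr1"),
    _toy_record("r2", 16, "chr1"),
    _toy_record("r3", 4, "*"),       # unmapped, ignored
    _toy_record("r4", 0, "chr2"),
    _toy_record("r4", 256, "chr1"),  # secondary hit of r4, ignored
]

for name, n in sorted(count_reads_per_reference(example_sam).items()):
    print(name, n)
```

On a real run you would pass an open file handle instead of the toy list, e.g. `with open("aligned.sam") as fh: count_reads_per_reference(fh)`, where `aligned.sam` is a placeholder name.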


Thanks. Based on your comment, I have some questions: 1. I am using the GSE58708 dataset. Before alignment, I used FastQC for quality control. How can I assess the quality of the reads? 2. My data is single-end. How can I find the strand type of the RNA sequencing? 3. How can I find out whether I have rRNA in my reads?


Based on the FastQC report you can, for instance, remove short reads. For this you can use Trimmomatic; please check its options to see what you can do. You can also remove adapter sequences if the data were sequenced on Illumina. For the sequencing strand, check the strand type in the database you downloaded the data from, or ask whoever performed the sequencing. For the rRNA check, you can run BLAST remotely, that is, search your RNA reads against the NCBI human RNA database using the -remote option. Please have a look at the BLAST manual to see how to search remotely and how to restrict the search to human RNA by specifying the organism in your BLAST command.
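To make the "remove short reads" step concrete, here is a small sketch in plain Python. It is a stand-in for illustration, not Trimmomatic itself: it assumes 4-line FASTQ records, and the function name and the default `min_length=20` are arbitrary choices for the example.

```python
def filter_short_reads(fastq_lines, min_length=20):
    """Yield 4-line FASTQ records whose sequence has at least min_length bases."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:               # header, sequence, '+', qualities
            if len(record[1]) >= min_length:
                yield tuple(record)
            record = []


# Toy input (made-up reads): the second read is a single base long,
# like the reads HISAT2 warned about in the question.
toy_fastq = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "A", "+", "I",
]

for header, seq, plus, qual in filter_short_reads(toy_fastq, min_length=2):
    print(header, len(seq))
```

Reads of length 1, which trigger the "skipping read" warnings, would be dropped by any `min_length` of 2 or more.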

Finally, you should check the number of reads in your FASTQ file and compare it with the HISAT2 result below:

    46943435 reads; of these:

46943435 (100.00%) were unpaired; of these:
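One way to do that comparison is to count the records in the FASTQ file and check the number against the total HISAT2 reports. The sketch below assumes an uncompressed FASTQ with exactly four lines per read (no wrapped sequences); the function name is made up, and the toy lines stand in for a real file.

```python
def count_fastq_reads(fastq_lines):
    """Return the number of reads, assuming exactly 4 lines per FASTQ record."""
    n_lines = sum(1 for _ in fastq_lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Total reported by HISAT2 in the summary quoted above.
HISAT2_TOTAL_READS = 46943435

# Toy stand-in for a real file (made-up reads).
toy_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "GGCA", "+", "IIII",
]

n_reads = count_fastq_reads(toy_fastq)
print("reads in FASTQ:", n_reads)
print("matches HISAT2 total:", n_reads == HISAT2_TOTAL_READS)
```

For the real data you would pass the open FASTQ file handle instead of `toy_fastq`; the two numbers should agree if HISAT2 saw every read in the file.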

4.6 years ago

I will provide an example from my own dataset using the latest version of HISAT2.

HISAT2 summary stats:
Total pairs: 11587225
Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
Aligned concordantly 1 time: 2195620 (18.95%)
Aligned concordantly >1 times: 4877336 (42.09%)
Aligned discordantly 1 time: 50186 (0.43%)
Aligned 0 time: 8019048 (89.82%)
Aligned 1 time: 304653 (3.41%)
Aligned >1 times: 604465 (6.77%)
Overall alignment rate: 65.40%


## Description

1. Total pairs: 11587225

Total reads = 11587225 * 2 = 23174450 (matches total number of reads in the sample)

2. Aligned concordantly or discordantly 0 time: 4464083 (38.53%)

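These are pairs that did not align as a pair (HISAT2 then tries to align their mates individually, reported in the "Aligned 0 time / 1 time / >1 times" lines) : 4464083 * 2 (paired end) = 8928166 reads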

 ( 8928166 /  23174450 (Total reads) ) * 100 ~ 38.53%


3. Aligned concordantly 1 time: 2195620 (18.95%)

These are uniquely mapped reads : 2195620 * 2 (paired end) = 4391240

( 4391240 /  23174450 (Total reads) ) * 100 ~ 18.95%


4. Aligned concordantly >1 times: 4877336 (42.09%)

These are multi mapped reads : 4877336 * 2 = 9754672

( 9754672 /  23174450 (Total reads) ) * 100 ~ 42.09%


5. Aligned discordantly 1 time: 50186 (0.43%)

Discordant aligned : 50186 * 2 = 100372

( 100372 /  23174450 (Total reads) ) * 100 ~ 0.43%


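6. Unpaired alignments of the 8928166 mates from pairs that did not align concordantly or discordantly

• Aligned 0 time: 8019048 (89.82%)

(8019048 / 8928166 ) * 100 = 89.82% i.e. 89.82% of the unpaired reads did not align at all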

• Aligned 1 time: 304653 (3.41%)

(304653 / 8928166 ) * 100 = 3.41% i.e. 3.41% of the unpaired reads aligned once

• Aligned >1 times: 604465 (6.77%)

(604465 / 8928166 ) * 100 = 6.77% i.e. 6.77% of the unpaired reads are multi mapped

7. Overall alignment rate: 65.40%

The calculation is as follows:

Aligned concordantly 1 time: (2195620 * 2 = 4391240)
Aligned concordantly >1 times: (4877336 *2  = 9754672)
Aligned discordantly 1 time: (50186 * 2 = 100372)


Aligned 1 time: 304653
Aligned >1 times: 604465


Total = 4391240 + 9754672 +  100372 +  304653 +  604465 = 15155402

Overall Alignment Rate = (15155402 / 23174450) * 100 = 65.40%
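The arithmetic above can be checked with a few lines of Python. This is just a sketch that re-derives the rate from the counts quoted in the summary; the function and argument names are mine, not part of HISAT2.

```python
def overall_alignment_rate(total_pairs, conc_once, conc_multi, disc_once,
                           mate_once, mate_multi):
    """Overall alignment rate (%) from paired-end HISAT2 summary counts.

    Pairs count as two reads each; mates aligned individually count as one.
    """
    total_reads = 2 * total_pairs
    aligned_reads = (2 * (conc_once + conc_multi + disc_once)
                     + mate_once + mate_multi)
    return 100.0 * aligned_reads / total_reads


# Counts copied from the summary stats above.
rate = overall_alignment_rate(
    total_pairs=11587225,
    conc_once=2195620,
    conc_multi=4877336,
    disc_once=50186,
    mate_once=304653,
    mate_multi=604465,
)
print(f"Overall alignment rate: {rate:.2f}%")

# Sanity check: the four pair categories add up to the total number of pairs.
assert 2195620 + 4877336 + 50186 + 4464083 == 11587225
```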


Why do you have such a high % of unpaired reads?

4.6 years ago

Half your reads aligning more than once? I'm not sure that's normal. I do RNAseq on human and mouse samples, and I expect about 70-80% reads aligning uniquely.


Yes, for this sample I got this result, but for other samples this rate is less than 50%. For example, for 3 other samples I got the results below:

Result for  sra_data_SRR1427483.fastq

46870160 (100.00%) were unpaired; of these:
10223260 (21.81%) aligned 0 times
22323761 (47.63%) aligned exactly 1 time
14323139 (30.56%) aligned >1 times
78.19% overall alignment rate

############################################

Result for  sra_data_SRR1427484.fastq

48061213 (100.00%) were unpaired; of these:
9151179 (19.04%) aligned 0 times
28899623 (60.13%) aligned exactly 1 time
10010411 (20.83%) aligned >1 times
80.96% overall alignment rate

############################################

Result for  sra_data_SRR1427485.fastq
47620786 (100.00%) were unpaired; of these:
9265819 (19.46%) aligned 0 times
22099556 (46.41%) aligned exactly 1 time
16255411 (34.14%) aligned >1 times
80.54% overall alignment rate


I would appreciate it if you could share your comments. Best regards
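To compare the four runs side by side, the single-end summaries quoted in this thread can be put in one small table. This is only a sketch that recomputes the percentages from the reported counts; the dictionary layout and function name are my own.

```python
# (total reads, aligned 0 times, aligned exactly 1 time, aligned >1 times),
# copied from the HISAT2 summaries posted in this thread.
samples = {
    "SRR1427482": (46943435, 9396529, 12967404, 24579502),
    "SRR1427483": (46870160, 10223260, 22323761, 14323139),
    "SRR1427484": (48061213, 9151179, 28899623, 10010411),
    "SRR1427485": (47620786, 9265819, 22099556, 16255411),
}


def summarize(total, unaligned, unique, multi):
    """Return (unique %, multi-mapped %, overall %) for a single-end run."""
    if unaligned + unique + multi != total:
        raise ValueError("categories do not add up to the total")
    return (100.0 * unique / total,
            100.0 * multi / total,
            100.0 * (unique + multi) / total)


print(f"{'sample':<12}{'unique %':>10}{'multi %':>10}{'overall %':>11}")
for name, counts in samples.items():
    unique_pct, multi_pct, overall_pct = summarize(*counts)
    print(f"{name:<12}{unique_pct:>10.2f}{multi_pct:>10.2f}{overall_pct:>11.2f}")
```

The overall rate is simply the unique and multi-mapped percentages added together, so two samples with a similar overall rate can still differ a lot in how many reads map uniquely.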