Question

Evaluation of HISAT2 Alignment Result

6.0 years ago

modarzi ▴ 170

Hi, I ran HISAT2 on one of my samples (an RNA-seq FASTQ file), but I received a lot of warnings. Here is the output of the run:

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because length (1) <= # seed mismatches (0)

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because it was < 2 characters long

    46943435 reads; of these:
      46943435 (100.00%) were unpaired; of these:
        9396529 (20.02%) aligned 0 times
        12967404 (27.62%) aligned exactly 1 time
        24579502 (52.36%) aligned >1 times
    79.98% overall alignment rate.

I don't know how to interpret this result, and I would like to know whether it is good or not. I used the hg38_tran index. Would the result change if I used hg19 as the reference index instead? My second problem is that I don't know which regions of the genome the reads in this sample belong to. I would appreciate any comments.

Best Regards,

Mohammad

RNA-Seq alignment HISAT2 • 10k views

ADD COMMENT • link updated 6.0 years ago by lakhujanivijay 5.8k • written 6.0 years ago by modarzi ▴ 170

 79.98% overall alignment rate.

This is the overall mapping (alignment) rate of your RNA reads against the genome you used. It is not a bad result, but it can be improved. You can use the latest genome. The alignment rate depends on several things:

Quality of the RNA reads (contamination from other organisms, such as bacteria and viruses, and low-quality bases).
Strand type of the RNA sequencing library: first-stranded, second-stranded, etc.
Contamination of the RNA reads with other RNA types, such as rRNA. If rRNA is present in your mRNA library, it can lower the alignment rate.

I guess you can use samtools to find the genomic coordinates of the aligned reads, or you can use one of the visualization tools to see where the reads mapped in the genome.
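As a rough illustration of the suggestion above, the sketch below tallies how many reads landed on each reference sequence by reading SAM-formatted text. It assumes the HISAT2 output was written as uncompressed SAM; the function name and the demo records are hypothetical. On a sorted, indexed BAM file, `samtools idxstats` should give comparable per-reference counts without any custom code.

```python
from collections import Counter
from typing import Iterable


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference name in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary record
            continue
        counts[fields[2]] += 1  # column 3 is the reference name (RNAME)
    return counts


def _record(name: str, flag: int, rname: str, pos: int) -> str:
    """Build a minimal made-up SAM record for the demo."""
    return "\t".join(
        [name, str(flag), rname, str(pos), "60", "4M", "*", "0", "0", "ACGT", "IIII"]
    )


# Made-up records; with a real file, pass the open file handle instead.
demo = [
    "@SQ\tSN:chr1\tLN:1000",
    _record("r1", 0, "chr1", 10),
    _record("r2", 16, "chr1", 50),
    _record("r2", 256, "chr2", 70),  # secondary alignment, not counted
    _record("r3", 4, "*", 0),        # unmapped, not counted
    _record("r4", 0, "chrM", 5),
]
per_reference = count_reads_per_reference(demo)
print(per_reference.most_common())
```

Counting only primary records means each read contributes once, even when HISAT2 reports several locations for a multi-mapped read.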

ADD REPLY • link 6.0 years ago by Mehmet ▴ 820

Thanks. Based on your comment, I have some questions: 1- I use the GSE58708 dataset, and before alignment I ran FastQC for quality control. How can I judge the quality of the reads from it? 2- My data is single-end. How can I find out the strand type of the RNA sequencing? 3- How can I find out whether I have rRNA in my reads?

ADD REPLY • link 6.0 years ago by modarzi ▴ 170

Based on the FastQC report you can, for instance, remove short reads. You can use Trimmomatic for this; please check its options to see what it can do. You can also remove adapter sequences if the data came from an Illumina sequencer. For the strand type, check the database record if you downloaded the data from a database, or ask whoever performed the sequencing. For the rRNA check, you can run BLAST remotely, i.e. search your RNA reads against the NCBI human RNA database using the -remote option. Please have a look at the BLAST manual to see how to search remotely and how to restrict the search to human RNA by specifying the organism in your blast command.

Finally, you should check the number of reads in your FASTQ file and compare it with the HISAT2 result below:

    46943435 reads; of these:
      46943435 (100.00%) were unpaired; of these:
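A minimal sketch of that read-count check, assuming a standard four-line-per-record FASTQ file (plain or gzip-compressed). The function name is hypothetical, and any path you pass to it is your own file, not something defined in this thread.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Return the number of records in a FASTQ file (4 lines per record)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Count every line, including empty sequence lines from zero-length reads.
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

The returned value should equal the first number in the HISAT2 summary (46943435 for the sample above). A mismatch would suggest a truncated file or a FASTQ that does not follow the four-line layout.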

ADD REPLY • link 6.0 years ago by Mehmet ▴ 820

score 18 · Answer 1 · 2018-05-05

I will provide an example from my own dataset, using the latest version of HISAT2.

HISAT2 summary stats:
        Total pairs: 11587225
                Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
                Aligned concordantly 1 time: 2195620 (18.95%)
                Aligned concordantly >1 times: 4877336 (42.09%)
                Aligned discordantly 1 time: 50186 (0.43%)
        Total unpaired reads: 8928166
                Aligned 0 time: 8019048 (89.82%)
                Aligned 1 time: 304653 (3.41%)
                Aligned >1 times: 604465 (6.77%)
        Overall alignment rate: 65.40%

Description

1. Total pairs: 11587225

Total reads = 11587225 * 2 = 23174450 (matches total number of reads in the sample)

2. Aligned concordantly or discordantly 0 time: 4464083 (38.53%)

These are pairs that did not align as a pair. Their mates, 4464083 * 2 (paired end) = 8928166 reads, are then aligned individually (see point 6).

 ( 8928166 /  23174450 (Total reads) ) * 100 ~ 38.53%

3. Aligned concordantly 1 time: 2195620 (18.95%)

These are uniquely mapped reads : 2195620 * 2 (paired end) = 4391240

( 4391240 /  23174450 (Total reads) ) * 100 ~ 18.95%

4. Aligned concordantly >1 times: 4877336 (42.09%)

These are multi mapped reads : 4877336 * 2 = 9754672

( 9754672 /  23174450 (Total reads) ) * 100 ~ 42.09%

5. Aligned discordantly 1 time: 50186 (0.43%)

Discordant aligned : 50186 * 2 = 100372

( 100372 /  23174450 (Total reads) ) * 100 ~ 0.43%

6. Total unpaired reads: 8928166

These are the mates of the pairs from point 2 that did not align as a pair (4464083 * 2 = 8928166). HISAT2 aligns each of them individually.

Aligned 0 time: 8019048 (89.82%)

(8019048 / 8928166 ) * 100 = 89.82% i.e. 89.82% of the unpaired reads did not align at all

Aligned 1 time: 304653 (3.41%)

(304653 / 8928166 ) * 100 = 3.41% i.e. 3.41% of the unpaired reads aligned once

Aligned >1 times: 604465 (6.77%)

(604465 / 8928166 ) * 100 = 6.77% i.e. 6.77% of the unpaired reads are multi mapped

7. Overall alignment rate: 65.40%

Calculation as explained below

PAIRED READS

Aligned concordantly 1 time: (2195620 * 2 = 4391240)
Aligned concordantly >1 times: (4877336 *2  = 9754672)
Aligned discordantly 1 time: (50186 * 2 = 100372)

UNPAIRED READS

Aligned 1 time: 304653 
Aligned >1 times: 604465

Total = 4391240 + 9754672 +  100372 +  304653 +  604465 = 15155402

Overall Alignment Rate = (15155402 / 23174450) * 100 = 65.40%
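The same arithmetic can be written as a short function, which makes it easy to redo the check for another sample. This is only a sketch of the calculation described above, not HISAT2's own code; the function and parameter names are made up for illustration.

```python
def overall_alignment_rate(
    total_pairs: int,
    concordant_once: int,
    concordant_multi: int,
    discordant_once: int,
    unpaired_once: int,
    unpaired_multi: int,
) -> float:
    """Overall alignment rate in percent, computed per read (two reads per pair)."""
    total_reads = 2 * total_pairs
    # Each aligned pair contributes two reads; unpaired mates contribute one each.
    aligned_reads = (
        2 * (concordant_once + concordant_multi + discordant_once)
        + unpaired_once
        + unpaired_multi
    )
    return 100.0 * aligned_reads / total_reads


# Counts taken from the summary stats in this answer.
rate = overall_alignment_rate(
    total_pairs=11587225,
    concordant_once=2195620,
    concordant_multi=4877336,
    discordant_once=50186,
    unpaired_once=304653,
    unpaired_multi=604465,
)
print(f"Overall alignment rate: {rate:.2f}%")  # prints 65.40%
```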

GenoMax · Answer 2 · 2018-05-04

6.0 years ago

swbarnes2 14k

Half your reads aligning more than once? I'm not sure that's normal. I do RNAseq on human and mouse samples, and I expect about 70-80% reads aligning uniquely.

ADD COMMENT • link 6.0 years ago by swbarnes2 14k

Yes, that is the result I got for this sample, but for my other samples this rate is below 50%. For example, for 3 other samples I got the results below:

Result for  sra_data_SRR1427483.fastq

46870160 reads; of these:
  46870160 (100.00%) were unpaired; of these:
    10223260 (21.81%) aligned 0 times
    22323761 (47.63%) aligned exactly 1 time
    14323139 (30.56%) aligned >1 times
78.19% overall alignment rate

############################################


Result for  sra_data_SRR1427484.fastq

  48061213 (100.00%) were unpaired; of these:
    9151179 (19.04%) aligned 0 times
    28899623 (60.13%) aligned exactly 1 time
    10010411 (20.83%) aligned >1 times
80.96% overall alignment rate

############################################

Result for  sra_data_SRR1427485.fastq
47620786 reads; of these:
  47620786 (100.00%) were unpaired; of these:
    9265819 (19.46%) aligned 0 times
    22099556 (46.41%) aligned exactly 1 time
    16255411 (34.14%) aligned >1 times
80.54% overall alignment rate
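To compare the uniquely mapped and multi-mapped fractions across samples like these, the single-end summary text can be parsed with a few regular expressions. This is a sketch that assumes the default (non `--new-summary`) single-end layout shown above; the function name and dictionary keys are hypothetical.

```python
import re

# One pattern per line of the single-end summary printed above.
SUMMARY_PATTERNS = {
    "total": re.compile(r"^\s*(\d+) reads; of these:"),
    "unaligned": re.compile(r"^\s*(\d+) \([\d.]+%\) aligned 0 times"),
    "unique": re.compile(r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time"),
    "multi": re.compile(r"^\s*(\d+) \([\d.]+%\) aligned >1 times"),
}


def parse_single_end_summary(text: str) -> dict:
    """Extract read counts from a single-end HISAT2 summary and add percentages."""
    counts = {}
    for line in text.splitlines():
        for key, pattern in SUMMARY_PATTERNS.items():
            match = pattern.match(line)
            if match:
                counts[key] = int(match.group(1))
    # If the "reads; of these:" line is missing from a paste, rebuild the total.
    if "total" not in counts:
        counts["total"] = counts["unaligned"] + counts["unique"] + counts["multi"]
    counts["unique_pct"] = 100.0 * counts["unique"] / counts["total"]
    counts["multi_pct"] = 100.0 * counts["multi"] / counts["total"]
    return counts


# Summary for sra_data_SRR1427483.fastq, copied from the post above.
example = """46870160 reads; of these:
  46870160 (100.00%) were unpaired; of these:
    10223260 (21.81%) aligned 0 times
    22323761 (47.63%) aligned exactly 1 time
    14323139 (30.56%) aligned >1 times
78.19% overall alignment rate
"""
stats = parse_single_end_summary(example)
print(f"unique: {stats['unique_pct']:.2f}%  multi: {stats['multi_pct']:.2f}%")
```

Putting the per-sample numbers side by side makes it easier to see which libraries have an unusually high multi-mapping fraction.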

I would appreciate it if you could share your comments with me. Best Regards

ADD REPLY • link updated 6.0 years ago by GenoMax 141k • written 6.0 years ago by modarzi ▴ 170