Question: Evaluation of HISAT2 Alignment Result
2.2 years ago by
modarzi120
modarzi120 wrote:

Hi, I ran HISAT2 on one of my samples (RNA-seq FASTQ), but I received a lot of warnings. Here is the output of the run:

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because length (1) <= # seed mismatches (0)

Warning: skipping read 'SRR1427482.47940377.1 FCD0EFCABXX:5:1208:6170:183143 length=49' because it was < 2 characters long

46943435 reads; of these:
  46943435 (100.00%) were unpaired; of these:
    9396529 (20.02%) aligned 0 times
    12967404 (27.62%) aligned exactly 1 time
    24579502 (52.36%) aligned >1 times
79.98% overall alignment rate

I don't know how to interpret this result, and I would like to know whether it is good or not. I used hg38_tran for the index. Would the result change if I used hg19 as the reference index? My second problem is that I don't know which region of the genome this sample belongs to. I would appreciate it if you could share your comments with me.

Best Regards,

Mohammad

hisat2 rna-seq alignment • 3.8k views
modified 2.2 years ago by lakhujanivijay5.0k • written 2.2 years ago by modarzi120
 79.98% overall alignment rate.

This is the overall mapping (alignment) rate of your RNA reads against the genome you used. It is not a bad result, but it could be improved. You can use the latest genome build. The alignment rate depends on several things:

  1. Quality of the RNA reads (contamination from other organisms such as bacteria and viruses, low-quality bases).
  2. Strand type of the RNA-seq library: first-stranded, second-stranded, etc.
  3. Contamination of the RNA reads with other RNA types, such as rRNA. If rRNA is present in your mRNA library, it can lower the alignment rate.

I guess you can use samtools to find the genomic coordinates of the RNA reads, or you can use one of the visualization tools to see where these reads map in the genome.
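As a rough illustration of the "where do the reads map" question, the sketch below tallies primary alignments per reference sequence from SAM-formatted text. It is only a minimal sketch: the function name `count_reads_per_reference` and the example records are made up for illustration, and it assumes uncompressed SAM lines with the standard tab-separated columns (FLAG in column 2, RNAME in column 3). On an indexed BAM file, `samtools idxstats` reports comparable per-reference counts directly.

```python
from collections import Counter
from typing import Iterable


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary aligned records per reference name (RNAME) in SAM text."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # 0x4: read is unmapped
        if flag & 0x100:
            continue  # 0x100: secondary alignment, so each read is counted once
        counts[fields[2]] += 1
    return counts


# Hypothetical records, for illustration only (not taken from the thread).
example_sam = [
    "@SQ\tSN:chr1\tLN:248956422",
    "r1\t0\tchr1\t100\t60\t49M\t*\t0\t0\t*\t*",
    "r2\t16\tchr2\t500\t1\t49M\t*\t0\t0\t*\t*",
    "r2\t272\tchr5\t900\t1\t49M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
per_reference = count_reads_per_reference(example_sam)
print(per_reference.most_common())
```

With a real alignment, the same function could be fed the lines of the SAM file written by HISAT2; a visualization tool is still the easier route for inspecting individual loci.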

modified 2.2 years ago • written 2.2 years ago by Mehmet540

Thanks. Based on your comment I have some questions: 1- I am using the GSE58708 dataset. Before alignment I ran FastQC for quality control. How can I assess the quality of the reads? 2- My data are single-end. How can I find the strand type of the RNA sequencing? 3- How can I find out whether I have rRNA in my reads?

written 2.2 years ago by modarzi120

Based on the FastQC report you can, for instance, remove short reads. You can use Trimmomatic for this; please check its options to see what you can do. You can also remove adapter sequences if the data were sequenced on Illumina. For the strand type, check the database record if you downloaded the data from a database, or ask whoever performed the sequencing. To check for rRNA, you can run BLAST remotely, i.e. search your RNA reads against the NCBI human RNA database using the -remote option. Please have a look at the BLAST manual to see how to search remotely and how to restrict the search to human RNA by specifying the organism in your blast command.
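To make the "remove short reads" step concrete, here is a minimal sketch of a length filter over FASTQ records. It is not Trimmomatic and does not reproduce its behavior (Trimmomatic has its own `MINLEN` step); the helper names `read_fastq` and `filter_short_reads`, the toy reads, and the threshold of 20 bases are all assumptions chosen for illustration. It assumes plain 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; an incomplete trailing record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups consecutive lines in fours
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, sep, qual


def filter_short_reads(records: Iterable[FastqRecord],
                       min_len: int = 20) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence has at least min_len bases."""
    for record in records:
        if len(record[1]) >= min_len:
            yield record


# Toy input: one 24-base read and one 1-base read like those that trigger
# HISAT2's "skipping read" warnings.
example_fastq = [
    "@read1", "ACGTACGTACGTACGTACGTACGT", "+", "I" * 24,
    "@read2", "N", "+", "#",
]
kept = list(filter_short_reads(read_fastq(example_fastq), min_len=20))
print([header for header, _, _, _ in kept])
```

A sensible threshold depends on the read length and the aligner settings, so the default here should not be read as a recommendation.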

Finally, you should check the number of reads in your FASTQ file and compare it with the totals in the HISAT2 output below:

    46943435 reads; of these:
      46943435 (100.00%) were unpaired; of these:
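One way to do this comparison is sketched below: count the records in the FASTQ file and read the total from the first line of the HISAT2 summary. The function names `count_fastq_reads` and `total_reads_in_summary` and the two-read toy input are hypothetical, and the sketch assumes an uncompressed FASTQ with exactly four lines per read and the default summary layout shown above.

```python
import io
from typing import TextIO


def count_fastq_reads(handle: TextIO) -> int:
    """Count records in an uncompressed FASTQ stream (4 lines per read)."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def total_reads_in_summary(summary: str) -> int:
    """Return the integer that starts the 'N reads; of these:' line."""
    for line in summary.splitlines():
        if "reads; of these:" in line:
            return int(line.split()[0])
    raise ValueError("no 'reads; of these:' line found")


# Toy data for illustration; a real run would open the FASTQ file instead.
fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n"
summary_text = "2 reads; of these:\n  2 (100.00%) were unpaired; of these:\n"

n_fastq = count_fastq_reads(io.StringIO(fastq_text))
n_hisat2 = total_reads_in_summary(summary_text)
print(n_fastq, n_hisat2, n_fastq == n_hisat2)
```

If the two numbers disagree, the FASTQ file may be truncated or malformed, which is worth ruling out before interpreting the alignment rate.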
modified 2.2 years ago • written 2.2 years ago by Mehmet540
2.2 years ago by
lakhujanivijay5.0k
India
lakhujanivijay5.0k wrote:

I will provide an example from my own dataset, using the latest version of HISAT2.

HISAT2 summary stats:
        Total pairs: 11587225
                Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
                Aligned concordantly 1 time: 2195620 (18.95%)
                Aligned concordantly >1 times: 4877336 (42.09%)
                Aligned discordantly 1 time: 50186 (0.43%)
        Total unpaired reads: 8928166
                Aligned 0 time: 8019048 (89.82%)
                Aligned 1 time: 304653 (3.41%)
                Aligned >1 times: 604465 (6.77%)
        Overall alignment rate: 65.40%

Description

1. Total pairs: 11587225

Total reads = 11587225 * 2 = 23174450 (matches total number of reads in the sample)

2. Aligned concordantly or discordantly 0 time: 4464083 (38.53%)

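These pairs did not align as a pair, either concordantly or discordantly : 4464083 * 2 (paired end) = 8928166 reads. HISAT2 then aligns these mates individually and reports them under "Total unpaired reads" (item 6), so not all of them are unmapped.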

 ( 8928166 /  23174450 (Total reads) ) * 100 ~ 38.53%

3. Aligned concordantly 1 time: 2195620 (18.95%)

These are uniquely mapped reads : 2195620 * 2 (paired end) = 4391240

( 4391240 /  23174450 (Total reads) ) * 100 ~ 18.95%

4. Aligned concordantly >1 times: 4877336 (42.09%)

These are multi mapped reads : 4877336 * 2 = 9754672

( 9754672 /  23174450 (Total reads) ) * 100 ~ 42.09%

5. Aligned discordantly 1 time: 50186 (0.43%)

Discordant aligned : 50186 * 2 = 100372

( 100372 /  23174450 (Total reads) ) * 100 ~ 0.43%

6. Total unpaired reads: 8928166

These are the mates of the pairs from item 2 (4464083 * 2 = 8928166), aligned individually as single reads

  • Aligned 0 time: 8019048 (89.82%)

    (8019048 / 8928166 ) * 100 = 89.82% i.e. 89.82% of the unpaired reads did not align at all

  • Aligned 1 time: 304653 (3.41%)

    (304653 / 8928166 ) * 100 = 3.41% i.e. 3.41% of the unpaired reads aligned once

  • Aligned >1 times: 604465 (6.77%)

    (604465 / 8928166 ) * 100 = 6.77% i.e. 6.77% of the unpaired reads are multi mapped

7. Overall alignment rate: 65.40%

Calculation as explained below

PAIRED READS

Aligned concordantly 1 time: (2195620 * 2 = 4391240)
Aligned concordantly >1 times: (4877336 *2  = 9754672)
Aligned discordantly 1 time: (50186 * 2 = 100372)

UNPAIRED READS

Aligned 1 time: 304653 
Aligned >1 times: 604465

Total = 4391240 + 9754672 +  100372 +  304653 +  604465 = 15155402

Overall Alignment Rate = (15155402 / 23174450) * 100 = 65.40%
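The arithmetic above can be written as a small function. This is just a sketch of the calculation described in this answer; the function and parameter names are made up, and it assumes the counts are taken from a summary laid out like the one shown at the top of the answer.

```python
def overall_alignment_rate(total_pairs: int,
                           concordant_once: int,
                           concordant_multi: int,
                           discordant_once: int,
                           unpaired_once: int,
                           unpaired_multi: int) -> float:
    """Overall alignment rate in percent for a paired-end HISAT2 summary."""
    total_reads = 2 * total_pairs
    # Each aligned pair contributes two reads; unpaired mates count once each.
    aligned_reads = (2 * (concordant_once + concordant_multi + discordant_once)
                     + unpaired_once + unpaired_multi)
    return 100.0 * aligned_reads / total_reads


rate = overall_alignment_rate(
    total_pairs=11587225,
    concordant_once=2195620,
    concordant_multi=4877336,
    discordant_once=50186,
    unpaired_once=304653,
    unpaired_multi=604465,
)
print(f"{rate:.2f}%")  # prints 65.40%
```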
modified 2.2 years ago • written 2.2 years ago by lakhujanivijay5.0k

Why do you have such a high % of unpaired reads?

written 2.2 years ago by genomax85k
2.2 years ago by
swbarnes27.8k
United States
swbarnes27.8k wrote:

Half your reads aligning more than once? I'm not sure that's normal. I do RNAseq on human and mouse samples, and I expect about 70-80% reads aligning uniquely.

written 2.2 years ago by swbarnes27.8k

Yes, for this sample I got this result, but for the other samples this rate is less than 50%. For example, for 3 other samples I got the results below:

Result for  sra_data_SRR1427483.fastq

46870160 reads; of these:
  46870160 (100.00%) were unpaired; of these:
    10223260 (21.81%) aligned 0 times
    22323761 (47.63%) aligned exactly 1 time
    14323139 (30.56%) aligned >1 times
78.19% overall alignment rate

############################################


Result for  sra_data_SRR1427484.fastq

48061213 reads; of these:
  48061213 (100.00%) were unpaired; of these:
    9151179 (19.04%) aligned 0 times
    28899623 (60.13%) aligned exactly 1 time
    10010411 (20.83%) aligned >1 times
80.96% overall alignment rate

############################################

Result for  sra_data_SRR1427485.fastq
47620786 reads; of these:
  47620786 (100.00%) were unpaired; of these:
    9265819 (19.46%) aligned 0 times
    22099556 (46.41%) aligned exactly 1 time
    16255411 (34.14%) aligned >1 times
80.54% overall alignment rate

I would appreciate it if you could share your comments with me. Best regards
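For comparing several samples, a small parser can pull the counts out of each single-end summary and recompute the unique and multi-mapping percentages. The sketch below is one possible approach, not part of HISAT2: the name `parse_single_end_summary` is hypothetical, the regular expressions assume the default summary wording shown above, and the total is recomputed as the sum of the three categories so that a missing first line does not matter.

```python
import re
from typing import Dict


def parse_single_end_summary(text: str) -> Dict[str, int]:
    """Extract read counts from a default-format single-end HISAT2 summary."""
    patterns = {
        "unaligned": r"(\d+) \([\d.]+%\) aligned 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned >1 times",
    }
    counts: Dict[str, int] = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"could not find the '{key}' line")
        counts[key] = int(match.group(1))
    # Recompute the total from its parts instead of trusting the header line.
    counts["total"] = counts["unaligned"] + counts["unique"] + counts["multi"]
    return counts


# Summary for SRR1427483, copied from the results above.
summary_srr1427483 = """46870160 reads; of these:
  46870160 (100.00%) were unpaired; of these:
    10223260 (21.81%) aligned 0 times
    22323761 (47.63%) aligned exactly 1 time
    14323139 (30.56%) aligned >1 times
78.19% overall alignment rate
"""
counts = parse_single_end_summary(summary_srr1427483)
unique_pct = 100.0 * counts["unique"] / counts["total"]
multi_pct = 100.0 * counts["multi"] / counts["total"]
print(f"unique {unique_pct:.2f}%  multi {multi_pct:.2f}%")
```

Running the same function over every sample's summary gives one row per sample, which makes differences in the multi-mapping rate easier to spot.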

modified 2.2 years ago by genomax85k • written 2.2 years ago by modarzi120