Question: Low mapping rate with splice aware aligners (kallisto, HISAT2) but not bowtie2
0
a.palmer (20) wrote 7 months ago:

Hi there,

Recently I've been processing paired-end mRNA-seq data from an experiment in C. elegans.

When I've tried aligning my reads to the reference transcriptome, both kallisto and HISAT2 return extremely low alignment rates (~0.3%).

kallisto code:

kallisto index -i transcriptome.idx Caenorhabditis_elegans.WBcel235.cdna.all.fa

kallisto quant -i transcriptome.idx -o out/S996-1 data/S996-1-R1.fastq data/S996-1-R2.fastq --fr-stranded

HISAT2 code:

hisat2-build Caenorhabditis_elegans.WBcel235.cdna.all.fa hisat_index

hisat2 -x hisat_index -1 S996-1-R1.fastq -2 S996-1-R2.fastq -S S996-1.sam

Strangely though, when using bowtie2 I achieved ~85% alignment.

bowtie2 code:

bowtie2-build Caenorhabditis_elegans.WBcel235.cdna.all.fa bowtie_index

bowtie2 -x bowtie_index -1 S996-1-R1.fastq -2 S996-1-R2.fastq -S S996-1.sam

I don't understand what is causing this difference - my read quality is normal and I'm using default parameters.

Any help would be greatly appreciated!

Alex

hisat2 rna-seq kallisto bowtie2 • 436 views
modified 7 months ago • written 7 months ago by a.palmer (20)
2

Please add the code you ran. Anecdotal error descriptions are hard to debug. "cDNA reference genome" is unclear: there is a reference genome and a reference transcriptome, and cDNA and genome are mutually exclusive. Please add all command lines, including how the indices were created.

written 7 months ago by ATpoint (42k)
1

Hi, I've amended my post - hopefully it makes things clearer

modified 7 months ago • written 7 months ago by a.palmer (20)

Thanks. Can you also post the alignment summaries from HISAT2 and bowtie2 (the ones printed to the screen when the alignment finishes, reporting how many reads mapped concordantly, discordantly, etc.)? Did you manipulate the FASTQ files before alignment (trimming, reordering, things like that)?

modified 7 months ago • written 7 months ago by ATpoint (42k)

Sorry for the slow response. The only modification to my files was removing the sequencing adapters specific to each run (multiplexed sequencing).


Bowtie2 output:

24626899 reads; of these:
  24597942 (99.88%) were paired; of these:
    5060101 (20.57%) aligned concordantly 0 times
    18743985 (76.20%) aligned concordantly exactly 1 time
    793856 (3.23%) aligned concordantly >1 times
    ----
    5060101 pairs aligned concordantly 0 times; of these:
      452691 (8.95%) aligned discordantly 1 time
    ----
    4607410 pairs aligned 0 times concordantly or discordantly; of these:
      9214820 mates make up the pairs; of these:
        7479978 (81.17%) aligned 0 times
        1557955 (16.91%) aligned exactly 1 time
        176887 (1.92%) aligned >1 times
  28957 (0.12%) were unpaired; of these:
    28919 (99.87%) aligned 0 times
    35 (0.12%) aligned exactly 1 time
    3 (0.01%) aligned >1 times
84.75% overall alignment rate


HISAT2 output:

24626899 reads; of these:
  24626899 (100.00%) were paired; of these:
    24565513 (99.75%) aligned concordantly 0 times
    34258 (0.14%) aligned concordantly exactly 1 time
    27128 (0.11%) aligned concordantly >1 times
    ----
    24565513 pairs aligned concordantly 0 times; of these:
      496 (0.00%) aligned discordantly 1 time
    ----
    24565017 pairs aligned 0 times concordantly or discordantly; of these:
      49130034 mates make up the pairs; of these:
        49104613 (99.95%) aligned 0 times
        13478 (0.03%) aligned exactly 1 time
        11943 (0.02%) aligned >1 times
0.30% overall alignment rate
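
For anyone reading these summaries: the "overall alignment rate" appears to count individual reads (mates), not pairs, so it can be recomputed from the counts above as a sanity check. The snippet below is a minimal sketch; the helper name `overall_rate` and its argument order are made up for illustration and are not part of bowtie2 or HISAT2.

```shell
#!/usr/bin/env bash
# overall_rate PAIRS UNALIGNED_MATES [UNPAIRED] [UNALIGNED_UNPAIRED]
# Hypothetical helper: recomputes the overall alignment rate (in percent)
# from the counts printed in a bowtie2/HISAT2 summary.
overall_rate() {
    awk -v pairs="$1" -v um="$2" -v unp="${3:-0}" -v uu="${4:-0}" 'BEGIN {
        total = 2 * pairs + unp               # every pair contributes two reads
        printf "%.2f\n", 100 * (total - um - uu) / total
    }'
}

overall_rate 24597942 7479978 28957 28919   # bowtie2 counts, prints 84.75
overall_rate 24626899 49104613              # HISAT2 counts, prints 0.30
```

Both values match the rates reported by the tools, so the difference really is in how many individual mates found a hit and not in how the summary is computed.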

modified 7 months ago • written 7 months ago by a.palmer (20)
0
a.palmer (20) wrote 7 months ago:

I solved the problem - it turns out that splice-aware aligners such as kallisto and HISAT2 require a genomic index, rather than one based entirely off the transcriptome. After I changed this, my overall alignment rate increased from 0.30% to ~75% using HISAT2.
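
For HISAT2, the switch to a genomic index could look roughly like the sketch below. The genome FASTA name (following the Ensembl naming pattern), the index prefix `hisat_genome_index`, the output file name, and the `run` dry-run wrapper are all assumptions for illustration; the exact commands used are not shown in this thread.

```shell
#!/usr/bin/env bash
# Assumed file names; adjust them to the actual downloads.
GENOME_FA="Caenorhabditis_elegans.WBcel235.dna.toplevel.fa"  # assumption: Ensembl genome FASTA
SAMPLE="S996-1"

# Hypothetical wrapper: prints each command by default;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Build the HISAT2 index from the genome instead of the cDNA FASTA.
run hisat2-build "$GENOME_FA" hisat_genome_index

# Align the read pairs against the genome index.
run hisat2 -x hisat_genome_index \
    -1 "${SAMPLE}-R1.fastq" -2 "${SAMPLE}-R2.fastq" \
    -S "${SAMPLE}.genome.sam"
```

Run as-is, this only prints the two commands; with `DRY_RUN=0` it executes them, assuming `hisat2-build` and `hisat2` are on the `PATH`.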

written 7 months ago by a.palmer (20)
3

Kallisto in fact needs the transcriptome, not the genome. I guess that using a transcriptome reference but genome splice sites caused the trouble in HISAT2; still, kallisto should have been fine with the transcriptome.
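
Since the kallisto command in the question passes `--fr-stranded`, one thing that could be checked is whether the strandedness setting affects the pseudoalignment rate against the same transcriptome index. This is only a guess at something worth ruling out, not an established cause. A minimal sketch, where the output directory names and the `run` dry-run wrapper are made up for illustration:

```shell
#!/usr/bin/env bash
IDX="transcriptome.idx"   # the transcriptome index built in the question
SAMPLE="S996-1"

# Hypothetical wrapper: prints each command by default;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

for mode in unstranded fr-stranded rf-stranded; do
    flag=""
    if [ "$mode" != "unstranded" ]; then flag="--$mode"; fi
    # $flag is left unquoted on purpose so an empty value adds no argument.
    run kallisto quant -i "$IDX" -o "out/${SAMPLE}_${mode}" $flag \
        "data/${SAMPLE}-R1.fastq" "data/${SAMPLE}-R2.fastq"
done

# After a real run (DRY_RUN=0), compare the pseudoalignment rates:
#   grep p_pseudoaligned out/S996-1_*/run_info.json
```

If one mode gives a much higher `p_pseudoaligned` than the others, the library type is the likely explanation for the low rate; if all three stay low, the cause lies elsewhere.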

modified 7 months ago • written 7 months ago by ATpoint (42k)