Low alignment score for microRNA
1
0
Entering edit mode
18 months ago
ntsopoul ▴ 60

Hi,

I am trying to align reads to miRbase however, I get very low alignment scores and I do not know why. I read in other entries that adapter contamination could be a problem when using bowtie(1) but I checked my fastq files and I get bona fide miRnas when I search reads the miRbase.

The library was generated with the NEBNext Small RNA library. On the website I see that it is recommended to use "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" for trimming, so I did

###Trimming###

#!/bin/bash

for i in $(ls *R1* | sed s/_R1.fastq.gz// | sort -u); 
do

trim_galore ${i}_R1.fastq.gz -q 20 --fastqc --cores 3 --stringency 4 --length 18 --max_length 31 --adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -o /any/directory/

done

The fastqc output looks a bit messy but I think it is expected due to the low complexity of small RNAs, nevertheless here some screenshots from a representative example:

`enter image description here` enter image description here

Here is the summary outbput from trimGalore!

=== Summary ===

Total reads processed: 12,028,354 Reads with adapters:
6,119,061 (50.9%) Reads written (passing filters): 12,028,354 (100.0%)

Total basepairs processed: 613,446,054 bp Quality-trimmed:
806,268 bp (0.1%) Total written (filtered): 440,154,599 bp (71.8%)

I then checked the reads quickly and aligned some to mirBase:

@VH00411:17:AAALWCMHV:1:1101:55440:1019 1:N:0:AGTCAA TTTGGCAATGGTAGAACTCACACCG + CCCCCCCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:60363:1019 1:N:0:AGTCAA TTCACAGTGGCTAAGTTCTGC + ;CCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:45271:1038 1:N:0:AGTCAA TAGCTTATCAGACTGATGTTGA

and I get a perfect match:

enter image description here

I downloaded the latest miRbase (22.1) files mature.fa and I generated an index with the mature.fa file Feeling confident about the results so far, I pursue the alignment with bowtie(1)

 #!/bin/bash
basename=mature_bowtie

for i in $(ls any/directory/*.fq  | xargs -n 1 basename | sort -u); 


do

bowtie -v 2 -a --best --strata -x mature_bowtie any/directory/${i} -S any/directory/${i}.sam

done

And then I get disappointed by seeing the following output:

reads processed: 5009934

reads with at least one alignment: 62 (0.00%)

reads that failed to align: 5009872 (100.00%) Reported 213 alignments

What am I doing wrong? Do I have to convert T to U?

small-RNA microRNA bowtie alignment • 1.4k views
ADD COMMENT
3
Entering edit mode
18 months ago
GenoMax 141k

Do I have to convert T to U?

Yes you have to do that conversion. Then align with a bowtie v.1.x index created after the conversion.

ADD COMMENT
0
Entering edit mode

Oh thank you so much! This makes total sense…

So how dow I do the conversion and on which file? The mature.fa I guess …

ADD REPLY
0
Entering edit mode

You could use sed 's/U/T/g' mature.fa > conv.fa as a quick brute force method. It may replace some u characters in fasta headers.

ADD REPLY
1
Entering edit mode

Just for the record. I did the conversion with

sed '/^[^>]/s/U/T/g' mature.fa > mature_conv.fa

to avoid modification of the header and it worked perfectly.

From:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA

To:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
TGAGGTAGTAGGTTGTATAGTT
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CTATGCAATTTTCTACCTTACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
TCCCTGAGACCTCAAGTGTGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCTGGGCTCTCCGGGTACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CATACTTCCTTACATGCCCATA

And the alignment seems to work as well! Thanks GenoMax

ADD REPLY
0
Entering edit mode

ok great I will try this. Maybe not in line with the thread but which aligner would you recommend to align the small-rna seq onto the genome? I just want to do differential miRNA expression between two conditions, nothing fancy. Is it in this case an advantage to align to the genome? I am worried that by aligning to mirBase the normalization will be tricky.

THANKS!

ADD REPLY
0
Entering edit mode

Since miRNA are small you will want to stick with an un-gapped aligner so bowtie v.1.x is a fine choice.

ADD REPLY
0
Entering edit mode

thanks, but what is the advantage of alignment to the mouse genome compared to alignment to mirbase? Is there any?

ADD REPLY
0
Entering edit mode

miRNA are small and you are likely to get a lot of multi-mapping. So you would need to account for that. If you are looking for novel species of miRNA then you may have to go the genome way.

ADD REPLY

Login before adding your answer.

Traffic: 2555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6