Low alignment score for microRNA
1
0
Entering edit mode
8 weeks ago
ntsopoul ▴ 10

Hi,

I am trying to align reads to miRbase however, I get very low alignment scores and I do not know why. I read in other entries that adapter contamination could be a problem when using bowtie(1) but I checked my fastq files and I get bona fide miRnas when I search reads the miRbase.

The library was generated with the NEBNext Small RNA library. On the website I see that it is recommended to use "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" for trimming, so I did

###Trimming###

#!/bin/bash

for i in $(ls *R1* | sed s/_R1.fastq.gz// | sort -u); do trim_galore${i}_R1.fastq.gz -q 20 --fastqc --cores 3 --stringency 4 --length 18 --max_length 31 --adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -o /any/directory/

done


The fastqc output looks a bit messy but I think it is expected due to the low complexity of small RNAs, nevertheless here some screenshots from a representative example:

Here is the summary outbput from trimGalore!

=== Summary ===

6,119,061 (50.9%) Reads written (passing filters): 12,028,354 (100.0%)

Total basepairs processed: 613,446,054 bp Quality-trimmed:
806,268 bp (0.1%) Total written (filtered): 440,154,599 bp (71.8%)

I then checked the reads quickly and aligned some to mirBase:

@VH00411:17:AAALWCMHV:1:1101:55440:1019 1:N:0:AGTCAA TTTGGCAATGGTAGAACTCACACCG + CCCCCCCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:60363:1019 1:N:0:AGTCAA TTCACAGTGGCTAAGTTCTGC + ;CCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:45271:1038 1:N:0:AGTCAA TAGCTTATCAGACTGATGTTGA

and I get a perfect match:

I downloaded the latest miRbase (22.1) files mature.fa and I generated an index with the mature.fa file Feeling confident about the results so far, I pursue the alignment with bowtie(1)

 #!/bin/bash
basename=mature_bowtie

for i in $(ls any/directory/*.fq | xargs -n 1 basename | sort -u); do bowtie -v 2 -a --best --strata -x mature_bowtie any/directory/${i} -S any/directory/\${i}.sam

done


And then I get disappointed by seeing the following output:

# reads that failed to align: 5009872 (100.00%) Reported 213 alignments

What am I doing wrong? Do I have to convert T to U?

small-RNA microRNA bowtie alignment • 384 views
3
Entering edit mode
8 weeks ago
GenoMax 123k

Do I have to convert T to U?

Yes you have to do that conversion. Then align with a bowtie v.1.x index created after the conversion.

0
Entering edit mode

Oh thank you so much! This makes total sense…

So how dow I do the conversion and on which file? The mature.fa I guess …

0
Entering edit mode

You could use sed 's/U/T/g' mature.fa > conv.fa as a quick brute force method. It may replace some u characters in fasta headers.

1
Entering edit mode

Just for the record. I did the conversion with

sed '/^[^>]/s/U/T/g' mature.fa > mature_conv.fa


to avoid modification of the header and it worked perfectly.

From:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA


To:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
TGAGGTAGTAGGTTGTATAGTT
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CTATGCAATTTTCTACCTTACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
TCCCTGAGACCTCAAGTGTGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCTGGGCTCTCCGGGTACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CATACTTCCTTACATGCCCATA


And the alignment seems to work as well! Thanks GenoMax

0
Entering edit mode

ok great I will try this. Maybe not in line with the thread but which aligner would you recommend to align the small-rna seq onto the genome? I just want to do differential miRNA expression between two conditions, nothing fancy. Is it in this case an advantage to align to the genome? I am worried that by aligning to mirBase the normalization will be tricky.

THANKS!

0
Entering edit mode

Since miRNA are small you will want to stick with an un-gapped aligner so bowtie v.1.x is a fine choice.

0
Entering edit mode

thanks, but what is the advantage of alignment to the mouse genome compared to alignment to mirbase? Is there any?

0
Entering edit mode

miRNA are small and you are likely to get a lot of multi-mapping. So you would need to account for that. If you are looking for novel species of miRNA then you may have to go the genome way.