reads processed: 5009934

Question

Low alignment score for microRNA

0

Entering edit mode

18 months ago

ntsopoul ▴ 60

Hi,

I am trying to align reads to miRbase however, I get very low alignment scores and I do not know why. I read in other entries that adapter contamination could be a problem when using bowtie(1) but I checked my fastq files and I get bona fide miRnas when I search reads the miRbase.

The library was generated with the NEBNext Small RNA library. On the website I see that it is recommended to use "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" for trimming, so I did

###Trimming###

#!/bin/bash

for i in $(ls *R1* | sed s/_R1.fastq.gz// | sort -u); 
do

trim_galore ${i}_R1.fastq.gz -q 20 --fastqc --cores 3 --stringency 4 --length 18 --max_length 31 --adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -o /any/directory/

done

The fastqc output looks a bit messy but I think it is expected due to the low complexity of small RNAs, nevertheless here some screenshots from a representative example:

`enter image description here`

Here is the summary outbput from trimGalore!

=== Summary ===

Total reads processed: 12,028,354 Reads with adapters:
6,119,061 (50.9%) Reads written (passing filters): 12,028,354 (100.0%)

Total basepairs processed: 613,446,054 bp Quality-trimmed:
806,268 bp (0.1%) Total written (filtered): 440,154,599 bp (71.8%)

I then checked the reads quickly and aligned some to mirBase:

@VH00411:17:AAALWCMHV:1:1101:55440:1019 1:N:0:AGTCAA TTTGGCAATGGTAGAACTCACACCG + CCCCCCCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:60363:1019 1:N:0:AGTCAA TTCACAGTGGCTAAGTTCTGC + ;CCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:45271:1038 1:N:0:AGTCAA TAGCTTATCAGACTGATGTTGA

and I get a perfect match:

enter image description here

I downloaded the latest miRbase (22.1) files mature.fa and I generated an index with the mature.fa file Feeling confident about the results so far, I pursue the alignment with bowtie(1)

 #!/bin/bash
basename=mature_bowtie

for i in $(ls any/directory/*.fq  | xargs -n 1 basename | sort -u); 


do

bowtie -v 2 -a --best --strata -x mature_bowtie any/directory/${i} -S any/directory/${i}.sam

done

And then I get disappointed by seeing the following output:

reads processed: 5009934

reads with at least one alignment: 62 (0.00%)

reads that failed to align: 5009872 (100.00%) Reported 213 alignments

What am I doing wrong? Do I have to convert T to U?

small-RNA microRNA bowtie alignment • 1.4k views

ADD COMMENT • link updated 18 months ago by GenoMax 141k • written 18 months ago by ntsopoul ▴ 60

score 3 · Accepted Answer · 2022-09-27

3

Entering edit mode

18 months ago

GenoMax 141k

Do I have to convert T to U?

Yes you have to do that conversion. Then align with a bowtie v.1.x index created after the conversion.

ADD COMMENT • link 18 months ago by GenoMax 141k

0

Entering edit mode

Oh thank you so much! This makes total sense…

So how dow I do the conversion and on which file? The mature.fa I guess …

ADD REPLY • link 18 months ago by ntsopoul ▴ 60

0

Entering edit mode

You could use sed 's/U/T/g' mature.fa > conv.fa as a quick brute force method. It may replace some u characters in fasta headers.

ADD REPLY • link 18 months ago by GenoMax 141k

1

Entering edit mode

Just for the record. I did the conversion with

sed '/^[^>]/s/U/T/g' mature.fa > mature_conv.fa

to avoid modification of the header and it worked perfectly.

From:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA

To:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
TGAGGTAGTAGGTTGTATAGTT
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CTATGCAATTTTCTACCTTACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
TCCCTGAGACCTCAAGTGTGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCTGGGCTCTCCGGGTACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CATACTTCCTTACATGCCCATA

And the alignment seems to work as well! Thanks GenoMax

ADD REPLY • link 18 months ago by ntsopoul ▴ 60

0

Entering edit mode

ok great I will try this. Maybe not in line with the thread but which aligner would you recommend to align the small-rna seq onto the genome? I just want to do differential miRNA expression between two conditions, nothing fancy. Is it in this case an advantage to align to the genome? I am worried that by aligning to mirBase the normalization will be tricky.

THANKS!

ADD REPLY • link 18 months ago by ntsopoul ▴ 60

0

Entering edit mode

Since miRNA are small you will want to stick with an un-gapped aligner so bowtie v.1.x is a fine choice.

ADD REPLY • link 18 months ago by GenoMax 141k

0

Entering edit mode

thanks, but what is the advantage of alignment to the mouse genome compared to alignment to mirbase? Is there any?

ADD REPLY • link 18 months ago by ntsopoul ▴ 60

0

Entering edit mode

miRNA are small and you are likely to get a lot of multi-mapping. So you would need to account for that. If you are looking for novel species of miRNA then you may have to go the genome way.

ADD REPLY • link 18 months ago by GenoMax 141k