Hi,
I am trying to align reads to miRbase however, I get very low alignment scores and I do not know why. I read in other entries that adapter contamination could be a problem when using bowtie(1) but I checked my fastq files and I get bona fide miRnas when I search reads the miRbase.
The library was generated with the NEBNext Small RNA library. On the website I see that it is recommended to use "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" for trimming, so I did
###Trimming###
#!/bin/bash
for i in $(ls *R1* | sed s/_R1.fastq.gz// | sort -u);
do
trim_galore ${i}_R1.fastq.gz -q 20 --fastqc --cores 3 --stringency 4 --length 18 --max_length 31 --adapter AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -o /any/directory/
done
The fastqc output looks a bit messy but I think it is expected due to the low complexity of small RNAs, nevertheless here some screenshots from a representative example:
Here is the summary outbput from trimGalore!
=== Summary ===
Total reads processed: 12,028,354 Reads with adapters:
6,119,061 (50.9%) Reads written (passing filters): 12,028,354 (100.0%)Total basepairs processed: 613,446,054 bp Quality-trimmed:
806,268 bp (0.1%) Total written (filtered): 440,154,599 bp (71.8%)
I then checked the reads quickly and aligned some to mirBase:
@VH00411:17:AAALWCMHV:1:1101:55440:1019 1:N:0:AGTCAA TTTGGCAATGGTAGAACTCACACCG + CCCCCCCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:60363:1019 1:N:0:AGTCAA TTCACAGTGGCTAAGTTCTGC + ;CCCCCCCCCCCCCCCCCCCC @VH00411:17:AAALWCMHV:1:1101:45271:1038 1:N:0:AGTCAA TAGCTTATCAGACTGATGTTGA
and I get a perfect match:
I downloaded the latest miRbase (22.1) files mature.fa and I generated an index with the mature.fa file Feeling confident about the results so far, I pursue the alignment with bowtie(1)
#!/bin/bash
basename=mature_bowtie
for i in $(ls any/directory/*.fq | xargs -n 1 basename | sort -u);
do
bowtie -v 2 -a --best --strata -x mature_bowtie any/directory/${i} -S any/directory/${i}.sam
done
And then I get disappointed by seeing the following output:
reads processed: 5009934
reads with at least one alignment: 62 (0.00%)
reads that failed to align: 5009872 (100.00%) Reported 213 alignments
What am I doing wrong? Do I have to convert T to U?
Oh thank you so much! This makes total sense…
So how dow I do the conversion and on which file? The mature.fa I guess …
You could use
sed 's/U/T/g' mature.fa > conv.fa
as a quick brute force method. It may replace someu
characters in fasta headers.Just for the record. I did the conversion with
to avoid modification of the header and it worked perfectly.
From:
To:
And the alignment seems to work as well! Thanks GenoMax
ok great I will try this. Maybe not in line with the thread but which aligner would you recommend to align the small-rna seq onto the genome? I just want to do differential miRNA expression between two conditions, nothing fancy. Is it in this case an advantage to align to the genome? I am worried that by aligning to mirBase the normalization will be tricky.
THANKS!
Since miRNA are small you will want to stick with an un-gapped aligner so
bowtie v.1.x
is a fine choice.thanks, but what is the advantage of alignment to the mouse genome compared to alignment to mirbase? Is there any?
miRNA are small and you are likely to get a lot of multi-mapping. So you would need to account for that. If you are looking for novel species of miRNA then you may have to go the genome way.