STAR or Bowtie for small RNA seq?
3
1
4.0 years ago
max_19 ▴ 170

Hi all!

I am analyzing some small RNA-seq data. To map the reads to the genome, I'm wondering whether STAR or Bowtie would be a better fit for my data.

My reads are between 15 and 30 bp in length.

RNA-Seq mapping reads sequencing • 8.6k views
0

How did you get reads that are 15-30 bp long? Those will be hard to align properly.

2

This is quite expected in small RNA-seq data after trimming sequencing adaptors and filtering low-quality reads.

0

As I said in my answer below, I missed this part when reading the post :) Monday morning pleasures.

0

Or NovoAlign? Should we also consider it in the comparison? If not, why not?

3
4.0 years ago

I don't think Bowtie(2) is splice-aware. Since this is RNA-Seq, you'd want STAR or TopHat2; HISAT2 is another option.

Personally I really like STAR, and it does well in peer-reviewed benchmarks. There's also a parameter (--genomeLoad) to share the genome index in memory between concurrent processes, so you can align multiple samples at once.

2

STAR and HISAT2 are splice-aware, but be careful with TopHat.

1

Thank you. Being splice-aware is definitely preferred. I can see from the manual that STAR also outputs an SJ.out.tab file, which contains splice junctions in tab-delimited format. Does this mean that it is essentially able to identify junction-mapping small RNAs?

0

It does not need to be splice-aware. Small RNAs typically do not undergo splicing, and one aligns against an existing database such as miRBase (for microRNAs) instead of the genome. That calls for ungapped alignments tuned for very short reads, which is what Bowtie is very good at (and Bowtie2 is not, because it performs better at longer read lengths).

3
4.0 years ago
Emilio Marmol ▴ 170

We get pretty decent alignment rates and accurate results with Bowtie and the following settings:

bowtie -n 1 -l 10 -m 100 -k 1 --best --strata
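For anyone reading along, here is a rough sketch of what those flags mean in Bowtie 1, written as a small Python helper that only assembles the command and does not run it. The flag values are the ones above; the `-S` (SAM output) switch, the positional index/reads/output arguments, and the file names `genome_index`, `trimmed_reads.fastq` and `aligned.sam` are my own placeholders, not part of the original command, and the positional layout assumes the classic Bowtie 1 usage.

```python
import shlex


def bowtie_small_rna_cmd(index_prefix, reads_fastq, sam_out):
    """Assemble a Bowtie 1 command line as an argument list (nothing is executed)."""
    return [
        "bowtie",
        "-n", "1",     # allow at most 1 mismatch in the seed
        "-l", "10",    # seed length of 10 bases (short, to suit 15-30 bp reads)
        "-m", "100",   # suppress reads with more than 100 reportable alignments
        "-k", "1",     # report at most 1 alignment per read
        "--best",      # report alignments in best-to-worst order
        "--strata",    # only report alignments from the best stratum
        "-S",          # assumption: SAM output, not in the original command
        index_prefix,  # hypothetical index prefix
        reads_fastq,   # hypothetical trimmed reads file
        sam_out,       # hypothetical output file
    ]


cmd = bowtie_small_rna_cmd("genome_index", "trimmed_reads.fastq", "aligned.sam")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` once the placeholder paths point at real files.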

3
4.0 years ago

I missed the fact that you were using small RNA-seq data. Your sequences are too short to be analyzed with classic RNA-seq tools; see also

Best/right way to quantify small RNA transcripts

But if you want to stick with STAR, here is some advice from Alexander Dobin, one of the STAR authors, on aligning miRNA

1

Thanks so much for that link, very helpful! I ended up giving STAR a go with the parameter settings recommended in that link. Below is my final log output; I think the reads are mapping pretty well!

Number of input reads |    39129818
Average input read length |    22
Uniquely mapped reads number |    31732915
Uniquely mapped reads % |    81.10%
Average mapped length |    21.73
Number of splices: Total |    1388166
Number of splices: Annotated (sjdb) |    1388166
Number of splices: GT/AG |    1380537
Number of splices: GC/AG |    5808
Number of splices: AT/AC |    46
Number of splices: Non-canonical |    1775
Mismatch rate per base, % |    0.20%
Deletion rate per base |    0.00%
Deletion average length |    1.00
Insertion rate per base |    0.00%
Insertion average length |    1.02
Number of reads mapped to multiple loci |    6018428
% of reads mapped to multiple loci |    15.38%
Number of reads mapped to too many loci |    69
% of reads mapped to too many loci |    0.00%
% of reads unmapped: too short |    3.08%
% of reads unmapped: other |    0.44%
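As a quick sanity check on a log like this, the percentage lines can be added up to get the overall mapped fraction. The snippet below is only a sketch: it hard-codes the five percentage lines from the output above in a string called `LOG_EXCERPT` (my own name) and assumes the `label | value%` layout shown there; for a real run you would read the text from STAR's Log.final.out instead.

```python
# Percentage lines copied from the log output above.
LOG_EXCERPT = """\
Uniquely mapped reads % |    81.10%
% of reads mapped to multiple loci |    15.38%
% of reads mapped to too many loci |    0.00%
% of reads unmapped: too short |    3.08%
% of reads unmapped: other |    0.44%
"""


def parse_percentages(log_text):
    """Return {label: percentage} for every 'label | value%' line."""
    fields = {}
    for line in log_text.splitlines():
        if "|" not in line:
            continue
        key, value = (part.strip() for part in line.split("|", 1))
        if value.endswith("%"):
            fields[key] = float(value.rstrip("%"))
    return fields


pct = parse_percentages(LOG_EXCERPT)

# Reads with a reported alignment: unique plus multi-mapped.
mapped = pct["Uniquely mapped reads %"] + pct["% of reads mapped to multiple loci"]

# Everything else: too many loci, too short, other.
not_mapped = (
    pct["% of reads mapped to too many loci"]
    + pct["% of reads unmapped: too short"]
    + pct["% of reads unmapped: other"]
)

print(f"mapped: {mapped:.2f}%  not mapped: {not_mapped:.2f}%")
```

For the numbers above this comes to 81.10 + 15.38 = 96.48% of reads with a reported alignment, which matches (31732915 + 6018428) / 39129818.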

0

Hello, could you show exactly which parameters you used for alignment? I am running extracellular vesicle data and I am getting only 13% uniquely mapped and 15% multi-mapped reads; the rest are unmapped. I am using STAR as well.

0

Hi anara92, you might want to start your own question. There could be a lot of reasons why you're getting low mapping rates, and often it doesn't even have to do with the parameters, so you will get more precise help that way (and faster, since it'll be asked to the whole Biostars community, not just people in this question). You might also want to search the forums for "low mapping rate" and "RNA-Seq" or something like that to see if there are any hints from other questions.

For the parameters, if you still want to try them, I believe they're in that link max_19 posted (See the STAR Google Group, Alex Dobin's answer.)