STAR or Bowtie for small RNA seq?
5.2 years ago
max_19 ▴ 170

Hi all!

I am analyzing some small RNA-seq data. To map the reads to the genome, I'm wondering whether STAR or Bowtie would be a better fit for my data.

My reads are between 15 and 30 bp in length.

Many thanks for your suggestions.

RNA-Seq mapping reads sequencing • 11k views

How did you get reads that are only 15-30 bp long? Those will be hard to align properly.


This is quite expected in small RNA-seq data after trimming sequencing adaptors and filtering low-quality reads.
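To illustrate why trimming leaves such short reads, here is a minimal sketch of 3' adaptor removal. It assumes the Illumina TruSeq Small RNA 3' adaptor and exact matching only; the function names are hypothetical, and a real trimmer such as cutadapt also tolerates sequencing errors.

```python
# Assumption: Illumina TruSeq Small RNA 3' adaptor; substitute your kit's sequence.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"

# hsa-miR-21-5p (22 nt), written in the DNA alphabet as it appears in a read.
MIR21_DNA = "TAGCTTATCAGACTGATGTTGA"


def trim_3p_adapter(read, adapter=ADAPTER_3P, min_overlap=8):
    """Remove the 3' adaptor (exact matches only) and everything after it."""
    # Case 1: the full adaptor is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the adaptor is cut off by the end of the read, so look for the
    # longest adaptor prefix (at least min_overlap bases) ending the read.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adaptor found


def keep_read(insert, min_len=15, max_len=30):
    """Length filter matching the 15-30 bp range mentioned in the question."""
    return min_len <= len(insert) <= max_len


# A 36-cycle read of a 22-nt miRNA carries 14 bases of adaptor at its 3' end.
raw_read = (MIR21_DNA + ADAPTER_3P)[:36]
insert = trim_3p_adapter(raw_read)
print(len(raw_read), "->", len(insert), keep_read(insert))
```

After trimming, only the 22-nt insert remains, which is why a read-length distribution of 15-30 bp is normal for this kind of library.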


As I said in my answer below, I missed this part when reading the post :) Monday morning pleasures.


Or NovoAlign? Should we also consider it in the comparison? If not, why not?

5.2 years ago

I don't think Bowtie(2) is splice-aware. Since this is RNA-Seq, you'd want a splice-aware aligner such as STAR or TopHat2; HISAT2 is another option.

Personally, I really like STAR, and it does well in peer-reviewed benchmarks. It also has a parameter for sharing the genome in memory between concurrent processes, so multiple samples can be aligned at once.
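The shared-memory parameter is `--genomeLoad`. As a rough sketch, the snippet below only builds and prints the command lines (a dry run) rather than running STAR. The index directory, sample names and thread count are hypothetical placeholders.

```python
import shlex


def star_shared_memory_commands(genome_dir, samples, threads=4):
    """Build STAR command lines that reuse one genome copy in shared memory.

    `samples` maps a sample name to its FASTQ file. Nothing is executed here.
    """
    commands = []
    for name, fastq in samples.items():
        commands.append([
            "STAR",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            # Load the genome into shared memory, or attach to it if it is
            # already loaded, and keep it there after this job finishes.
            "--genomeLoad", "LoadAndKeep",
            "--readFilesIn", fastq,
            "--outFileNamePrefix", f"{name}_",
        ])
    # Once every alignment job has finished, free the shared memory.
    commands.append(["STAR", "--genomeDir", genome_dir, "--genomeLoad", "Remove"])
    return commands


# Hypothetical index directory and input files.
for argv in star_shared_memory_commands(
    "star_index", {"sampleA": "sampleA.fastq", "sampleB": "sampleB.fastq"}
):
    print(shlex.join(argv))
```

The per-sample commands can be launched concurrently; the final `--genomeLoad Remove` call should only run after all of them have exited.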


STAR and HISAT2 are splice-aware, but be careful with TopHat, which has been superseded by HISAT2.


Thank you. Being splice-aware is definitely preferred. I can see from the manual that STAR also outputs an SJ.out.tab file, which contains splice junctions in tab-delimited format. Does this mean that it is essentially able to identify junction-mapping small RNAs?


It does not need to be splice-aware. Small RNAs typically do not undergo splicing, and one usually aligns against an existing database such as miRBase (for microRNAs) instead of the genome. That calls for ungapped alignments tuned for very short reads, which is what Bowtie is very good at (Bowtie2 less so, because it performs better at longer read lengths).
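One practical detail when aligning against miRBase: its FASTA files are written in the RNA alphabet (U), while sequencing reads use T, so the reference is normally converted before an index is built. A minimal sketch, with a hypothetical function name:

```python
def rna_fasta_to_dna(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines are left untouched
        else:
            yield line.replace("U", "T").replace("u", "t")


# hsa-miR-21-5p as an example record.
record = [">hsa-miR-21-5p\n", "UAGCUUAUCAGACUGAUGUUGA\n"]
print("".join(rna_fasta_to_dna(record)), end="")
```

The converted file can then be indexed with `bowtie-build` like any other reference.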

5.2 years ago
Emilio Marmol ▴ 170

We get pretty decent alignment rates and accurate results with Bowtie and the following settings:

bowtie -n 1 -l 10 -m 100 -k 1 --best --strata
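For readers unfamiliar with these Bowtie 1 options, the sketch below annotates each flag and assembles a full command line as a dry run, so nothing is executed. The index name, file names, thread count and SAM output (`-S`) are assumptions added for illustration; they are not part of the settings quoted above.

```python
import shlex

# Flags quoted from the post, with their meaning in Bowtie 1.
BOWTIE_FLAGS = [
    ("-n", "1"),         # allow at most 1 mismatch in the seed
    ("-l", "10"),        # seed length: the first 10 bases of the read
    ("-m", "100"),       # drop reads with more than 100 reportable alignments
    ("-k", "1"),         # report at most 1 alignment per read
    ("--best", None),    # report alignments in best-to-worst order
    ("--strata", None),  # with --best, keep only the best mismatch stratum
]


def bowtie_command(index, reads, sam_out, threads=4):
    """Return the argv list for one Bowtie 1 run (hypothetical helper)."""
    argv = ["bowtie"]
    for flag, value in BOWTIE_FLAGS:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    # Assumed extras: threads and SAM output. The index is passed positionally
    # here; recent Bowtie 1 releases also accept it via -x.
    argv += ["-p", str(threads), "-S", index, reads, sam_out]
    return argv


# Hypothetical index and file names.
print(shlex.join(bowtie_command("mirbase_hairpin", "trimmed.fastq", "aligned.sam")))
```

With `-k 1 --best --strata`, each read is reported once at one of its best-scoring locations, and `-m 100` discards reads that have more than 100 such locations.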
5.2 years ago

I missed the fact that you were using small RNA-seq data. Your sequences are too short to be analyzed with classic RNA-seq tools; see also:

Best/right way to quantify small RNA transcripts

But if you want to stick with STAR, here is some advice from Alexander Dobin, one of the STAR authors, on aligning miRNA reads.


Thanks so much for that link, very helpful! I ended up giving STAR a go with the parameter settings recommended in that link. Below is my final log output; I think the reads are mapping pretty well!

                  Number of input reads |    39129818
              Average input read length |    22
                            UNIQUE READS:
           Uniquely mapped reads number |    31732915
                Uniquely mapped reads % |    81.10%
                  Average mapped length |    21.73
               Number of splices: Total |    1388166
    Number of splices: Annotated (sjdb) |    1388166
               Number of splices: GT/AG |    1380537
               Number of splices: GC/AG |    5808
               Number of splices: AT/AC |    46
       Number of splices: Non-canonical |    1775
              Mismatch rate per base, % |    0.20%
                 Deletion rate per base |    0.00%
                Deletion average length |    1.00
                Insertion rate per base |    0.00%
               Insertion average length |    1.02
                     MULTI-MAPPING READS:
Number of reads mapped to multiple loci |    6018428
     % of reads mapped to multiple loci |    15.38%
Number of reads mapped to too many loci |    69
     % of reads mapped to too many loci |    0.00%
                          UNMAPPED READS:
% of reads unmapped: too many mismatches |    0.00%
         % of reads unmapped: too short |    3.08%
             % of reads unmapped: other |    0.44%
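The percentages in this log can be checked against the raw counts. Below is a minimal sketch that parses the `label | value` rows of a STAR `Log.final.out` and recomputes the mapping rates. The excerpt is copied from the log above; the function name is hypothetical.

```python
def parse_star_log(text):
    """Map each 'label | value' row of a STAR Log.final.out to its value string."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip section headers such as 'UNIQUE READS:'
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


# Rows copied from the log output in this post.
LOG_EXCERPT = """\
                  Number of input reads |    39129818
                            UNIQUE READS:
           Uniquely mapped reads number |    31732915
                     MULTI-MAPPING READS:
Number of reads mapped to multiple loci |    6018428
Number of reads mapped to too many loci |    69
"""

stats = parse_star_log(LOG_EXCERPT)
total = int(stats["Number of input reads"])
unique_pct = 100 * int(stats["Uniquely mapped reads number"]) / total
multi_pct = 100 * int(stats["Number of reads mapped to multiple loci"]) / total
print(f"{unique_pct:.2f}% unique, {multi_pct:.2f}% multi-mapped")
```

This gives 81.10% unique and 15.38% multi-mapped, matching the log. Together with the 3.08% "too short" and 0.44% "other" unmapped reads, the categories add up to 100%.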
  

Hello, could you show exactly which parameters you used for the alignment? I am analyzing extracellular vesicle data and I am getting only 13% uniquely mapped and 15% multi-mapped reads; the rest are unmapped. I am using STAR as well.


Hi anara92, you might want to start your own question. There can be many reasons for a low mapping rate, and often they have nothing to do with the parameters. With your own question you will get more precise help, and faster, since it will be seen by the whole Biostars community and not just the people in this thread. You might also want to search the forum for "low mapping rate" and "RNA-Seq" or similar, to see whether other questions offer any hints.

For the parameters, if you still want to try them, I believe they're in that link max_19 posted (See the STAR Google Group, Alex Dobin's answer.)
