Hi all!
I am analyzing some small RNA-seq data. To map the reads to the genome, I'm wondering whether STAR or Bowtie would be a better fit for my data.
My reads are between 15 and 30 bp long.
Many thanks for your suggestions.
Bowtie and Bowtie2 are not splice-aware, so for RNA-Seq you'd want STAR or TopHat2. HISAT2 is another option.
Personally I really like STAR, and it does well in peer-reviewed benchmarks. It also has a parameter (--genomeLoad) that keeps the genome in shared memory, so concurrent processes can align multiple samples without each loading its own copy.
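A minimal sketch of that shared-memory setup, assuming a STAR index already built in a directory called star_index and gzipped reads named <sample>.fastq.gz (both hypothetical). With --genomeLoad LoadAndKeep the genome stays in shared memory after the first run, so later runs (or runs launched in parallel) reuse it; --genomeLoad Remove frees it at the end. The run function is a hypothetical helper that only prints the commands when DRY_RUN=1.

```shell
#!/usr/bin/env bash
# Sketch: align several samples with STAR while the genome index stays in
# shared memory (--genomeLoad LoadAndKeep), so it is loaded only once.
# GENOME_DIR, THREADS and the <sample>.fastq.gz naming are assumptions.
set -euo pipefail

GENOME_DIR="${GENOME_DIR:-star_index}"  # hypothetical index directory
THREADS="${THREADS:-8}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

align_samples() {
  local fq sample
  for fq in "$@"; do
    sample="${fq%.fastq.gz}"
    # Every run after the first finds the genome already in shared memory.
    run STAR --runThreadN "$THREADS" \
      --genomeDir "$GENOME_DIR" \
      --genomeLoad LoadAndKeep \
      --readFilesIn "$fq" \
      --readFilesCommand zcat \
      --outFileNamePrefix "${sample}_"
  done
  # Free the shared-memory copy of the genome once all samples are aligned.
  run STAR --genomeDir "$GENOME_DIR" --genomeLoad Remove
}
```

For example, DRY_RUN=1 align_samples sampleA.fastq.gz sampleB.fastq.gz previews the calls. The sketch keeps STAR's default SAM output; if you request sorted BAM instead, STAR needs an explicit --limitBAMsortRAM when the genome is held in shared memory.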
STAR and HISAT2 are splice-aware, but be careful with TopHat:
Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017
It does not need to be splice-aware. Small RNAs typically do not undergo splicing, and one usually aligns against an existing database such as miRBase (for microRNAs) instead of the genome. That calls for ungapped alignments tuned for very short reads, which is what Bowtie is very good at (Bowtie2 less so, because it performs better with longer reads).
We get pretty decent alignment rates and accurate results with Bowtie and the following settings:
bowtie -n 1 -l 10 -m 100 -k 1 --best --strata
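A minimal sketch of that miRBase route, assuming Bowtie 1 and a miRBase mature.fa file; the file names and the species prefix (e.g. hsa) are placeholders. miRBase stores sequences in the RNA alphabet, so U is converted to T before indexing. The alignment uses exactly the options quoted above plus -S for SAM output, and assumes the classic positional bowtie <index> <reads> <output> call (check bowtie --help for your version). run is a hypothetical helper that only prints commands when DRY_RUN=1.

```shell
#!/usr/bin/env bash
# Sketch: index miRBase mature miRNAs with bowtie (v1) and align trimmed
# small-RNA reads with ungapped, short-seed settings.
# File names and the species prefix (e.g. "hsa") are placeholders.
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Read a miRBase FASTA on stdin, keep one species, convert U -> T.
prepare_mirbase() {
  local species="$1"
  awk -v sp=">${species}-" '
    /^>/ { keep = (index($0, sp) == 1) }
    keep { if ($0 !~ /^>/) gsub(/[Uu]/, "T"); print }
  '
}

build_index() {  # $1 = DNA FASTA, $2 = index basename
  run bowtie-build "$1" "$2"
}

align_small_rna() {  # $1 = index basename, $2 = reads FASTQ, $3 = SAM output
  # -n 1 -l 10           : at most 1 mismatch within a 10-nt seed
  # -k 1 --best --strata : report one alignment from the best stratum
  # -m 100               : suppress reads with more than 100 reportable alignments
  # -S                   : write SAM
  run bowtie -n 1 -l 10 -m 100 -k 1 --best --strata -S "$1" "$2" "$3"
}
```

For example: prepare_mirbase hsa < mature.fa > hsa_mature.fa, then build_index hsa_mature.fa hsa_mature, then align_small_rna hsa_mature trimmed.fastq sample.sam.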
I missed the fact that you were using small RNA-seq data. Your sequences are too short to be analyzed with classic RNA-seq tools; see also:
Best/right way to quantify small RNA transcripts
But if you want to stick with STAR, here is some advice from Alexander Dobin, one of the STAR authors, on aligning miRNAs:
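A sketch of a STAR call adapted to short, mostly unspliced reads. The option names are real STAR parameters, but the values are assumptions (settings of this kind are widely quoted for short RNAs) and should be compared with Alex Dobin's post before use. The idea: STAR's defaults require alignments to cover a fraction (0.66) of the read, so these relative filters are set to 0 and replaced by an absolute minimum of matched bases. run is a hypothetical helper that only prints the command when DRY_RUN=1.

```shell
#!/usr/bin/env bash
# Sketch: STAR options for 15-30 nt small-RNA reads. The option names exist
# in STAR; the VALUES are assumptions to be checked against Alex Dobin's
# recommendations. Genome dir and file names are placeholders.
set -euo pipefail

run() {  # hypothetical helper: print instead of execute when DRY_RUN=1
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

star_small_rna() {  # $1 = genome dir, $2 = reads FASTQ, $3 = output prefix
  local opts=(
    --outFilterMismatchNoverLmax 0.05   # at most 1 mismatch per 20 mapped bases
    --outFilterMatchNmin 16             # require at least 16 matched bases
    --outFilterScoreMinOverLread 0      # disable the relative score filter
    --outFilterMatchNminOverLread 0     # disable the relative match filter
    --alignIntronMax 1                  # intended to stop unannotated splicing
  )
  run STAR --genomeDir "$1" --readFilesIn "$2" \
    --outFileNamePrefix "$3" "${opts[@]}"
}
```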
Thanks so much for that link, very helpful! I ended up giving STAR a go with the recommended parameter settings from that link. Below is my final log output; I think the reads are mapping pretty well!
Number of input reads | 39129818
Average input read length | 22

UNIQUE READS:
Uniquely mapped reads number | 31732915
Uniquely mapped reads % | 81.10%
Average mapped length | 21.73
Number of splices: Total | 1388166
Number of splices: Annotated (sjdb) | 1388166
Number of splices: GT/AG | 1380537
Number of splices: GC/AG | 5808
Number of splices: AT/AC | 46
Number of splices: Non-canonical | 1775
Mismatch rate per base, % | 0.20%
Deletion rate per base | 0.00%
Deletion average length | 1.00
Insertion rate per base | 0.00%
Insertion average length | 1.02

MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 6018428
% of reads mapped to multiple loci | 15.38%
Number of reads mapped to too many loci | 69
% of reads mapped to too many loci | 0.00%

UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 3.08%
% of reads unmapped: other | 0.44%
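The key percentages can also be pulled out of STAR's Log.final.out programmatically. A minimal awk-based sketch, assuming the usual "name | value" layout of that file; it adds the unique and multi-mapping rates, which for the log above gives 81.10% + 15.38% = 96.48% of reads mapped.

```shell
#!/usr/bin/env bash
# Sketch: summarise mapping rates from a STAR Log.final.out read on stdin.
# Assumes the usual "name |<TAB>value" layout of that file.
set -euo pipefail

summarize_star_log() {
  awk -F'|' '
    # strip blanks and the percent sign, then force a numeric value
    function num(s) { gsub(/[ \t%]/, "", s); return s + 0 }
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
    key == "Uniquely mapped reads %"            { u = num($2) }
    key == "% of reads mapped to multiple loci" { m = num($2) }
    END { printf "unique=%.2f%% multi=%.2f%% mapped=%.2f%%\n", u, m, u + m }
  '
}
```

Usage: summarize_star_log < sample_Log.final.out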
Hi anara92, you might want to start your own question. There could be many reasons why you're getting low mapping rates, and often they have nothing to do with the parameters, so you will get more precise (and faster) help by asking the whole Biostars community rather than just the people in this thread. You might also want to search the forum for "low mapping rate" and "RNA-Seq" to see whether other questions offer some hints.
For the parameters, if you still want to try them, I believe they're in the link max_19 posted (see Alex Dobin's answer in the STAR Google Group).
How did you get reads that are only 15-30 bp long? Those will be hard to align properly.
This is quite expected for small RNA-seq data after trimming sequencing adapters and filtering low-quality reads.
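To check that the trimmed reads really fall in the expected 15-30 bp window, a quick sketch that tabulates read lengths from an uncompressed FASTQ on stdin, assuming standard four-line records.

```shell
#!/usr/bin/env bash
# Sketch: read-length histogram for a FASTQ on stdin (4-line records assumed).
set -euo pipefail

read_length_histogram() {
  # Line 2 of every 4-line record is the sequence; print "length<TAB>count".
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) printf "%d\t%d\n", len, n[len] }' | sort -n
}
```

Usage: zcat trimmed.fastq.gz | read_length_histogram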
As I said in my answer below, I missed this part when reading the post :) Monday morning pleasures.
Or NovoAlign? Should we also consider it in the comparison, and if not, why not?