Question

small RNA analysis pipeline!

8.1 years ago

fufuyou ▴ 110

Hi, I am working on sorghum small RNA sequencing and want to analyze the data. I have read references on how to analyze small RNA data, but I do not clearly understand the workflow. First step: quality control. I have removed the adapters and low-quality reads. Second step: alignment to the genome. My question is: should I keep the mapped reads and exclude the unmapped reads, or keep the unmapped reads and exclude the mapped reads for the next step of the analysis? Third step: alignment to miRNA, rRNA, or piRNA. My question is: should I use the results of the second step for this mapping, and should I keep the mapped reads and exclude the unmapped ones? My other question is about miRNA, rRNA, and piRNA databases: I do not know which database is the correct one to use for sorghum. Thanks, Fuyou

RNA-Seq • 4.2k views

updated 7.3 years ago by A. Domingues ★ 2.7k • written 8.1 years ago by fufuyou ▴ 110

score 0 · Answer 1 · 2016-12-20

Hi! I'm doing the same analysis :). For adapter removal, quality filtering, and trimming you "have" to use Biopieces and prinseq-lite (Linux command-line programs). To align reads to your genome you can use Bowtie 2 (also a Linux command-line program), and you can save both the aligned and the unaligned reads in FASTQ files, although the aligned reads are the most interesting. You can also run HTSeq on the .bam file resulting from the Bowtie alignment; it counts how many reads map to each gene, which I think is a very interesting result. You can visualize the mapped reads in IGV, and after that you can assign function with a simple blastn-short search (I think). There are also two graphical user interface programs for this, iMir and sRNA Workbench, but I couldn't install them on my computer. If you want, send me a message and we can help each other. Sorry for my bad English.

score 0 · Answer 2 · 2016-12-20

Small RNA is a large world :), so it would be nice to know a little more about the goals of the project.

Regardless, in the past I tested piPipes, which is more focused on piRNAs but will also give you results for other classes of small RNAs. It takes some time to set up for a species that is not included in their bundle, but it might be worth it, especially if you are starting this analysis with little experience. It will output a ton of results, but you can explore those at your leisure. The caveat is that you will need to clean up the reads before using piPipes, by which I mean removing UMIs (if you are using them) and trimming adapters. For the trimming I use cutadapt, but other tools are available. I am not using piPipes myself only because my project required something a little more customized.
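As a rough illustration of the adapter-trimming step, here is a minimal Python sketch that only builds a cutadapt command line. The function name, the file names, the adapter sequence, and the 18-30 nt size window are all assumptions for illustration; use the 3' adapter of your own library kit and a length range that suits your small RNA classes. UMI removal is not covered here.

```python
def cutadapt_command(fastq_in, fastq_out, adapter, min_len=18, max_len=30, quality=20):
    """Build (but do not run) a cutadapt call for single-end small RNA reads.

    All defaults are illustrative assumptions, not recommendations.
    """
    return [
        "cutadapt",
        "-a", adapter,        # 3' adapter to remove
        "-q", str(quality),   # trim low-quality bases from the 3' end
        "-m", str(min_len),   # discard reads shorter than this after trimming
        "-M", str(max_len),   # discard reads longer than this after trimming
        "-o", fastq_out,      # trimmed output FASTQ
        fastq_in,             # raw input FASTQ
    ]


# ASSUMPTION: placeholder adapter sequence; replace with the one from your kit.
cmd = cutadapt_command("raw.fastq", "trimmed.fastq", "TGGAATTCTCGGGTGCCAAGG")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once cutadapt is installed.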

Now straight to your questions.

My question is: should I keep the mapped reads and exclude the unmapped reads, or keep the unmapped reads and exclude the mapped reads for the next step of the analysis?

It really depends on how you are assigning reads to features. If you are mapping directly to the genome and then intersecting with a list of features, discard the unmapped reads and keep only the mapped ones (this reduces the size of the BAM file). If, on the other hand, you are going with a stratified approach of first mapping to, say, rRNA/tRNA, then to miRNA, (...), and only then to the genome, both mapped and unmapped reads need to be kept at each step. For example, in the first step you keep the reads mapped to rRNA as that step's result, and use the unmapped reads as input for the alignment to miRNA.
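The stratified approach can be sketched as a chain of Bowtie 2 calls in which the unmapped reads of one step become the input of the next. The Python sketch below only builds the command lines; the function name, the index prefixes, the file names, and the order rRNA, miRNA, genome are assumptions for illustration. It uses Bowtie 2 (mentioned in the other answer) with `--un` to write the reads that failed to align and `--no-unal` to keep them out of the SAM file.

```python
def plan_stratified_mapping(reads, indexes, outdir="mapping"):
    """Return one bowtie2 argument list per reference, chained via unmapped reads.

    reads   -- trimmed FASTQ file to start from
    indexes -- ordered (label, bowtie2_index_prefix) pairs, first reference first
    outdir  -- output directory (hypothetical name; must exist before running)
    """
    commands = []
    current = reads
    for label, index in indexes:
        sam = f"{outdir}/{label}.sam"
        unmapped = f"{outdir}/not_{label}.fastq"
        commands.append([
            "bowtie2",
            "-x", index,       # reference index for this step
            "-U", current,     # unpaired input reads
            "--no-unal",       # SAM holds only reads mapped at this step
            "--un", unmapped,  # reads that did not align feed the next step
            "-S", sam,
        ])
        current = unmapped     # next reference only sees what is still unmapped
    return commands


# ASSUMPTION: hypothetical index prefixes built beforehand with bowtie2-build.
steps = [("rRNA", "idx/rRNA"), ("miRNA", "idx/miRNA"), ("genome", "idx/genome")]
commands = plan_stratified_mapping("trimmed.fastq", steps)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be run in order with `subprocess.run(cmd, check=True)`. At every step the SAM file holds the reads assigned to that class and the `not_<label>.fastq` file holds what is passed on, so nothing is thrown away until the final genome alignment.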

My other question is about miRNA, rRNA, and piRNA databases: I do not know which database is the correct one to use for sorghum.

I have never worked with plants, so I am of little help here. However, miRBase seems like a good place to start for miRNAs. No idea about piRNAs.

Bonus answer

What you should also be thinking about, especially when working with piRNAs, is how to treat reads that map to a unique location versus multiple locations in the genome. There are no right answers for this, but there are some wrong ones. Avoid keeping all or multiple locations for a single read, and if you do so, please proceed with care in the downstream analysis, since the results might be overestimated.
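To make the overestimation concrete, here is a small Python sketch on toy data comparing three ways of counting multi-mapping reads. The function, the read and locus names, and the "fractional" option (each read contributes 1/N to each of its N locations) are illustrative assumptions, not a method recommended in this thread; fractional weighting is just one common compromise.

```python
from collections import defaultdict


def count_reads(alignments, mode="fractional"):
    """Count reads per feature.

    alignments -- dict mapping read id to the list of features it aligns to
    mode       -- "unique": count only reads with a single location
                  "fractional": each read adds 1/N to each of its N locations
                  "all": each location gets a full count (inflates totals)
    """
    counts = defaultdict(float)
    for hits in alignments.values():
        if not hits:
            continue
        if mode == "unique":
            if len(hits) == 1:
                counts[hits[0]] += 1.0
        elif mode == "fractional":
            for feature in hits:
                counts[feature] += 1.0 / len(hits)
        elif mode == "all":
            for feature in hits:
                counts[feature] += 1.0
        else:
            raise ValueError(f"unknown mode: {mode}")
    return dict(counts)


# Toy data: three reads mapping to 1, 2, and 4 locations respectively.
toy = {
    "read1": ["locusA"],
    "read2": ["locusA", "locusB"],
    "read3": ["locusB", "locusC", "locusD", "locusE"],
}
for mode in ("unique", "fractional", "all"):
    result = count_reads(toy, mode)
    print(mode, sum(result.values()), result)
```

With three input reads, "all" reports a total of 7 counts, while "fractional" keeps the total at 3 and "unique" keeps only the single unambiguous read.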