Small RNA libraries were prepared using Illumina TruSeq Small RNA library preparation, and NGS was performed using Illumina Solexa technology. I now have FASTQ files from plasma containing small non-coding RNAs. I have already done some analysis and have results, but I want to confirm whether what I did is right; suggestions are always welcome. First, I want to know the total landscape of all small non-coding RNAs in the samples (controls and cases).
I used Trimmomatic to remove the adapters and do quality filtering:
java -jar trimmomatic-0.32.jar PE -threads 24 -phred33 input/20160332_ATCACG_R1.fastq.gz input_20160332_ATCACG_R2.fastq.gz trim_output/20160332_ATCACG_R1P.fastq.gz trim_output/20160332_ATCACG_R1U.fastq.gz trim_output/20160332_ATCACG_R2P.fastq.gz trim_output/20160332_ATCACG_R2U.fastq.gz ILLUMINACLIP:TrueAll.fa:0:30:10:4:true SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:25
Are the parameters I used right? 1) The Trimmomatic package has a folder named adapter containing the TruSeq sequences. I concatenated all the files in that folder (SE, PE, NexteraPE) into one file, TrueAll.fa, and used that for adapter trimming. 2) mismatches = 0. 3) minimum length = 25.
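To make clear which value I mean by each parameter, here is a minimal Python sketch that labels the colon-separated fields of my ILLUMINACLIP step. The field order follows my reading of the Trimmomatic manual (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, then the optional minimum adapter length and keep-both-reads flag). The function name and the dictionary keys are my own labels, not part of Trimmomatic.

```python
# Field labels are my own naming for the ILLUMINACLIP positions,
# based on my reading of the Trimmomatic manual.
ILLUMINACLIP_FIELDS = (
    "adapter_fasta",
    "seed_mismatches",
    "palindrome_clip_threshold",
    "simple_clip_threshold",
    "min_adapter_length",  # optional field, used in palindrome mode
    "keep_both_reads",     # optional field, used in palindrome mode
)


def describe_illuminaclip(step):
    """Split an ILLUMINACLIP step string into labelled fields (hypothetical helper)."""
    name, *values = step.split(":")
    if name != "ILLUMINACLIP":
        raise ValueError(f"not an ILLUMINACLIP step: {step!r}")
    # zip stops at the shorter sequence, so omitted optional fields are simply absent.
    return dict(zip(ILLUMINACLIP_FIELDS, values))


for label, value in describe_illuminaclip("ILLUMINACLIP:TrueAll.fa:0:30:10:4:true").items():
    print(f"{label} = {value}")
```

Read this way, my "mismatches = 0" is the seed mismatches field, and the minimum length of 25 comes from the separate MINLEN:25 step, not from ILLUMINACLIP.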
After trimming, the total number of sequences dropped from 3514467 to 213003, and the sequence length changed from 101 to 18-101. Did I lose a lot of reads after trimming?
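To put a number on that loss, here is a minimal Python sketch that computes the percentage of reads retained from the two counts above, and tallies read lengths from FASTQ lines. Both function names are my own, and the sketch assumes standard four-line FASTQ records.

```python
from collections import Counter


def percent_retained(reads_before, reads_after):
    """Percentage of reads that survived trimming."""
    return 100.0 * reads_after / reads_before


def count_read_lengths(fastq_lines):
    """Tally read lengths from an iterable of FASTQ lines.

    Assumes standard 4-line records, where the sequence is the
    second line of each record.
    """
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            lengths[len(line.strip())] += 1
    return lengths


# Read counts reported above for this sample.
print(f"{percent_retained(3514467, 213003):.2f}% of reads retained")
```

With my counts this prints about 6.06%, so roughly 94% of the reads were dropped. For the length distribution, the second function can be fed an open file handle, for example `gzip.open(path, "rt")` on one of the trimmed output files.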
Then, after quality control, I used TopHat2 (default settings) for alignment against the UCSC hg19 genome.
Is this approach OK? I read somewhere that the GENCODE annotation is much better for working with small non-coding features.
Please give me suggestions about the right way to analyse my plasma small non-coding RNA data. Thanks!
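To see which small non-coding classes an annotation actually covers, here is a minimal Python sketch that counts gene records per biotype in a GTF file. It assumes GENCODE-style attributes, where each gene line carries a `gene_type "..."` field. The function name and the set of biotypes I keep are my own illustrative choices, not a complete list.

```python
import re
from collections import Counter

# Assumes GENCODE-style GTF attributes such as: gene_type "miRNA";
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')

# Illustrative selection of small non-coding biotypes; adjust as needed.
SMALL_NCRNA_TYPES = {"miRNA", "snRNA", "snoRNA", "misc_RNA"}


def count_gene_types(gtf_lines, keep=SMALL_NCRNA_TYPES):
    """Count 'gene' records per gene_type, restricted to the types in `keep`."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_TYPE_RE.search(fields[8])
        if match and match.group(1) in keep:
            counts[match.group(1)] += 1
    return counts
```

Running this over an open GTF file handle gives a per-biotype count of annotated genes, which I could compare against the features my aligned reads fall into.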