I want to share a strange experience and ask for your opinion and help.
I'm working on microRNA sequencing of a plant species that has never been studied. Our sequencing service company sent me the raw data as a FASTQ file: 30 million reads, 50 bp long.
I have done RNA-seq analysis before and am quite familiar with several tools, such as fastqc, trimmomatic, bowtie, bowtie2, etc.
I removed the 3' adapter sequence provided by the service company (they sequenced on an Illumina platform). Quality control confirmed that the adapter sequence is correct.
After adapter trimming, the read-length distribution has large peaks between 19 and 39 bp (the first thing that seemed strange to me) and some minor peaks between 39 and 50 bp.
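To tabulate a trimmed-length distribution like this without relying on any single trimmer, here is a minimal Python sketch that clips a 3' adapter and counts insert lengths. It assumes the TruSeq Small RNA 3' adapter `TGGAATTCTCGGGTGCCAAGG` as a placeholder (substitute the sequence the service company supplied), exact matching only (no mismatches), and a partial adapter of at least 5 nt at the read end; reads with no adapter are counted separately instead of being kept at full length.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: TruSeq Small RNA 3' adapter; replace with the provider's sequence.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def trim_3prime(seq: str, adapter: str = ADAPTER, min_overlap: int = 5) -> Optional[str]:
    """Return the insert before the adapter, or None if no adapter is found.

    Looks for a full exact adapter match first, then for an adapter prefix of
    at least `min_overlap` bases running off the 3' end of the read.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[: len(seq) - k]
    return None


def length_histogram(records: Iterable[Tuple[str, str, str]],
                     adapter: str = ADAPTER) -> Tuple[Counter, int]:
    """Count trimmed insert lengths; also return the number of reads with no adapter."""
    hist: Counter = Counter()
    no_adapter = 0
    for _, seq, _ in records:
        insert = trim_3prime(seq, adapter)
        if insert is None:
            no_adapter += 1
        else:
            hist[len(insert)] += 1
    return hist, no_adapter


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        hist, no_adapter = length_histogram(read_fastq(fh))
    for length in sorted(hist):
        print(f"{length}\t{hist[length]}")
    print(f"no adapter found\t{no_adapter}")
```

Run as `python trim_hist.py reads.fastq` (hypothetical script and file names); it prints one length/count pair per line, which can be compared directly with the FastQC length plot.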
I downloaded the "hairpin.fa" file from miRBase without filtering for a specific organism, converted every U to T, and removed entries containing uncommon characters (Y, X, K, etc.).
The alignment rate against these hairpin sequences is very low: about 3%.
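The hairpin.fa preparation described above (U to T, dropping entries that contain anything other than A/C/G/T) can be sketched in Python as follows. The sketch keeps only the ID (first word) of each header, and the optional `species_prefix` argument (for example `ath-`) is an addition not used in the original workflow; leaving it as `None` keeps every organism, as was done here.

```python
import re
import sys
from typing import Dict, Iterable, Iterator, Optional, Tuple

ACGT_ONLY = re.compile(r"[ACGT]+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prepare_hairpins(records: Iterable[Tuple[str, str]],
                     species_prefix: Optional[str] = None) -> Dict[str, str]:
    """Upper-case, convert U to T, and drop any sequence with a non-ACGT character.

    `species_prefix` (e.g. "ath-") is optional; None keeps all organisms.
    """
    out: Dict[str, str] = {}
    for header, seq in records:
        name = header.split()[0]  # keep only the ID, drop the description
        if species_prefix and not name.startswith(species_prefix):
            continue
        dna = seq.upper().replace("U", "T")
        if ACGT_ONLY.fullmatch(dna):
            out[name] = dna
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        hairpins = prepare_hairpins(read_fasta(fh))
    for name, seq in hairpins.items():
        sys.stdout.write(f">{name}\n{seq}\n")
```

Usage: `python prep_hairpins.py hairpin.fa > hairpin_dna.fa` (hypothetical script and output names), then build the aligner index from the output.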
So I repeated the alignment, this time against the A. thaliana genome. The overall alignment rate increased to 65% (only about 15% of reads aligned exactly once). However, when I run htseq-count to count alignments in the genomic regions encoding microRNAs, every count is 0!
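One way to narrow down an all-zero htseq-count result is to confirm that the annotation and the alignments agree. Unless `-t`/`-i` say otherwise, htseq-count counts features of type `exon` grouped by the `gene_id` attribute, and reads on a reference sequence whose name is absent from the annotation cannot be assigned to any feature. The Python sketch below assumes a GFF3 annotation and a SAM header (for example from `samtools view -H`); it lists the feature types and attribute keys present, plus the sequence names that appear in only one of the two files. These are common causes offered as guesses, not a diagnosis.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def gff_summary(lines: Iterable[str]) -> Tuple[Counter, Dict[str, Set[str]], Set[str]]:
    """Return (feature-type counts, attribute keys per type, seqids) for a GFF3 file."""
    types: Counter = Counter()
    attr_keys: Dict[str, Set[str]] = {}
    seqids: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:  # skips e.g. sequence lines after a ##FASTA directive
            continue
        seqid, ftype, attrs = cols[0], cols[2], cols[8]
        types[ftype] += 1
        seqids.add(seqid)
        keys = {kv.strip().split("=", 1)[0] for kv in attrs.split(";") if "=" in kv}
        attr_keys.setdefault(ftype, set()).update(keys)
    return types, attr_keys, seqids


def sam_seqids(lines: Iterable[str]) -> Set[str]:
    """Reference names from the @SQ header lines (SN: tag) of a SAM file."""
    names: Set[str] = set()
    for line in lines:
        if not line.startswith("@"):
            break  # header is over
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fh:
        types, keys, gff_ids = gff_summary(fh)
    with open(sys.argv[2]) as fh:
        bam_ids = sam_seqids(fh)
    for ftype, n in types.most_common():
        print(f"{ftype}\t{n}\t{','.join(sorted(keys[ftype]))}")
    print("seqids only in annotation:", sorted(gff_ids - bam_ids))
    print("seqids only in alignments:", sorted(bam_ids - gff_ids))
```

Usage (hypothetical file names): `samtools view -H aligned.bam > header.sam`, then `python check_annot.py annotation.gff3 header.sam`. If the miRNA features use a type other than `exon` or an ID key other than `gene_id`, pass those with `-t` and `-i`; if the name lists differ (say `Chr1` versus `1`), the names need to be harmonised before counting.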
I have tried everything I can think of and don't know how to solve this problem:
- adapter trimming with: trimmomatic, cutadapt, fastx_clipper, novoalign
- mapping with: bowtie, bowtie2 (also with the --local option), mirdeep2, novoalign
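As a cross-check that does not depend on any of the aligners listed above, the following Python sketch counts how many trimmed reads occur as exact substrings of at least one hairpin. Assumptions: exact matches on the forward strand of each hairpin only, a 12-nt seed index (an arbitrary choice; shorter reads are never counted), and hairpin sequences already converted to DNA. It is a sanity check, not a replacement for bowtie or miRDeep2.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

SEED = 12  # assumption: seed length; reads shorter than this never count as hits


def build_seed_index(hairpins: Dict[str, str], k: int = SEED) -> Dict[str, List[Tuple[str, int]]]:
    """Map every k-mer of every hairpin to the (name, offset) pairs where it occurs."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, seq in hairpins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def exact_hits(read: str, hairpins: Dict[str, str],
               index: Dict[str, List[Tuple[str, int]]], k: int = SEED) -> Set[str]:
    """Hairpins containing `read` as an exact forward-strand substring."""
    if len(read) < k:
        return set()
    # look up the read's first k bases, then verify the full read at each candidate offset
    return {name for name, i in index.get(read[:k], ())
            if hairpins[name].startswith(read, i)}


def exact_match_rate(reads: Iterable[str], hairpins: Dict[str, str], k: int = SEED) -> float:
    """Fraction of reads (duplicates included) found exactly in at least one hairpin."""
    index = build_seed_index(hairpins, k)
    counts = Counter(reads)  # test each unique sequence once, weight by its count
    total = sum(counts.values())
    matched = sum(n for seq, n in counts.items() if exact_hits(seq, hairpins, index, k))
    return matched / total if total else 0.0
```

If this rate is also close to 3%, the reads genuinely lack exact matches to known hairpins, which points away from an alignment-parameter problem; a much higher rate would suggest revisiting the aligner settings.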
I look forward to your help and suggestions!
Thank you in advance.