I have paired-end miRNA datasets that I want to analyze, and the FastQC results looked good enough to proceed. This is the first time I have analyzed this kind of sample, and I have learned that removing 3' adapter contamination is extremely important for miRNA data because the reads are so short. My previous analyses did not involve adapter trimming, so I would like to ask whether the method I am using actually removes the adapter.
Here's my adapter sequence:
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3'
I was planning to use miRDeep2, whose mapper module has a built-in adapter clipping option (-k), so I took the last 22 nt of the adapter and ran the following command:
mapper reads.fastq -e -h -i -j -k CTCGTATGCCGTCTTCTGCTTG -l 18 -m -p genome -s reads.fa -t reads_vs_genome.arf -v -u -n
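One way to sanity-check the trimming before mapping is to count how many raw reads contain a given piece of the adapter. The Python sketch below compares the first 12 nt of the adapter with the last 22 nt used in the -k option. It assumes an uncompressed four-line-per-record FASTQ file and exact substring matching (no mismatches); the function names, the 12 nt seed length, and the 100,000-read sample size are illustrative choices, not part of miRDeep2.

```python
from itertools import islice

# Adapter exactly as quoted in the post.
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"


def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of a FASTQ stream."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip().upper()


def hit_fraction(sequences, seed):
    """Return the fraction of sequences that contain `seed` exactly."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    return sum(seed in seq for seq in sequences) / len(sequences)


def compare_seeds(path, head_len=12, tail_len=22, max_reads=100_000):
    """Compare adapter-start and adapter-end hit rates on the first reads."""
    with open(path) as handle:
        reads = list(islice(fastq_sequences(handle), max_reads))
    return {
        "adapter_start": hit_fraction(reads, ADAPTER[:head_len]),
        "adapter_end": hit_fraction(reads, ADAPTER[-tail_len:]),
    }
```

Calling compare_seeds("reads.fastq") returns the two fractions. If the adapter start is found in many reads while the 22 nt tail is found in almost none, the sequence passed to -k is probably not the part of the adapter that actually appears in the reads.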
After this step I checked the mapping results and was quite surprised, because the mapping seems to have failed:
mapping reads to genome index
#reads processed: 12372857
#reads with at least one reported alignment: 537965 (4.35%)
#reads that failed to align: 11809642 (95.45%)
#reads with alignments suppressed due to -m: 25250 (0.20%)
Reported 660431 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 199690475 109878864 89811611 0.550 0.450
seq: 199690475 109878864 89811611 0.550 0.450
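The two summaries in this log use different denominators, which is easy to miss. The short Python sketch below recomputes the percentages from the counts shown above. The dictionary keys are illustrative names, and the reading that the first block counts collapsed (unique) sequences while "Mapping statistics" counts total reads is an assumption, not something the log itself states.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the alignment summary (first block of the log).
alignment = {
    "processed": 12372857,
    "aligned": 537965,
    "failed": 11809642,
    "suppressed": 25250,
}

# Counts copied from the "Mapping statistics" block.
statistics = {"total": 199690475, "mapped": 109878864, "unmapped": 89811611}

for key in ("aligned", "failed", "suppressed"):
    print(key, percent(alignment[key], alignment["processed"]), "%")

print("mapped", percent(statistics["mapped"], statistics["total"]), "%")
```

Under that assumption, about 4% of the distinct sequences align while about 55% of the total read count is mapped, so the two figures describe different things.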
As you can see, something went wrong with the mapping (95.45% of reads failed to align?), and I don't know what it is.
I can think of two possible causes:
1.- Adapter trimming failed.
2.- miRDeep2 does not map paired-end data directly (the authors suggest treating it as single-end), so this may be affecting the mapping results.
Has anyone run into the same problem who could help clarify this?