I have the raw data, and the library was prepared with the Qiagen miRNA library prep kit. According to my understanding and everything I have read online, the reads have the following structure: [biological sequence]-[constant region]-[UMI]-[adapter]. I originally used this:
For UMI extraction and adapter discard:
umi_tools extract --extract-method=regex --stdin=$read1 --bc-pattern=".+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)" --stdout umiE-${sample_id}.fastq.gz
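As a minimal sketch of what that pattern is meant to capture, the assumed layout can be checked on a single sequence with sed. The demo read below is made up, the trailing adapter bases are placeholders, and the match on the constant region is exact, unlike the {s<=2} fuzzy match in the umi_tools pattern:

```shell
# Hypothetical demo read: 22-nt insert + constant region + 12-nt UMI + placeholder adapter bases.
const="AACTGTAGGCACCATCAAT"
read_seq="TGAGGTAGTAGGTTGTATAGTT${const}ACGTACGTACGTAGATCGGAAG"

# Split the read around an exact match of the constant region.
# Group 1 is the biological insert, group 2 is the 12-nt UMI; everything after is dropped.
insert=$(printf '%s\n' "$read_seq" | sed -E "s/^(.+)${const}(.{12}).*/\1/")
umi=$(printf '%s\n' "$read_seq" | sed -E "s/^(.+)${const}(.{12}).*/\2/")

echo "insert: $insert (${#insert} nt)"
echo "umi:    $umi"
```

If real reads do not split cleanly this way, the assumed layout or the constant-region sequence is worth re-checking before blaming the aligner.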
In a later step I ran the following:
fastp -i $umiE_read1 -o "comptrim-${sample_id}.fastq.gz" -A -Q -L --low_complexity_filter
followed by:
trim_galore -q 28 --phred33 --length 16 --basename "$sample_id" --fastqc -o . "$complexity_trimmed_read1"
The percentage of total reads retained is around 58% after everything is said and done. From what I've read, this is typical for miRNA data, since it's mostly adapter. But when I align against miRBase (hsa sequences extracted beforehand), I get low mapping when checking with seqkit. Help is greatly appreciated. The UMI extraction is based on my understanding of the prep method and on what I've found online. What could be my issue?
What aligner are you using for this? Have you converted the U's in the miRBase sequences to T's before creating the index?
I'm using Bowtie since it is meant for short reads. After much digging I found that this is something I didn't do and need to do, so thank you so much for answering, since that just confirms it. I'll get back with results after I do it in the morning. I believe this should help:
sed '/^[^>]/s/U/T/g'
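A slightly fuller sketch of that conversion, run on a tiny stand-in FASTA. The file names and both records are hypothetical and only serve to show that header lines are left alone while sequence lines are converted:

```shell
# Tiny stand-in for a miRBase FASTA (hypothetical file name and records; RNA sequences contain U).
cat > mature_hsa_demo.fa <<'EOF'
>demo-miR-A HEADER_WITH_U
UGAGGUAGUAGGUUGUAUAGUU
>demo-miR-B HEADER_WITH_U
UAGCUUAUCAGACUGAUGUUGA
EOF

# Replace U with T on sequence lines only; lines starting with ">" do not match the address.
sed '/^[^>]/s/U/T/g' mature_hsa_demo.fa > mature_hsa_demo.dna.fa

cat mature_hsa_demo.dna.fa
```

The converted file is what would then be passed to the index builder, so that DNA reads can match the reference.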
So I got some alignment, but basically nothing. I used samtools to check the alignment and I see 0.31%. I changed U to T and then built the Bowtie index. Will touch base if I figure out the issue.
Can you post examples of a few reads that have survived the pre-processing you did?
I posted 5 example reads. Again, thank you so much for helping me navigate this. I'm currently trying other options, like aligning against the genome instead of miRBase.
miRNA should be 21-22+ nt. It looks like you have some shorter sequences in there as well.
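One way to see how many short reads survived trimming is to tally read lengths. This is only a sketch: the FASTQ below is a made-up three-read file with hypothetical names, and it assumes standard four-line FASTQ records:

```shell
# Hypothetical FASTQ with three trimmed reads (file name and reads are made up).
cat > trimmed_demo.fastq <<'EOF'
@r1
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r2
TAGCTTATCAGACTGATGTTGA
+
IIIIIIIIIIIIIIIIIIIIII
@r3
TGAGGTAGTAGGTTGT
+
IIIIIIIIIIIIIIII
EOF

# Sequence lines are every 4th line starting at line 2; count how many reads have each length.
awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' trimmed_demo.fastq \
  | sort -n > length_hist.txt

cat length_hist.txt
```

For a gzipped file, the same awk command can read from `gzip -dc file.fastq.gz` instead. A peak around 21-23 nt would be the expected shape for a miRNA library.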
Hello, thank you both for helping me. I ended up aligning against the genome and I got the following:
It finally worked. You don't understand how good this feels, thank you so much.
You will need to see what you get in terms of counts, but the reads should also align against miRBase. You will need to allow the reads to multi-map, as suggested by colindaven below. Use bowtie v.1.x. I tried to look up the recommended data analysis protocol for this kit, but it appears that Qiagen wants you to use some web-based pipeline that I suppose you get a license for when you buy this kit. No details were readily available.
Yeah, that's exactly the issue I've been having: there is hardly anything on this kit, probably because they want you to go through their pipeline (I'm doing this as a personal project). I tried bowtie v1 and also allowed v2 with -m 5000 (the -m value was based on a UCSC pipeline). I got 70.53% alignment, which is not as good, but I would rather take a more stringent approach. I'll let you know how the progress continues. Thank you so much for all the help and the depth you go into. It is truly and greatly appreciated. I can't wait to see my next steps and the results. Also, regarding the short sequences, I set the lower length limit to 16.
A 70% alignment rate should be fine for miRNA. You can move on to the remainder of the analysis.