miRNA low mapping rates
2 days ago
Ant ▴ 50

Hi everyone,

I'm working on a miRNA-seq experiment using human plasma samples and the QIAseq miRNA Library Kit (Qiagen). My FastQC reports look good, but after trimming, alignment, and running miRDeep2, the number of raw reads passing filters is extremely low, and very few reads align to the mature.fa (I also tried aligning to mature + genome, but again very few raw reads and very few known miRNAs were detected).

In particular, most of the raw reads for the miRNAs that were found are shorter than 20 nt. I have tried various parameter adjustments for Cutadapt and Bowtie, but the results do not improve much. I'm concerned I might be making a mistake somewhere in the processing.

Here’s a summary of my workflow using one example sample:

1. Cutadapt trimming:

cutadapt --minimum-length=18 --maximum-length=30 \
    -o example_trimmed.fastq \
    example.fastq


2. Alignment with Bowtie 1:

bowtie -n 0 -l 32 --norc --best --strata -M 5000 --threads 16 \
    -x bowtie_index_hg38 \
    example_trimmed.fastq \
    -S example.sam


3. miRDeep2 analysis:

miRDeep2.pl \
    example_collapsed.fa \
    Genome_Index/hg38.fa \
    $(ls example/*.arf | tr '\n' ',') \
    mature_hsa.fa \
    hairpin_hsa.fa \
    -t hsa

Results for this sample:

Total reads processed: 50,693
Reads that were too short: 41,238 (81.3%)
Reads that were too long: 9 (0.0%)
Reads written (passing filters): 9,446 (18.6%)
Reads aligning to genome: <1%

Another example when I aligned first to mature.fa and then to the genome:

mature.fa:
 reads processed: 128,440
 reads with at least one alignment: 466 (0.36%)
 reads that failed to align: 127,974 (99.64%)
 Reported 476 alignments

genome:
 reads processed: 127,974
 reads with at least one alignment: 16,987 (13.27%)
 reads that failed to align: 110,987 (86.73%)
 Reported 120,986 alignments
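
For reference, the percentages in the two alignment summaries can be recomputed from the counts, which also gives the combined fraction of reads that aligned to either reference. A small check in Python, using only the numbers quoted above (the variable names are just for illustration):

```python
processed = 128_440       # reads given to the mature.fa alignment
mature_aligned = 466      # reads with at least one alignment to mature.fa
genome_input = 127_974    # reads that failed on mature.fa and went on to the genome
genome_aligned = 16_987   # reads with at least one alignment to the genome

# The genome step should receive exactly the reads that failed on mature.fa.
assert processed - mature_aligned == genome_input

pct_mature = 100 * mature_aligned / processed
pct_genome = 100 * genome_aligned / genome_input
pct_any = 100 * (mature_aligned + genome_aligned) / processed

print(f"mature.fa: {pct_mature:.2f}%")
print(f"genome:    {pct_genome:.2f}%")
print(f"either:    {pct_any:.2f}%")
```

So even counting both steps together, well under a fifth of the reads in this sample align anywhere.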

I know plasma samples generally have low miRNA content, but compared to other studies using the same Qiagen kit on plasma with their Data Analysis Center, they report much higher raw read counts (see PMC8539647 – supplementary table).

Could I be doing something wrong in the processing steps (Cutadapt, Bowtie, or miRDeep2)? Any insights or suggestions would be greatly appreciated.

mirna bowtie1 preprocessing counts aligment • 3.6k views

So this is a public dataset? QIAseq miRNA libraries may require special handling. Have you seen --> https://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/120/index.php?manual=QIAseq_miRNA_Analysis.html


No, it's a personal dataset. I haven’t seen the link, but if I understand correctly, it's not possible to use the software for free, right?


What happens if you remove the --minimum-length requirement from cutadapt and then run FastQC on the result? What size distribution do you get?

I don't propose you use that output for downstream processing, but it might give you more information about what is happening.

One possibility is a high rate of primer dimers in the sequencing library.
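
The size distribution can also be tabulated directly from the trimmed FASTQ, without FastQC. A minimal sketch in Python, assuming an uncompressed FASTQ with four lines per record; the function name `length_histogram` and the toy reads are illustrative only and do not come from the actual data:

```python
from collections import Counter
from io import StringIO


def length_histogram(handle):
    """Count reads per sequence length in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Toy input (invented reads): two 4 nt reads and one 22 nt read.
toy_fastq = StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nTGAGGTAGTAGGTTGTATAGTT\n+\n" + "I" * 22 + "\n"
)
hist = length_histogram(toy_fastq)
for length in sorted(hist):
    print(length, hist[length])
```

For a real file, pass an open handle instead of the toy input, e.g. `open("example_trimmed.fastq")`, or `gzip.open(path, "rt")` for a gzipped file. A large peak at a few nt with little in the 18-30 nt range would be consistent with a library problem rather than a trimming problem.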


Thanks for the reply. When running: cutadapt -a AACTGTAGGCACCATCAAT -M 30 -o example_trimmed.fastq.gz data/xxx.fastq.gz, the sequence length distribution showed more than 800,000 reads were 4 bp long, while fewer than 100,000 reads were around 18-22 bp. Is that normal? The lab staff mentioned that no abnormal peaks had been observed during their quality control checks.

But in the pre-processing output there are more than 4,000,000 reads of 75 bp in addition to the 4 bp ones, so to me it feels like trimming isn’t working properly.


> while fewer than 100,000 reads were around 18-22 bp.

That is likely why you are seeing the numbers you observe in the alignments. Was the sequence length at least 75 bp?

Was this a new kit for your library/sequence provider?


Yes, it was 75 bp. I don't know if it was the first time he used the kit, but I remember he needed to adjust the protocol.


4bp would suggest there was nothing cloned into the library.

But I'm a bit confused. The example in the question talks about 50,000 reads, of which 41,000 are too short. Now you are saying 800,000 are too short. And where does 4,000,000 come from? If there are 4,000,000 in preprocessing, why are only 50,000 reads being processed by cutadapt?


Sorry, I gave you answers based on different samples. The thing is, the total number of reads is very low (around 1–10% of the total reads), and only about 1% of those are actually aligned. The most overrepresented sequences don’t correspond to small RNAs. Based on GenoMax’s response, this really seems to be a technical issue, but I want to eliminate any other possibilities and better understand what’s happening.


As GenoMax said, the most likely explanation to me is that there is nothing cloned into the library. I can't think of any bioinformatic reason that would give what you described.


> the total number of reads is very low (around 1–10% of the total reads)

Are these the reads that actually have the QIAseq 3'-miRNA adapter? If so this does seem to indicate an issue with the samples/lib prep.
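
One way to check this on the raw reads is to look for the adapter sequence quoted earlier in the thread (AACTGTAGGCACCATCAAT) and record where it starts, since that position is the insert length. A rough sketch in Python, assuming exact matching only (cutadapt tolerates some mismatches, so its counts will differ somewhat); `adapter_positions` and the toy reads are hypothetical:

```python
from collections import Counter

ADAPTER = "AACTGTAGGCACCATCAAT"  # 3' adapter from the cutadapt command in this thread


def adapter_positions(sequences, adapter=ADAPTER):
    """Return (insert-length counts for reads with the adapter, reads without it).

    Only exact matches are counted, so this underestimates what cutadapt finds.
    """
    insert_lengths = Counter()
    without_adapter = 0
    for seq in sequences:
        pos = seq.find(adapter)  # index of the first exact match, -1 if absent
        if pos == -1:
            without_adapter += 1
        else:
            insert_lengths[pos] += 1  # bases before the adapter = insert length
    return insert_lengths, without_adapter


# Toy reads (invented for illustration): 22 nt insert, 4 nt insert, no adapter.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "ACGTACGTACGT",
    "ACGT" + ADAPTER + "ACGTACGTACGT",
    "G" * 75,
]
lengths, missing = adapter_positions(reads)
print(dict(lengths), missing)
```

On real data, the share of reads with the adapter and the spread of insert lengths among them should show whether the low counts come from missing adapters or from very short inserts.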


No, these are from the total reads, but when considering only the reads with adapter, I checked MultiQC again and it’s around 10–20% when aligning to the genome and about 1% when aligning to miRBase.


Also, can you clarify whether the output above is from cutadapt or from miRDeep2?


The first block is the output from Cutadapt and the second block is from miRDeep2.
