Question

miRNA low mapping rates

8 weeks ago

Ant ▴ 50

Hi everyone,

I'm working on a miRNA-seq experiment using human plasma samples and the QIAseq miRNA Library Kit (Qiagen). My FastQC reports look good, but after trimming, alignment, and running miRDeep2, the number of raw reads passing filters is extremely low, and very few reads align to the mature.fa (I also tried aligning to mature + genome, but again very few raw reads and very few known miRNAs were detected).

In particular, for the miRNAs that were found, the majority of raw reads are shorter than 20 nt. I have tried various parameter adjustments for Cutadapt and Bowtie, but the results do not improve much. I'm concerned I might be making a mistake somewhere in the processing.

Here’s a summary of my workflow using one example sample:

1. Cutadapt trimming:

cutadapt --minimum-length=18 --maximum-length=30 \
    -o example_trimmed.fastq \
    example.fastq


2. Alignment with Bowtie 1:

bowtie -n 0 -l 32 --norc --best --strata -M 5000 --threads 16 \
    -x bowtie_index_hg38 \
    example_trimmed.fastq \
    -S example.sam


3. miRDeep2 analysis:

miRDeep2.pl \
    example_collapsed.fa \
    Genome_Index/hg38.fa \
    $(ls example/*.arf | tr '\n' ',') \
    mature_hsa.fa \
    hairpin_hsa.fa \
    -t hsa

Results for this sample:

> Total reads processed:                  50,693

Reads that were too short: 41,238 (81.3%)
Reads that were too long: 9 (0.0%)
Reads written (passing filters): 9,446 (18.6%)
Reads aligning to genome: <1%

Another example when I aligned first to mature.fa and then to the genome:

mature.fa:
 reads processed: 128,440
 reads with at least one alignment: 466 (0.36%)
 reads that failed to align: 127,974 (99.64%)
 Reported 476 alignments
genome:
 reads processed: 127,974
 reads with at least one alignment: 16,987 (13.27%)
 reads that failed to align: 110,987 (86.73%)
 Reported 120,986 alignments

I know plasma samples generally have low miRNA content, but compared to other studies using the same Qiagen kit on plasma with their Data Analysis Center, they report much higher raw read counts (see PMC8539647 – supplementary table).

Could I be doing something wrong in the processing steps (Cutadapt, Bowtie, or miRDeep2)? Any insights or suggestions would be greatly appreciated.

mirna bowtie1 preprocessing counts aligment • 8.4k views

ADD COMMENT • link updated 5 days ago by Kevin Blighe 90k • written 8 weeks ago by Ant ▴ 50

So this is a public dataset? QIAseq miRNA libraries may require special handling. Have you seen --> https://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/120/index.php?manual=QIAseq_miRNA_Analysis.html

ADD REPLY • link 8 weeks ago by GenoMax 154k

No, it's a personal dataset. I haven’t seen the link, but if I understand correctly, it's not possible to use the software for free, right?

ADD REPLY • link 8 weeks ago by Ant ▴ 50

What happens if you remove the --minimum-length requirement from cutadapt, and then run FastQC on the result? What size distribution do you get?

I don't propose you use that output for downstream processing, but it might give you more information as to what is happening.

One possibility is a high rate of primer dimers in the sequencing library.
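
If FastQC is not at hand, the size distribution can also be tallied directly from the trimmed file. The following is a minimal sketch, assuming a standard four-line-per-record FASTQ (plain or gzipped); the function name `fastq_length_hist` is made up for illustration and is not part of cutadapt or FastQC.

```shell
# Tally read lengths in a FASTQ file (plain or gzipped).
# Assumes four lines per record; the function name is illustrative only.
fastq_length_hist() {
    local fq="$1"
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    gzip -cdf "$fq" |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (len in n) print len "\t" n[len] }' |
        sort -k1,1n
}

# Usage: fastq_length_hist example_trimmed.fastq.gz
```

The output has one line per read length (length, then count). A large pile at very short lengths and only a small bump around 18-25 nt would be consistent with adapter dimers.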

ADD REPLY • link 7 weeks ago by i.sudbery 22k

Thanks for the reply. When running cutadapt -a AACTGTAGGCACCATCAAT -M 30 -o example_trimmed.fastq.gz data/xxx.fastq.gz, the sequence length distribution showed that more than 800,000 reads were 4 bp long, while fewer than 100,000 reads were around 18-22 bp. Is this normal? The lab staff mentioned that no abnormal peaks had been observed during their quality control checks.

But looking at the pre-processing step, there are more than 4,000,000 reads of 75 bp, and together with the 4 bp ones, it feels to me like trimming isn't working properly.

ADD REPLY • link 7 weeks ago by Ant ▴ 50

> while fewer than 100,000 reads were around 18-22 bp.

That is likely why you are seeing the numbers you observe in the alignments. Was the sequencing read length at least 75 bp?

Was this a new kit for your library/sequence provider?

ADD REPLY • link 7 weeks ago by GenoMax 154k

Yes, it was 75 bp. I don't know if it was the first time he used the kit, but I remember he needed to adjust the protocol.

ADD REPLY • link 7 weeks ago by Ant ▴ 50

4bp would suggest there was nothing cloned into the library.

But I'm a bit confused. The example in the question talks about 50,000 reads, of which 41,000 are too short. Now you are saying 800,000 are too short. And where does 4,000,000 come from? If there are 4,000,000 in preprocessing, why are only 50,000 reads being processed by cutadapt?

ADD REPLY • link 7 weeks ago by i.sudbery 22k

Sorry, I gave you answers based on different samples. The thing is, the total number of reads is very low (around 1–10% of the total reads), and only about 1% of those are actually aligned. The most overrepresented sequences don't correspond to small RNAs. Based on GenoMax's response, this really seems to be a technical issue, but I want to rule out any other possibilities and better understand what's happening.

ADD REPLY • link 7 weeks ago by Ant ▴ 50

As GenoMax said, the most likely explanation to me is that there is nothing cloned into the library. I can't think of any bioinformatic reason that would produce what you described.

ADD REPLY • link 7 weeks ago by i.sudbery 22k

> the total number of reads is very low (around 1–10% of the total reads)

Are these the reads that actually have the QIAseq 3'-miRNA adapter? If so this does seem to indicate an issue with the samples/lib prep.
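
A rough way to answer that question is to count exact occurrences of the adapter in the raw reads. This is only a sketch: the function name `adapter_fraction` is hypothetical, and because it requires a perfect full-length match, cutadapt's own report (which tolerates mismatches and partial matches) will usually show a somewhat higher figure.

```shell
# Report how many reads contain the 3' adapter as an exact substring.
# Second argument is optional; the default is the adapter quoted in this thread.
adapter_fraction() {
    local fq="$1" adapter="${2:-AACTGTAGGCACCATCAAT}"
    # gzip -cdf handles both gzipped and plain FASTQ input
    gzip -cdf "$fq" |
        awk -v a="$adapter" '
            NR % 4 == 2 { total++; if (index($0, a) > 0) hit++ }
            END {
                if (total == 0) { print "no reads"; exit }
                printf "%d of %d reads (%.1f%%) contain %s\n", hit, total, 100 * hit / total, a
            }'
}

# Usage: adapter_fraction data/xxx.fastq.gz
```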

ADD REPLY • link 7 weeks ago by GenoMax 154k

No, these are from the total reads, but when considering only the reads with adapter, I checked MultiQC again and it’s around 10–20% when aligning to the genome and about 1% when aligning to miRBase.

ADD REPLY • link 7 weeks ago by Ant ▴ 50

Also, can you just clarify whether the output above is from cutadapt or from miRDeep2?

ADD REPLY • link 7 weeks ago by i.sudbery 22k

The first block is the output from Cutadapt and the second block is from miRDeep2.

ADD REPLY • link 7 weeks ago by Ant ▴ 50

score 0 · Answer 1 · 2025-11-12

Hi Ant,

Sorry to hear about the frustration. Plasma miRNA-seq is notoriously finicky, especially with low-abundance targets. From your workflow and the numbers (e.g., 81% too short post-trim, 4 bp artifacts, <1% miRNA alignments), this screams library prep artifact rather than a pure bioinformatics glitch. The 4 bp remnants after 3' adapter removal (AACTGTAGGCACCATCAAT is correct for QIAseq) point to failed ligation or no-insert products such as adapter dimers, which are common in low-input plasma runs if RNA yield was marginal or PCR cycles were pushed too high. It is odd that your lab's QC missed this; ask for the Bioanalyzer traces post-ligation and the input details (ng total RNA, number of cycles).

That said, let's rule out processing tweaks (no R packages here, keeping it command-line):

Adapter trimming fix: Your initial Cutadapt call lacks -a (the 3' adapter sequence), so untrimmed adapters inflate read lengths and tank alignments. Always trim the 3' adapter first (as you did later), then the 5' adapter (GTTCAGAGTTCTACAGTCCGACGATC) if residuals show up. Drop --minimum-length=18 initially for diagnostics, then rerun FastQC/MultiQC on raw vs. trimmed reads to quantify the adapter percentage and length distribution. With 75 bp reads and ~22 nt inserts, nearly every read in a good QIAseq library should carry the 3' adapter and leave an insert of roughly 18-25 nt after trimming; yours sounds like >80% junk.

Example refined Cutadapt (gzip for speed):
```
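# -a trims the 3' adapter, -g the 5' adapter (-A is for read 2 of paired-end data);
# -n 2 lets cutadapt remove both adapters from the same read
cutadapt -a AACTGTAGGCACCATCAAT -g GTTCAGAGTTCTACAGTCCGACGATC -n 2 \
         --minimum-length=16 --maximum-length=35 -o trimmed.fq.gz input.fq.gz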
```
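(Raising the maximum to 35 keeps slightly longer small RNAs. In QIAseq reads the 12 nt UMI sits downstream of the 3' adapter, so -a removes it together with the adapter; extract it beforehand if you want UMI deduplication.)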

Alignment nudge: Bowtie1 params look solid (-n 0 for exact miRNA matches), but test Bowtie2 with --very-sensitive-local --no-unal for gapped tolerance on variants. Rebuild the mature.fa index fresh. For plasma noise, map to an rRNA/tRNA blacklist first (e.g., SILVA db) to filter contaminants before miRDeep2.

miRDeep2 alternatives: miRDeep2 doesn't natively deduplicate UMIs (a QIAseq hallmark), so counts can be inflated by PCR duplicates, although that alone would not explain your low mapping rate. Consider nf-core/smrnaseq (Nextflow pipeline, free on GitHub, supports QIAseq libraries and UMI collapsing; it runs miRDeep2 under the hood). Run it with nextflow run nf-core/smrnaseq -profile docker --input samplesheet.csv --outdir results. Its MultiQC report makes low-yield samples easy to spot.

Quick sanity: Subset 1M reads, align to mature.fa alone (bowtie -v 0 -k 10), and count matches via samtools view | cut -f3 | sort | uniq -c. If still <1%, it's a wet-lab problem; re-prep with miRNeasy Serum/Plasma spike-ins for a yield check.
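
The sanity check above could be sketched roughly as follows. Both helper names (`prepare_mature_fasta`, `count_hits_per_ref`) and the file names in the usage comments are placeholders of mine, not part of bowtie or miRDeep2. miRBase distributes mature.fa in the RNA alphabet (U rather than T), so converting it to DNA letters before building the index is a sensible precaution; whether that is a factor in your runs is an assumption I can't verify from here.

```shell
# Keep only human (hsa) records from a miRBase-style FASTA and
# convert sequence lines from RNA (U) to DNA (T) letters.
prepare_mature_fasta() {
    awk '/^>/ { keep = ($1 ~ /^>hsa-/) }
         keep { if ($0 !~ /^>/) gsub(/[Uu]/, "T"); print }' "$1"
}

# Count alignments per reference name from SAM text on stdin.
# Header lines (@...) and unaligned records (RNAME "*") are skipped.
count_hits_per_ref() {
    awk -F '\t' '!/^@/ && $3 != "*" { n[$3]++ }
                 END { for (r in n) print n[r] "\t" r }' |
        sort -k1,1nr -k2,2
}

# Usage (file names are placeholders):
#   prepare_mature_fasta mature.fa > mature_hsa_dna.fa
#   bowtie-build mature_hsa_dna.fa mature_hsa_dna
#   bowtie -v 0 -k 10 --norc -S -x mature_hsa_dna subset.fastq | count_hits_per_ref
```

Because the counting step reads plain SAM text, it works without samtools; a handful of abundant plasma miRNAs at the top of the list would suggest the pipeline itself is fine and the insert-containing fraction is the limiting factor.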

Kevin