Question

Trimming of reads in miRNA-Seq data

1

11 months ago

Ezhil La ▴ 40

Dear All,

I have been trying to clean up reads in the FASTQ files from a miRNA-Seq run that we received. The read structure is shown in the figure below. I can use Cutadapt to remove the adapter (we have the adapter sequence) and keep only reads of 15-55 nt using the -m and -M options. Before that step, I want to remove the common sequence (which we know) and the UMI. I tried seqkit grep: seqkit grep -rvip ATCTGTAGGCAGGATCAAT s1.fq.gz -o s1.clean.fq.gz, but the cleaned output FASTQ file looks almost identical to the input. It seems I am missing something.

Are there any tools that I can use to remove the common sequence and the UMI before I proceed to trim reads with Cutadapt?
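One way to check whether a filtering step did anything is to count the reads that contain the common sequence before and after it. Below is a minimal sketch using only awk, assuming uncompressed 4-line FASTQ records on stdin; count_reads_with is a hypothetical helper name, not part of seqkit or Cutadapt. As far as I know, seqkit grep matches patterns against the read ID unless -s (--by-seq) is given, which may be why the output looked unchanged, and even with -s it keeps or discards whole reads rather than trimming them.

```shell
#!/bin/sh
# count_reads_with: hypothetical helper, not part of seqkit or Cutadapt.
# Reads uncompressed 4-line FASTQ on stdin and prints "<matching> <total>".
count_reads_with() {
    awk -v pat="$1" '
        NR % 4 == 2 {                  # line 2 of each record is the sequence
            total++
            # case-insensitive exact substring match
            if (index(toupper($0), toupper(pat)) > 0) hits++
        }
        END { printf "%d %d\n", hits, total }
    '
}

# Tiny made-up demo: the first read contains the common sequence, the second does not.
printf '@r1\nTTTTATCTGTAGGCAGGATCAATGGGG\n+\nIIII\n@r2\nACGTACGT\n+\nIIII\n' |
    count_reads_with ATCTGTAGGCAGGATCAAT
```

On a real file this could be run as gzip -dc s1.fq.gz | count_reads_with ATCTGTAGGCAGGATCAAT, once on the input and once on the cleaned output.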

Many thanks

[Figure: read structure]

miRNA-Seq Trimming • 1.8k views

ADD COMMENT • link updated 11 months ago by Trivas ★ 1.8k • written 11 months ago by Ezhil La ▴ 40

0

I think the problem with that command is that you will just remove the common sequence and keep smRNA+UMI+adapter.

I would just use the common sequence as the adapter sequence if you don't care about removing PCR duplicates through the UMIs. Or just add this to your command:

seqkit grep -rvip ATCTGTAGGCAGGATCAAT* s1.fq.gz -o s1.clean.fq.gz

ADD REPLY • link 11 months ago by biofalconch ★ 1.1k

0

I got an error message when I tried seqkit with the *: zsh: no matches found: ATCTGTAGGCAGGATCAAT*

ADD REPLY • link 11 months ago by Ezhil La ▴ 40

0

Hi, sorry, I got mixed up about seqkit. seqkit grep cannot do that: it only returns whole reads, not a part of each read. You could use seqkit locate on a FASTA file (converting from FASTQ to FASTA) for this operation, but you would lose the read quality information. I recommend just using the common sequence as an adapter:

cutadapt -a ATCTGTAGGCAGGATCAAT -o s1.clean.fq.gz s1.fq.gz
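To make clear what treating the common sequence as a 3' adapter does to each read, here is a rough sketch in plain awk: it cuts the read at the first occurrence of the sequence and truncates the quality string to the same length, so the UMI and anything after it go too. It assumes uncompressed 4-line FASTQ and exact matches only; Cutadapt itself also tolerates mismatches and partial adapter matches at the read end, so this is an illustration and not a replacement. trim_at_common, the example read and the 12-nt UMI are all made up.

```shell
#!/bin/sh
# trim_at_common: hypothetical helper illustrating 3' trimming, exact matches only.
# Reads uncompressed 4-line FASTQ on stdin; cuts each read at the first occurrence
# of the given sequence and truncates the quality string to the same length.
trim_at_common() {
    awk -v pat="$1" '
        NR % 4 == 1 { header = $0; next }
        NR % 4 == 2 { seq = $0; next }
        NR % 4 == 3 { plus = $0; next }
        {
            qual = $0
            pos = index(seq, pat)      # 1-based position, 0 if the sequence is absent
            if (pos > 0) {
                seq  = substr(seq, 1, pos - 1)
                qual = substr(qual, 1, pos - 1)
            }
            print header; print seq; print plus; print qual
        }
    '
}

# Made-up read: 22-nt insert + common sequence + made-up 12-nt UMI.
common=ATCTGTAGGCAGGATCAAT
read="TGAGGTAGTAGGTTGTATAGTT${common}ACGTACGTACGT"
qual=$(printf '%s' "$read" | tr 'ACGT' 'IIII')   # dummy quality string of equal length
printf '@r1\n%s\n+\n%s\n' "$read" "$qual" | trim_at_common "$common"
```

Reads without the common sequence pass through unchanged. With Cutadapt, the length window from the question can go in the same call by adding -m 15 -M 55.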

Sorry about the mixup!

ADD REPLY • link 11 months ago by biofalconch ★ 1.1k

0

Thanks a lot. I could try this option as well.

ADD REPLY • link 11 months ago by Ezhil La ▴ 40

0

Which miRNA-seq library prep kit are you using? I'm a shill for miRge3.0: I thought it was super easy to get your data processed (if you have a well-studied model organism), although it is hard to customize for more complex or downstream applications.

ADD REPLY • link 11 months ago by Trivas ★ 1.8k

0

QIAseq miRNA Library Kit.

ADD REPLY • link 11 months ago by Ezhil La ▴ 40

0

Good call. From personal experience, that's the one that worked best. I'd still check out the miRge3.0 pipeline; they have a one-liner that works nearly perfectly for the QIAseq kit.

ADD REPLY • link 11 months ago by Trivas ★ 1.8k

score 2 · Answer 1 · 2023-08-23

2

11 months ago

GenoMax 144k

to remove the common sequence and the UMI

You can use bbduk.sh from the BBMap suite to do this. Try:

bbduk.sh -Xmx2g in=your.fq.gz out=clean.fq.gz literal=ATCTGTAGGCAGGATCAAT ktrim=r k=7

I suggest you stay with bbduk.sh for the remaining steps as well.

ADD COMMENT • link 11 months ago by GenoMax 144k

0

Thanks a lot. I tried:

bbduk.sh -Xmx27g in=s1.fastq.gz out=s1.bbduk.fastq.gz literal=ATCTGTAGGCAGGATCAAT ktrim=r k=7 minlen=15

It seems to have removed 50.56% of the input reads (the BBDuk output is below). Is this normal?

Input: 26325090 reads 1956237471 bases.
KTrimmed: 26221546 reads (99.61%) 1624914349 bases (83.06%)
Total Removed: 13309234 reads (50.56%) 1624914349 bases (83.06%)
Result: 13015856 reads (49.44%) 331323122 bases (16.94%)
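The figures in this report can be cross-checked with a little arithmetic. The sketch below recomputes the removed-read count and percentage, plus the mean length of the reads that were kept, from the numbers printed above; pct is a hypothetical helper. Since the command set minlen=15, the removed reads are presumably those left shorter than 15 nt after trimming, though that is an interpretation of the report and not something the report states.

```shell
#!/bin/sh
# pct: hypothetical helper; prints part/total as a percentage with two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

# Values copied from the BBDuk report above.
input_reads=26325090
result_reads=13015856
result_bases=331323122

# Reads removed = input reads minus reads in the result.
removed_reads=$((input_reads - result_reads))
echo "removed reads: $removed_reads ($(pct "$removed_reads" "$input_reads")%)"

# Mean length of the reads that survived trimming and the minlen filter.
mean_len=$(awk -v b="$result_bases" -v r="$result_reads" 'BEGIN { printf "%.1f\n", b / r }')
echo "mean length of kept reads: $mean_len nt"
```

The recomputed count and percentage match the "Total Removed" line, and the kept reads average about 25 nt.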

Is there any parameter to filter out reads beyond the length of 55 (maximum length)?

Many thanks

ADD REPLY • link 11 months ago by Ezhil La ▴ 40

1

Is there any parameter to filter out reads beyond the length of 55 (maximum length)?

You can add maxlength=55 to the command.
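Combined with the command used earlier in this thread, that would be bbduk.sh -Xmx27g in=s1.fastq.gz out=s1.bbduk.fastq.gz literal=ATCTGTAGGCAGGATCAAT ktrim=r k=7 minlen=15 maxlength=55. For anyone who wants to see what such a length window does, or to double-check the output, here is a small awk sketch that keeps only reads within a given range (inclusive). It assumes uncompressed 4-line FASTQ; length_window is a hypothetical helper, not a BBTools command.

```shell
#!/bin/sh
# length_window: hypothetical helper; keeps 4-line FASTQ records whose sequence
# length is between min and max (inclusive), mirroring the minlen/maxlength idea.
length_window() {
    awk -v min="$1" -v max="$2" '
        NR % 4 == 1 { header = $0; next }
        NR % 4 == 2 { seq = $0; next }
        NR % 4 == 3 { plus = $0; next }
        # Fourth line of the record: print the record only if the length is in range.
        length(seq) >= min && length(seq) <= max {
            print header; print seq; print plus; print $0
        }
    '
}

# Tiny made-up demo with a 3-5 nt window: only the @ok record passes.
printf '@short\nAC\n+\nII\n@ok\nACGT\n+\nIIII\n@long\nACGTACGT\n+\nIIIIIIII\n' |
    length_window 3 5
```

For the real data the call would be something like gzip -dc s1.bbduk.fastq.gz | length_window 15 55.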

ADD REPLY • link 11 months ago by GenoMax 144k

0

Thanks a lot.

ADD REPLY • link 11 months ago by Ezhil La ▴ 40