Question

Small RNA length distribution

4.1 years ago

RiNG ▴ 10

Hello,

I have small RNA-seq files with 50 bp reads. I am looking for miRNAs (enriched in the sample).

After trimming the 3' adapter, I got a strange length distribution: 1-2M reads of ~30 bp and 6-7M reads of 50 bp. Nothing appears at the lengths in between.

Any idea why this happens? If I am looking for miRNA, are the 50 bp reads artifacts?

When I try to map these reads to the genome with Bowtie2, I get a very low alignment rate (12%, 14%).

Thanks in advance.

RNA-Seq miRNA • 1.9k views

updated 4.1 years ago by jaqx008 ▴ 110 • written 4.1 years ago by RiNG ▴ 10

miRNAs should be small, so I am not sure what your 50 bp reads represent.

Did your library prep ligate the 3' adapter directly to the miRNA? If so, reads that do not contain that adapter are likely not miRNA.

4.1 years ago by GenoMax 141k

Show the commands used for trimming and mapping.

miRNA sequencing should be enriched for ~18-26 bp reads, both because this is the typical length of mature miRNAs and because (correct) library preparation should preferentially select this range. However, it is common to have a large proportion of piRNAs and other small or degraded RNAs in a small RNA library.
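
To check this on your own data, a quick length tally could look like the sketch below. It assumes an uncompressed FASTQ with exactly four lines per record; the function name length_hist is made up for this example.

```shell
# length_hist FILE
# Print "length<TAB>count" for every read length found in an
# uncompressed, 4-line-per-record FASTQ file, sorted by length.
length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (l in n) printf "%d\t%d\n", l, n[l] }' "$1" | sort -n
}
```

Run it on the trimmed file (whatever name your trimming step wrote). If the adapter was removed correctly, most of the counts would be expected in the ~18-26 range rather than at the full read length.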

4.1 years ago by h.mon 35k

score 0 · Answer 1 · 2020-03-17

You may have used the wrong adapter sequence for 3' adapter trimming. The library could be Illumina TruSeq RNA-seq, or another kit.

The most direct approach is to check the library preparation protocol or ask whoever did the experiment.

If that is not possible, you can still guess the adapter.

Have a look at the Summary in the cutadapt log. What is the number reported for "Reads with adapters: ... (xx.x%)"? It varies depending on your insert size and library quality; for my data it could be 75% or 99%.

Prepare a subsample for testing:

# first 10k reads (4 lines per FASTQ record)
$ head -n 40000 raw.fq > demo.fq

First, you can test the following two adapters using cutadapt (https://github.com/marcelm/cutadapt):

# TruSeq small RNA library
$ cutadapt -a TGGAATTCTCGGGTG -m 18 -o demo-cut1.fq demo.fq > cut1.log

# TruSeq RNAseq library
$ cutadapt -a AGATCGGAAGAGCAC -m 18 -o demo-cut2.fq demo.fq > cut2.log
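
To compare the two test runs, one option is to pull the "Reads with adapters" summary line out of each log. This is only a sketch: it assumes the logs are the single-end cutadapt reports written by the commands above, and adapter_rate is a hypothetical helper name.

```shell
# adapter_rate LOG...
# Print the "Reads with adapters" summary line of each cutadapt log,
# prefixed with the log file name. tr squeezes the padding spaces
# that cutadapt uses to align its report columns.
adapter_rate() {
  grep -H 'Reads with adapters' "$@" | tr -s ' '
}
```

For example, adapter_rate cut1.log cut2.log prints one line per log. The adapter with the clearly higher percentage is the more likely match for your library.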

Alternatively, you can let fastp guess the adapter: https://github.com/OpenGene/fastp

$ fastp -i demo.fq -o trimmed.fq 
Detecting adapter sequence for read1...
>Illumina TruSeq Adapter Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
...

score 0 · Answer 2 · 2020-03-17

4.1 years ago

jaqx008 ▴ 110

As pointed out above, you must have used the wrong adapters. I recommend that you use FastQC to check the quality of your reads. Overrepresented sequences should reveal your adapter sequence.
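
If FastQC is not at hand, a rough stand-in is to list the most frequent raw read sequences yourself. This is not what FastQC does internally, just a plain sort-and-count sketch; it assumes an uncompressed 4-line-per-record FASTQ, and top_seqs is a made-up name.

```shell
# top_seqs FILE [N]
# Print the N most frequent read sequences (default 10) in an
# uncompressed FASTQ file, as "count sequence", most frequent first.
top_seqs() {
  awk 'NR % 4 == 2' "$1" | sort | uniq -c | sort -k1,1nr | head -n "${2:-10}"
}
```

Run it on the untrimmed reads, e.g. top_seqs demo.fq 20 if you made a subsample under that name. In a small RNA library the top sequences often share the same tail after the first ~20-25 bases, and that shared tail is a good candidate for the 3' adapter.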
