Small RNA length distribution
20 months ago
RiNG ▴ 10

Hello,

I have small RNA-seq files with 50 bp reads. I am looking for miRNAs (enriched in the sample).

After trimming the 3' adapter, I got a strange length distribution: 1-2M reads of ~30 bp and 6-7M reads of 50 bp, with nothing at the lengths in between.

Any idea why this happens? If I am looking for miRNAs, are the 50 bp reads artifacts?

When I try to map these reads to the genome with Bowtie2, I get a very low alignment rate (12% and 14%).

Thanks in advance.

RNA-Seq miRNA • 878 views

miRNAs should be small, so I am not sure what your 50 bp reads represent.

Did your library prep ligate the 3' adapter directly to the miRNA? If so, reads that do not contain that adapter are likely not miRNA.


Show the commands used for trimming and mapping.

miRNA sequencing should be enriched for ~18-26 nt reads, both because that is the typical length of mature miRNAs and because a correct library preparation preferentially selects this range. However, it is common for a small RNA library to contain a large proportion of piRNAs and other small or degraded RNAs.

20 months ago
wm ▴ 510

You may have used the wrong adapter sequence for 3' adapter trimming. The library could have been prepared with Illumina TruSeq RNA-seq or another kit.

The most direct way to find out is to check the library preparation protocol or ask the person who did the experiment.

If that is not possible, you can still infer the adapter from the data.

Have a look at the Summary section of the cutadapt log. What percentage is reported under Reads with adapters: ... (xx.x%)? It varies with insert size and library quality; for my data it can be 75% or 99%.

Prepare a subsample for testing:

# 10k reads
$ head -n 40000 raw.fq > demo.fq

First, test the following two adapters with cutadapt (https://github.com/marcelm/cutadapt):

# TruSeq small RNA library
$ cutadapt -a TGGAATTCTCGGGTG -m 18 -o demo-cut1.fq demo.fq > cut1.log

# TruSeq RNAseq library
$ cutadapt -a AGATCGGAAGAGCAC -m 18 -o demo-cut2.fq demo.fq > cut2.log
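As a rough cross-check that does not depend on the cutadapt logs, you can count how many raw reads in the subsample contain each candidate adapter as an exact substring. This is only a sketch: the helper name count_adapter_hits is made up for illustration, it assumes a plain 4-line-per-record FASTQ, and exact matching will undercount compared with cutadapt, which tolerates mismatches and partial adapters at the read end.

```shell
# Count reads whose sequence line contains ADAPTER as an exact substring.
# Usage: count_adapter_hits ADAPTER FASTQ
# Assumes an uncompressed FASTQ with 4 lines per record.
count_adapter_hits() {
    awk -v a="$1" 'NR % 4 == 2 && index($0, a) { hits++ }
                   END { print hits + 0 }' "$2"
}

# Compare the two candidate adapters on the subsample (skipped if demo.fq is absent).
if [ -f demo.fq ]; then
    for adapter in TGGAATTCTCGGGTG AGATCGGAAGAGCAC; do
        printf '%s\t%s\n' "$adapter" "$(count_adapter_hits "$adapter" demo.fq)"
    done
fi
```

The adapter with the clearly higher count out of the 10k subsampled reads is the more plausible one for this library.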

Alternatively, you can let fastp detect the adapter: https://github.com/OpenGene/fastp

$ fastp -i demo.fq -o trimmed.fq 
Detecting adapter sequence for read1...
>Illumina TruSeq Adapter Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
...
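Once the reads are trimmed, it helps to tabulate their lengths and see whether the distribution now looks like a small RNA library. The following is a minimal sketch, assuming an uncompressed 4-line-per-record FASTQ; the function name read_length_hist is hypothetical.

```shell
# Print "<length> <count>" for every read length in a FASTQ file, sorted by length.
# Usage: read_length_hist FASTQ
read_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# trimmed.fq is the fastp output from the command above (skipped if absent).
if [ -f trimmed.fq ]; then
    read_length_hist trimmed.fq
fi
```

If most reads still sit at the full 50 bp, the adapter was probably not found in them; a miRNA-enriched library would be expected to show most reads at roughly 18-26 nt instead.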
20 months ago
jaqx008 ▴ 110

As pointed out above, you probably used the wrong adapter. I recommend running FastQC to check the quality of your reads; the overrepresented sequences should reveal your adapter sequence.
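If FastQC is not at hand, a similar check can be approximated on the command line by listing the most frequent raw read sequences. This is only a sketch in the spirit of FastQC's overrepresented sequences report, not a reimplementation of it; the function name top_sequences is made up, and it assumes an uncompressed 4-line-per-record FASTQ.

```shell
# Print the N most frequent read sequences as "<count> <sequence>" (default N = 10).
# Usage: top_sequences FASTQ [N]
top_sequences() {
    awk 'NR % 4 == 2 { c[$0]++ }
         END { for (s in c) print c[s], s }' "$1" |
        sort -k1,1nr -k2,2 | head -n "${2:-10}"
}

# Run on the untrimmed subsample (skipped if demo.fq is absent).
if [ -f demo.fq ]; then
    top_sequences demo.fq 10
fi
```

In untrimmed small RNA data, the top sequences often share the same 3' tail after different short inserts; that shared tail is the likely adapter.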
