Question

Mirna-Seq Trimming And Mapping?

0

Entering edit mode

10.8 years ago

daniel.soronellas ▴ 330

Dear community,

I have paired-end miRNA datasets which I want to analyze and FASTQC results gave good stats to proceed, but because It's the first time I start analyzing this kind of samples, I noticed that it is extremely important in miRNA to remove 3' adapter contamination due to the fact of read small size. So because I mainly focused on other types of analyses which did not include adapter trimming I wanted to ask if the method I'm applying is right enough to make sure I'm removing the adapter.

Here's my adapter sequence:

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG '3

I was planning to use miRdeep2 algorithm which currently has a trimming function; so I took the last 22nt from the adapter and applied the following command:

mapper reads.fastq -e -h -i -j -k CTCGTATGCCGTCTTCTGCTTG -l 18 -m -p genome -s reads.fa -t reads_vs_genome.arf -v -u -n

After this step I checked the mapping results and I was quite surprised because the mapping seems that fail:

mapping reads to genome index
#reads processed: 12372857
#reads with at least one reported alignment: 537965 (4.35%)
#reads that failed to align: 11809642 (95.45%)
#reads with alignments suppressed due to -m: 25250 (0.20%) 
Reported 660431 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends

Mapping statistics

#desc    total    mapped    unmapped    %mapped    %unmapped
total: 199690475    109878864    89811611    0.550    0.450
seq: 199690475    109878864    89811611    0.550    0.450

As you can see something went wrong with the mapping ( ¿¿95.5% fail to align?? ) and I don't know what it is...

I was thinking mainly in 2 possible issues:

1.- Adaptor trimming failed
2.- As miRdeep2 does not map paired-end data at once ( authors suggest to treat as single end ), maybe this is influencing in mapping results

I ask if someone experimented same issues and maybe could help clarify this

Thanks!

mirna trimming adaptor mapping • 8.5k views

ADD COMMENT • link updated 17 months ago by Ram 43k • written 10.8 years ago by daniel.soronellas ▴ 330

3

Entering edit mode

I have just a short question about your data: How do paired-end reads for microRNAseq look like? I mean... the mature microRNA sequence is about 24nt long. Do the pairs completely overlap each other? Is this protocol strand specific in the end?

ADD REPLY • link 10.8 years ago by David Langenberger 11k

0

Entering edit mode

No, the pairs do not overlap completely each other and is not strand specific protocol.

ADD REPLY • link 10.8 years ago by daniel.soronellas ▴ 330

2

Entering edit mode

why did you take the last 22? the tail of the adapter is the part you'd least likely see intact. try the whole thing

ADD REPLY • link 10.8 years ago by Jeremy Leipzig 22k

0

Entering edit mode

Thanks for the suggestion! I tried and worked fine :)

ADD REPLY • link 10.8 years ago by daniel.soronellas ▴ 330

0

Entering edit mode

I am hearing paired end seq for miRNA for the first time. If it is a publicly available data that you are analysing, then can you share the link ?

ADD REPLY • link 10.8 years ago by Bharat Iyengar ▴ 330

0

Entering edit mode

I'm sorry this is data not published yet so you will have to wait till the publication comes out

ADD REPLY • link 10.8 years ago by daniel.soronellas ▴ 330

score 1 · Answer 1 · 2013-07-02

1

Entering edit mode

10.8 years ago

Fabio Marroni ★ 3.0k

I dont' know miRdeep2 but maybe it reports somewhere statistics of adapter removal. Alternatively, you could try some other tool, such as cutadapt (https://code.google.com/p/cutadapt/) for adapter removal. So you can check if the problem lies in adapter trimming.

ADD COMMENT • link 10.8 years ago by Fabio Marroni ★ 3.0k

0

Entering edit mode

Thanks for the tool!

ADD REPLY • link 10.8 years ago by daniel.soronellas ▴ 330

Ram · Answer 2 · 2015-07-14

0

Entering edit mode

8.8 years ago

manekineko ▴ 150

Hi,

I have also miRNAs library in pair-end which was strange as I always analyzed single end?

I figured out that the R1 set contain "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC" adapter + barcode + smallRNA adapter and at the beginning is the miRNA seq.

I have a bit confusion for the R2, what type of miRNA is sequnced there - is it the same as in R1 or different miR? is it on reverse-compliment state? And moreover After I clean R1 and R2, what should I do? Should I just combine all clean reads in one file or?

PS

I found that my R2 set is containing an just one adapter -GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT, and before it is the miRNA probably?

I have found only one post in the net seems similar but it does not explain how than you analyse the miRNAs as they are in R1 and R2, how they are related or they are just separate independant miRNAs -http://www.homolog.us/blogs/blog/2012/07/19/analysis-of-ngs-mirna-data-a-paired-ended-library/

ADD COMMENT • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by manekineko ▴ 150

1

Entering edit mode

First of all, do not ask your question as an answer at someone else's question (and a very old question at that).

Second, see single- or pair-end small RNA seq for miRNAs? and small RNA-seq pipelines for similar questions, and Brian Bushnell comment C: I need to know how to trimming RNAseq of GEO dataset. as to the better way to clean adapters for such cases.

ADD REPLY • link 8.8 years ago by h.mon 35k

0

Entering edit mode

Thanks! Sorry, I'm a new user of Biostars and I thought it is better to ask here rather open a new tread with similar topic.

To the topic - do you think merging PE will be better way for the miRNA analysis, or just use only PE1 (I have the miRNA in PE1 and in PE2 is a reverse-compliment) ? What actually merging is doing?

(PS Is merging step is after the adapter cliping or it is done on raw reads?)

EDIT, I tried PEAR and it output a strange result:

Assembled reads ...................: 33,662 / 4,000,000 (0.842%)
Discarded reads ...................: 0 / 4,000,000 (0.000%)
Not assembled reads ...............: 3,966,338 / 4,000,000 (99.158%)

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by manekineko ▴ 150

0

Entering edit mode

@manekineko

In ancient DNA, we routinely faced the problem of having DNA fragments that are smaller than the read length and have adapters at the ends and overlapping portions of the forward and reverse. We tried to determine whether only trimming the adapters (using cutadapt) would perform just as well as trimming and merging (using our algorithm leehom which does both in a single step). We saw a reduction of noise when we trimmed and merge due to the cross-correction provided by observing the same base twice. Read our study here: http://nar.oxfordjournals.org/content/42/18/e141

We haven't tried it on miRNA but I don't see why our results wouldn't apply to this problem as well.

ADD REPLY • link updated 17 months ago by Ram 43k • written 8.8 years ago by Gabriel R. ★ 2.9k

0

Entering edit mode

I have the case which you describe in your paper Figure1 a) - the sequence if miRNA is 20-24 which is quite too small, and it is in the beginning of the R1 and in reverse-complement in R2 paired end file. But I think I have some kind of linker to increase the size of the sequence of interest.

I have run the program and it outputs this, can you help my to see what it is and is it normal:

Total 4000000; Merged (trimming) 3779698; Merged (overlap) 0; Kept PE/SR 147618; Trimmed SR 0; Adapter dimers/chimeras 72684; Failed Key 0


          0 Jul 15 12:41 001.merged.fastq.fail.fq.gz
   44616689 Jul 15 12:58 001.merged.fastq.fq.gz
    4505814 Jul 15 12:58 001.merged.fastq_r1.fail.fq.gz
    8765518 Jul 15 12:58 001.merged.fastq_r1.fq.gz
    4726149 Jul 15 12:58 001.merged.fastq_r2.fail.fq.gz
    8723321 Jul 15 12:58 001.merged.fastq_r2.fq.gz

ADD REPLY • link updated 17 months ago by Ram 43k • written 8.8 years ago by manekineko ▴ 150

0

Entering edit mode

Almost. Did you run with --ancientdna? Because then it would merge the overlapping parts even in the absence of the adapter (Figure1 case B).

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by Gabriel R. ★ 2.9k

0

Entering edit mode

No I did not used --ancientdna. I have small RNAs so they are 19-24nt so probably overlaping?

I used it -leeHom -fq1 1.fastq -fq2 2.fastq -f TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -s GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT

Does it mean I have now 3779698 single reads merged from perfectly overlapping 3779698 pairs in 001.merged.fastq.fq.gz?

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by manekineko ▴ 150

0

Entering edit mode

That means that the overwhelming majority 3.7M out of 4M had adapters and they were merged. Still, I would try to use --ancientdna to see the tail of the distribution.

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by Gabriel R. ★ 2.9k

0

Entering edit mode

With --ancientdna I got same, how to know which result to take from both?:

Total 4000000; Merged (trimming) 3779698; Merged (overlap) 0; Kept PE/SR 147618; Trimmed SR 0; Adapter dimers/chimeras 72684; Failed Key 0

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by manekineko ▴ 150

1

Entering edit mode

ok I was expecting to see something different. Then use the first one from

001.merged.fastq.fq.gz

ADD REPLY • link 8.8 years ago by Gabriel R. ★ 2.9k