Question: Mirna-Seq Trimming And Mapping?
0
gravatar for daniel.soronellas
6.3 years ago by
Spain
daniel.soronellas330 wrote:

Dear community,

I have paired-end miRNA datasets which I want to analyze and FASTQC results gave good stats to proceed, but because It's the first time I start analyzing this kind of samples, I noticed that it is extremely important in miRNA to remove 3' adapter contamination due to the fact of read small size. So because I mainly focused on other types of analyses which did not include adapter trimming I wanted to ask if the method I'm applying is right enough to make sure I'm removing the adapter.

Here's my adapter sequence:

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG '3

I was planning to use miRdeep2 algorithm which currently has a trimming function; so I took the last 22nt from the adapter and applied the following command:

mapper reads.fastq -e -h -i -j -k CTCGTATGCCGTCTTCTGCTTG -l 18 -m -p genome -s reads.fa -t reads_vs_genome.arf -v -u -n

After this step I checked the mapping results and I was quite surprised because the mapping seems that fail:

mapping reads to genome index
#reads processed: 12372857
#reads with at least one reported alignment: 537965 (4.35%)
#reads that failed to align: 11809642 (95.45%)
#reads with alignments suppressed due to -m: 25250 (0.20%) 
Reported 660431 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends

Mapping statistics

#desc    total    mapped    unmapped    %mapped    %unmapped
total: 199690475    109878864    89811611    0.550    0.450
seq: 199690475    109878864    89811611    0.550    0.450

As you can see something went wrong with the mapping ( ¿¿95.5% fail to align?? ) and I don't know what it is...

I was thinking mainly in 2 possible issues:

1.- Adaptor trimming failed
2.- As miRdeep2 does not map paired-end data at once ( authors suggest to treat as single end ), maybe this is influencing in mapping results

I ask if someone experimented same issues and maybe could help clarify this

Thanks!

trimming mirna mapping adaptor • 5.3k views
ADD COMMENTlink modified 4.2 years ago by wangyizhuss0 • written 6.3 years ago by daniel.soronellas330
3

I have just a short question about your data: How do paired-end reads for microRNAseq look like? I mean... the mature microRNA sequence is about 24nt long. Do the pairs completely overlap each other? Is this protocol strand specific in the end?

ADD REPLYlink written 6.3 years ago by David Langenberger9.1k

No, the pairs do not overlap completely each other and is not strand specific protocol.

ADD REPLYlink written 6.3 years ago by daniel.soronellas330
2

why did you take the last 22? the tail of the adapter is the part you'd least likely see intact. try the whole thing

ADD REPLYlink written 6.3 years ago by Jeremy Leipzig18k

Thanks for the suggestion! I tried and worked fine :)

ADD REPLYlink written 6.3 years ago by daniel.soronellas330

I am hearing paired end seq for miRNA for the first time. If it is a publicly available data that you are analysing, then can you share the link ?

ADD REPLYlink written 6.3 years ago by Bharat Iyengar270

I'm sorry this is data not published yet so you will have to wait till the publication comes out

ADD REPLYlink written 6.3 years ago by daniel.soronellas330
1
gravatar for Fabio Marroni
6.3 years ago by
Fabio Marroni2.3k
Italy
Fabio Marroni2.3k wrote:

I dont' know miRdeep2 but maybe it reports somewhere statistics of adapter removal. Alternatively, you could try some other tool, such as cutadapt (https://code.google.com/p/cutadapt/) for adapter removal. So you can check if the problem lies in adapter trimming.

ADD COMMENTlink written 6.3 years ago by Fabio Marroni2.3k

Thanks for the tool!

ADD REPLYlink written 6.3 years ago by daniel.soronellas330
0
gravatar for manekineko
4.3 years ago by
manekineko130
Bulgaria
manekineko130 wrote:

Hi,

I have also miRNAs library in pair-end which was strange as I always analyzed single end?

I figured out that the R1 set contain "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC" adapter + barcode + smallRNA adapter and at the beginning is the miRNA seq.

I have a bit confusion for the R2, what type of miRNA is sequnced there - is it the same as in R1 or different miR? is it on reverse-compliment state? And moreover After I clean R1 and R2, what should I do? Should I just combine all clean reads in one file or?

PS

I found that my R2 set is containing an just one adapter - GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT, and before it is the miRNA probably?

I have found only one post in the net seems similar but it does not explain how than you analyse the miRNAs as they are in R1 and R2, how they are related or they are just separate independant miRNAs - http://www.homolog.us/blogs/blog/2012/07/19/analysis-of-ngs-mirna-data-a-paired-ended-library/

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by manekineko130
1

First of all, do not ask your question as an answer at someone else's question (and a very old question at that).

Second, see single- or pair-end small RNA seq for miRNAs? and small RNA-seq pipelines for similar questions, and Brian Bushnell comment C: I need to know how to trimming RNAseq of GEO dataset. as to the better way to clean adapters for such cases.

ADD REPLYlink written 4.3 years ago by h.mon27k

Thanks! Sorry, I'm a new user of Biostars and I thought it is better to ask here rather open a new tread with similar topic.

To the topic - do you think merging PE will be better way for the miRNA analysis, or just use only PE1 (I have the miRNA in PE1 and in PE2 is a reverse-compliment) ? What actually merging is doing?

(PS Is merging step is after the adapter cliping or it is done on raw reads?)

EDIT, I tried PEAR and it output a strange result:

Assembled reads ...................: 33,662 / 4,000,000 (0.842%)
Discarded reads ...................: 0 / 4,000,000 (0.000%)
Not assembled reads ...............: 3,966,338 / 4,000,000 (99.158%)
ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by manekineko130

@manekineko

In ancient DNA, we routinely faced the problem of having DNA fragments that are smaller than the read length and have adapters at the ends and overlapping portions of the forward and reverse. We tried to determine whether only trimming the adapters (using cutadapt) would perform just as well as trimming and merging (using our algorithm leehom which does both in a single step). We saw a reduction of noise when we trimmed and merge due to the cross-correction provided by observing the same base twice. Read our study here:  

http://nar.oxfordjournals.org/content/42/18/e141

We haven't tried it on miRNA but I don't see why our results wouldn't apply to this problem as well. 

ADD REPLYlink written 4.2 years ago by Gabriel R.2.6k

I have the case which you describe in your paper Figure1 a) - the sequence if miRNA is 20-24 which is quite too small, and it is in the beginning of the R1 and in reverse-complement in R2 paired end file. But I think I have some kind of linker to increase the size of the sequence of interest.

I have run the program and it outputs this, can you help my to see what it is and is it normal:

Total 4000000; Merged (trimming) 3779698; Merged (overlap) 0; Kept PE/SR 147618; Trimmed SR 0; Adapter dimers/chimeras 72684; Failed Key 0


          0 Jul 15 12:41 001.merged.fastq.fail.fq.gz
   44616689 Jul 15 12:58 001.merged.fastq.fq.gz
    4505814 Jul 15 12:58 001.merged.fastq_r1.fail.fq.gz
    8765518 Jul 15 12:58 001.merged.fastq_r1.fq.gz
    4726149 Jul 15 12:58 001.merged.fastq_r2.fail.fq.gz
    8723321 Jul 15 12:58 001.merged.fastq_r2.fq.gz
ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by manekineko130

Almost. Did you run with --ancientdna ? Because then it would merge the overlapping parts even in the absence of the adapter (Figure1 case B).

ADD REPLYlink written 4.2 years ago by Gabriel R.2.6k

No I did not used --ancientdna. I have small RNAs so they are 19-24nt so probably overlaping?

I used it - leeHom -fq1 1.fastq -fq2 2.fastq -f TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -s GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT

Does it mean I have now 3779698 single reads megred from perfectly overlapping 3779698 pairs in 001.merged.fastq.fq.gz?

 

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by manekineko130

That means that the overwhelming majority 3.7M out of 4M had adapters and they were merged. Still, I would try to use --ancientdna to see the tail of the distribution.

ADD REPLYlink written 4.2 years ago by Gabriel R.2.6k

With --ancientdna I got same, how to know which result to take from both?:

Total 4000000; Merged (trimming) 3779698; Merged (overlap) 0; Kept PE/SR 147618; Trimmed SR 0; Adapter dimers/chimeras 72684; Failed Key 0

ADD REPLYlink written 4.2 years ago by manekineko130
1

ok I was expecting to see something different. Then use the first one from

001.merged.fastq.fq.gz
ADD REPLYlink written 4.2 years ago by Gabriel R.2.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1231 users visited in the last hour