Question

mireap procedure short tutorial

2

8.8 years ago

zizigolu ★ 4.4k

Sorry,

Does anyone know a little about the MIREAP procedure? By googling I could not find even one example of the syntax :(

miRNA mireap • 6.6k views

ADD COMMENT • link updated 6.2 years ago by Barry Digby ★ 1.3k • written 8.8 years ago by zizigolu ★ 4.4k

1

8.8 years ago

natasha.sernova ★ 4.0k

Change the case to MIREAP and your searches will be more successful.

There is even a post on biostars.org about another option:

Using Mireap for the Plant miRNA discovery

ADD COMMENT • link 8.8 years ago by natasha.sernova ★ 4.0k

1

thank you but nothing yet :(

ADD REPLY • link 8.8 years ago by zizigolu ★ 4.4k

0

I've found this site:

http://www.mireap.net/geninfo.php

ADD REPLY • link 8.8 years ago by natasha.sernova ★ 4.0k

1

Thank you, but this is not related at all; it is something else.

ADD REPLY • link 8.8 years ago by zizigolu ★ 4.4k

1

https://sourceforge.net/projects/mireap/

I hope this is what you are looking for.

or this one as a branch of the above:

https://sourceforge.net/projects/mireap/?source=navbar

ADD REPLY • link 8.8 years ago by natasha.sernova ★ 4.0k

score 7 · Accepted Answer · 2019-04-09

Hopefully this post helps people who need help with MIREAP, as the error messages are nonexistent and, as far as I can see from Google searches, no forum goes into this depth. Installing miRDeep2 and MIREAP is a pain in the @$$, but I'm not covering installation in this post.

I used miRDeep2 to generate the "map.txt" file that seems to be the main problem for most users. The genome I have used in this example is AY509253.2, and my files will be named with this prefix.

miRDeep2 uses Bowtie 1 to index the genome:

bowtie-build AY509253.2.fa AY509253.2_index

Next, use the mapper.pl script to map your sequenced reads to the genome:

mapper.pl small_RNA_trimmed.fastq -e -h -l 17 -p AY509253.2_index -m -s AY509253.2_collapsed.fa -t AY509253.2_mapped.arf -v

Since you have gone this far, you should finish the miRDeep2 analysis. It only takes one more line of code:

miRDeep2.pl AY509253.2_collapsed.fa AY509253.2.fa AY509253.2_mapped.arf none none none 2> AY509253.2.log

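MIREAP is invoked as:

mireap.pl -i <smrna.fa> -m <map.txt> -r <reference.fa> -o <outdir>

The files you will need, with the flag each one feeds:

AY509253.2.fa '-r'

AY509253.2_collapsed.fa '-i'

AY509253.2_mapped.arf '-m' (after conversion to the map.txt layout, described further down)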

The first step in preprocessing for MIREAP (and for the miRDeep2 analysis) is to remove whitespace from the header of the reference genome. I did this in a text editor; if you are working with viral genomes, chances are the reference is one large contig and the file won't have multiple headers.
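If the reference has too many headers to fix by hand, the same cleanup could be scripted. The following is only a sketch: it assumes that keeping the first word of each header and dropping the rest is acceptable, and it works on a made-up toy file (demo_ref.fa is a hypothetical name, not a file from this post).

```shell
# Toy reference whose header contains whitespace (made-up example data).
printf '>AY509253.2 Example virus, complete genome\nACGTACGT\n' > demo_ref.fa

# On header lines only, drop everything from the first whitespace onward.
# Assumption: the first word of the header is a sufficient, unique ID.
sed -E '/^>/ s/[[:space:]].*$//' demo_ref.fa > demo_ref.clean.fa

# Show the result.
cat demo_ref.clean.fa
```

Writing to a new file rather than editing in place keeps the original reference available in case the truncated IDs turn out not to be unique.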

The next step is to separate the sequence ID and the read count in the headers of AY509253.2_collapsed.fa. Before preprocessing, they should look like this, where the read count directly follows '_x' with no whitespace:

>seq_0_x14029989
TCGGTGGGACTTTCGTTCGATT
>seq_14029989_x9481962
AACCCGTAGATCCGAACTTGTG
>seq_23511951_x8078382
TGACTAGATCCACACTCATCC

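Use the following command to separate the two (it is restricted to header lines and to the '_x' marker, so nothing else in the file is touched):

sed -i '/^>/ s/_x/_x /' AY509253.2_collapsed.fa

and inspect the file afterwards: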

head AY509253.2_collapsed.fa
>seq_0_x 14029989
TCGGTGGGACTTTCGTTCGATT
>seq_14029989_x 9481962
AACCCGTAGATCCGAACTTGTG
>seq_23511951_x 8078382
TGACTAGATCCACACTCATCC
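Because MIREAP gives little feedback on malformed input, it may be worth counting headers that do not have the "ID, whitespace, count" shape before going further. This is a minimal sketch on made-up data (demo_collapsed.fa is a hypothetical file; the third header is deliberately left unsplit so the check has something to find).

```shell
# Toy collapsed FASTA: two split headers and one that was missed.
printf '>seq_0_x 14029989\nTCGGTGGGACTTTCGTTCGATT\n' > demo_collapsed.fa
printf '>seq_14029989_x 9481962\nAACCCGTAGATCCGAACTTGTG\n' >> demo_collapsed.fa
printf '>seq_23511951_x8078382\nTGACTAGATCCACACTCATCC\n' >> demo_collapsed.fa

# Count header lines that are NOT of the form ">ID<whitespace>COUNT".
bad=$(awk '/^>/ && !/^>[^ \t]+[ \t]+[0-9]+$/ {n++} END {print n + 0}' demo_collapsed.fa)

echo "malformed headers: $bad"
```

A result of 0 on the real file would suggest that every header was split as intended.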

Finally, to supply the map.txt file for MIREAP, extract the read_ID, chr_ID, start, end and strand (+/-) fields from AY509253.2_mapped.arf. Set the output field separator explicitly: an earlier version of this command, which spliced '\t' between the fields through shell quoting, did not preserve the tab delimiter.

awk 'BEGIN {OFS="\t"} {print $1,$6,$8,$9,$11}' AY509253.2_mapped.arf > AY509253.2_map_tmp.txt

Once again, the first column of AY509253.2_map_tmp.txt will have the sequence ID and the read count joined together:

 seq_244718376_x52  AY509253.2  161202  161218  +
 seq_244958036_x50  AY509253.2  136834  136857  -
 seq_245416228_x47  AY509253.2  128123  128144  +

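Run the following script to generate a properly formatted 'map.txt' file:

#!/bin/bash

# Column 1 with the read count removed (everything after the first 'x' on the line is dropped).
sed 's/x.*/x/' AY509253.2_map_tmp.txt > col1.txt

# Swap the cleaned IDs into column 1; assigning to $1 makes awk re-join the fields with single spaces.
awk 'FNR==NR{a[NR]=$1;next}{$1=a[FNR]}1' col1.txt AY509253.2_map_tmp.txt > mireap_map.txt

# Convert the spaces back to tabs.
tr ' ' '\t' < mireap_map.txt > AY509253.2_map.txt

# Remove the intermediate files.
rm AY509253.2_map_tmp.txt mireap_map.txt col1.txt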

Inspect MIREAP mapping file:

seq_244718376_x AY509253.2  161202  161218  +
seq_244958036_x AY509253.2  136834  136857  -
seq_245416228_x AY509253.2  128123  128144  +
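As one possible simplification, the field extraction and the ID cleanup could be done in a single awk pass, straight from the ARF file, with no intermediate files. This is a sketch rather than a tested replacement for the script above: the two ARF lines are made up (only the IDs and coordinates echo the example output), and demo_mapped.arf and demo_map.txt are hypothetical names.

```shell
# Two made-up ARF lines (13 tab-separated columns) standing in for AY509253.2_mapped.arf.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  seq_244718376_x52 17 1 17 tcggtgggactttcgtt AY509253.2 17 161202 161218 tcggtgggactttcgtt + 0 mmmmmmmmmmmmmmmmm \
  seq_244958036_x50 24 1 24 aacccgtagatccgaacttgtgaa AY509253.2 24 136834 136857 aacccgtagatccgaacttgtgaa - 0 mmmmmmmmmmmmmmmmmmmmmmmm \
  > demo_mapped.arf

# Strip the trailing read count from the ID, then print the five map columns, tab-delimited.
awk 'BEGIN {FS = OFS = "\t"}
     {id = $1; sub(/_x[0-9]+$/, "_x", id); print id, $6, $8, $9, $11}' \
  demo_mapped.arf > demo_map.txt

cat demo_map.txt
```

Anchoring the substitution to `_x` followed by digits at the end of the ID avoids touching any other 'x' that might appear in a read or contig name.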

Now run MIREAP :)

mireap.pl -i AY509253.2_collapsed.fa -m AY509253.2_map.txt -r AY509253.2.fa -A 17 -t AY509253.2 -o ./AY509253.2
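If a run like the one above finishes without useful output, one thing worth ruling out is a mismatch between the read IDs in the map file and those in the FASTA headers. The check below is a sketch on made-up data; that MIREAP needs the two sets of IDs to agree is my assumption from the input format, and the demo_* file names are hypothetical.

```shell
# Toy inputs: the map mentions seq_3_x, which has no FASTA entry.
printf '>seq_1_x 52\nTCGGTGGGACTTTCGTT\n>seq_2_x 50\nAACCCGTAGATCCGAACTTGTGAA\n' > demo_reads.fa
printf 'seq_1_x\tAY509253.2\t161202\t161218\t+\n' > demo_readmap.txt
printf 'seq_2_x\tAY509253.2\t136834\t136857\t-\n' >> demo_readmap.txt
printf 'seq_3_x\tAY509253.2\t128123\t128144\t+\n' >> demo_readmap.txt

# First file: remember every FASTA ID (first word of the header, without '>').
# Second file: count map lines whose read ID was never seen.
missing=$(awk 'NR == FNR { if ($0 ~ /^>/) seen[substr($1, 2)] = 1; next }
               !($1 in seen) { n++ }
               END { print n + 0 }' demo_reads.fa demo_readmap.txt)

echo "map lines with unknown read IDs: $missing"
```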

Any suggestions on how to improve the bash script would be appreciated.

Barry

score 2 · Accepted Answer · 2016-09-23

2

8.8 years ago

Farbod ★ 3.4k

Dear Angel, hi.

If you want help with the script and syntax for running this software, it is in the README file of the program, reproduced below. Hope that helps:

Program: MIREAP (Reap miRNAs from deeply sequenced smRNA library)
Version: 0.2
Contact: Li Qibin liqb@genomics.org.cn
Bioinformatics department, Beijing Genomics Institute

1. Introduction

MIREAP combines small RNA position and depth with a model of microRNA biogenesis to discover microRNAs from a deeply sequenced small RNA library.

2. Installation

You must have the Vienna RNA Package (http://www.tbi.univie.ac.at/RNA) installed on your computer, and make sure that its Perl interface is accessible.

Copy mireap_0.2.tar.gz to a directory (/foo/bar) and unpack it with the command:

tar -zxvf mireap_0.2.tar.gz

Before running mireap, you need to add the path /foo/bar/mireap_0.2/lib to the environment variable PERL5LIB:

For csh/tcsh: setenv PERL5LIB /foo/bar/mireap_0.2/lib

For sh/ksh/bash: export PERL5LIB=/foo/bar/mireap_0.2/lib

3. Usage

mireap.pl -i <smrna.fa> -m <map.txt> -r <reference.fa> -o <outdir>

Options:

-i <file>   Small RNA library, fasta format, forced
-m <file>   Mapping file, tabular format, forced
-r <file>   Reference file, fasta format, forced
-o <dir>    Directory where results are produced (current directory)
-t <str>    Sample label (xxx)
-A <int>    Minimal miRNA sequence length (18)
-B <int>    Maximal miRNA sequence length (26)
-a <int>    Minimal miRNA reference sequence length (20)
-b <int>    Maximal miRNA reference sequence length (24)
-u <int>    Maximal copy number of miRNAs on reference (20)
-e <float>  Maximal free energy allowed for a miRNA precursor (-18)
-d <int>    Maximal space between miRNA and miRNA* (35)
-p <int>    Minimal base pairs of miRNA and miRNA*
-v <int>    Maximal bulge of miRNA and miRNA* (4)
-s <int>    Maximal asymmetry of miRNA/miRNA* duplex
-f <int>    Flank sequence length of miRNA precursor (10)
-h          Help

Please convert your small RNA file into fasta format and append the sequencing frequency to the sequence ID, just like this entry:

>t0000035 3234
GAATGGATAAGGATTAGCGATGATACA

(t0000035 is the read_ID, 3234 is the sequencing frequency)

The format of the small RNA mapping file should be (delimited by tab or space): read_ID, chr_ID, start, end, strand (+/-)
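The five-column layout described above can be checked mechanically before a run. The sketch below uses made-up lines in a hypothetical demo_map_check.txt; the rule that start must not exceed end is my own assumption and is not stated in the README.

```shell
# Toy map file: two well-formed lines and one that is missing the strand column.
printf 't0000035 nscaf1690 4798998 4799024 +\n' > demo_map_check.txt
printf 't0000072 nscaf1690 2923961 2923988 -\n' >> demo_map_check.txt
printf 't0000093 nscaf1690 784585 784612\n' >> demo_map_check.txt

# A line fails if it does not have 5 fields, if start/end are not integers,
# if start > end (assumption), or if the strand is neither '+' nor '-'.
bad_lines=$(awk 'NF != 5 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ ||
                 $3 + 0 > $4 + 0 || ($5 != "+" && $5 != "-") {n++}
                 END {print n + 0}' demo_map_check.txt)

echo "lines failing the five-column check: $bad_lines"
```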

You can run MIREAP on the test data by executing the command: perl ../bin/mireap.pl -i rna.fa -m map.txt -r ref.fa .

4. Output format

MIREAP produces three files on each run.

*.gff This file contains the miRNA genes discovered by MIREAP, in GFF3 format. For the GFF3 format, please refer to http://www.sequenceontology.org/gff3.shtml. The attribute 'Count' denotes the sequencing frequency.

*.aln This file contains the sequence and structure of the pre-miRNA. Small RNAs are also aligned to the precursor, from which you can get more insight into the maturation process of miRNAs.

*.log This log file records the parameters, start and end times, and other information.

ADD COMMENT • link 8.8 years ago by Farbod ★ 3.4k

1

Thank you so much. Sorry, do you know how to produce the map.txt file?

ADD REPLY • link 8.8 years ago by zizigolu ★ 4.4k

1

Dear Angel, hi.

There is already a map.txt file in the "mireap_0.2.tar.gz" archive; you can find it after extraction. Do you mean that file?

This is the head of it (it has about 110951 rows):

t0000035 nscaf1690 4798998 4799024 +
t0000035 nscaf1690 4805385 4805411 +
t0000035 nscaf1690 7588502 7588528 +
t0000072 nscaf1690 2923961 2923988 -
t0000093 nscaf1690 784585 784612 +
t0000093 nscaf1690 1539278 1539305 +
t0000093 nscaf1690 2223484 2223511 +
t0000093 nscaf1690 5848415 5848442 +
t0000093 nscaf1690 7501339 7501366 +
t0000093 nscaf1690 2400901 2400928 -
t0000093 nscaf1690 3005327 3005354 -

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

1

Thanks, dear Farbod,

I mean how to create my own map file :(

ADD REPLY • link 8.8 years ago by zizigolu ★ 4.4k

1

Sorry about that. As I usually work with miRDeep2 and miRDeep2Star, I am not familiar with the Mireap software (what is your species of interest? Is there a reference genome available for it?).

It seems that in this paper they have used the Mireap program; maybe you can email them.

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

1

Sorry, may I have your email please? I am also working with miRDeep2 and am completely stuck :( :( :(

ADD REPLY • link 8.8 years ago by zizigolu ★ 4.4k

0

I don't think Biostars allows requests for personal email addresses. Since there is no facility to private message a user you have no option but to keep all dialog public.

ADD REPLY • link 8.8 years ago by GenoMax 152k

1

Thank you, you are right. He is Iranian like me, so I thought I could get help because we are closer.

ADD REPLY • link 8.8 years ago by zizigolu ★ 4.4k

1

I have also heard about a program called miRNAkey that is very user-friendly, with a GUI and automatic DE miRNA analysis. If you are not very strict about the algorithms the software uses, you can try it too.

ADD REPLY • link 8.8 years ago by Farbod ★ 3.4k

score 0 · Accepted Answer · 2016-09-23

0

8.8 years ago

zizigolu ★ 4.4k

https://github.com/liqb/mireap

ADD COMMENT • link 8.8 years ago by zizigolu ★ 4.4k