Sequence retrieval from miRBase using miRBase sequence ids
6.6 years ago

Hi everyone,

I have a set of miRBase IDs and want to retrieve the corresponding sequences from miRBase. I have downloaded the mature microRNA sequences from miRBase. My miRBase IDs:

miR1171
miR1436
miR1439
miR1446
miR1446a
miR156
miR156a
miR-34-5p
miR156a-3p
miR156a-5p
miR156ad
miR156b
miR156b-3p
miR156c
miR156c-3p
miR156d-3p
miR156e
miR156e-5p
miR156f
miR-1-5p
miR156f-3p
miR156f-5p
miR156g
miR156g-5p
miR156h
miR-35-3p
miR156i
miR156i-3p
miR156j

Mature microRNA sequences from miRBase:

>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA
>cel-miR-1-3p MIMAT0000003 Caenorhabditis elegans miR-1-3p
UGGAAUGUAAAGAAGUAUGUA
>cel-miR-2-5p MIMAT0020302 Caenorhabditis elegans miR-2-5p
CAUCAAAGCGGUGGUUGAUGUG
>cel-miR-2-3p MIMAT0000004 Caenorhabditis elegans miR-2-3p
UAUCACAGCCAGCUUUGAUGUGC
>cel-miR-34-5p MIMAT0000005 Caenorhabditis elegans miR-34-5p
AGGCAGUGUGGUUAGCUGGUUG
>cel-miR-34-3p MIMAT0015093 Caenorhabditis elegans miR-34-3p
ACGGCUACCUUCACUGCCACCC
>cel-miR-35-5p MIMAT0020303 Caenorhabditis elegans miR-35-5p
UGCUGGUUUCUUCCACAGUGGUA
>cel-miR-35-3p MIMAT0000006 Caenorhabditis elegans miR-35-3p
UCACCGGGUGGAAACUAGCAGU

I want to get a result like:

>miR-34-5p
AGGCAGUGUGGUUAGCUGGUUG
>miR-1-5p
CAUACUUCCUUACAUGCCCAUA
>miR-35-3p
UCACCGGGUGGAAACUAGCAGU

Please help me in this regard.

Thank you

miRNA RNA-Seq awk grep perl • 2.9k views

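Download the data from the miRBase FTP site and then write a quick Python script such as:

import sys

# IDs must be quoted strings; extend the set with the remaining IDs
my_list = {'miR1171', 'miR1436', 'miR1439', 'miR1446', 'miR1446a'}  # ...
name = ''
seq = ''

def report(name, seq):
    # miRBase headers end with the miRNA name without the species prefix
    short = name.split()[-1] if name else ''
    if short in my_list:
        print('>' + short + '\n' + seq)

with open(sys.argv[1], 'r') as fasta:
    for line in fasta:
        line = line.rstrip('\n')
        if line.startswith('>'):
            report(name, seq)  # a new header closes the previous record
            seq = ''
            name = line
        else:
            seq = seq + line
    report(name, seq)  # the last record has no header after it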

It is almost done, complete it :).
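
With 272 IDs it may be easier to read them from a file than to type them into the script. The following is only a sketch of that idea, not the script from this answer: `filter_fasta` and `read_ids` are hypothetical helper names, and it assumes the last whitespace-separated field of each miRBase header is the name without the species prefix, as in the sample posted in the question.

```python
def filter_fasta(fasta_lines, wanted_ids):
    """Yield (name, sequence) for records whose last header field is in wanted_ids."""
    name, seq = None, []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            if name in wanted_ids:
                yield name, "".join(seq)
            fields = line[1:].split()
            # Assumption: the last field is the name without species prefix.
            name = fields[-1] if fields else None
            seq = []
        elif line:
            seq.append(line)
    # The last record has no header after it.
    if name in wanted_ids:
        yield name, "".join(seq)


def read_ids(path):
    """Read one ID per line, ignoring blank lines and surrounding whitespace."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


# Small demo using records copied from the question.
sample = [
    ">cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p",
    "CAUACUUCCUUACAUGCCCAUA",
    ">cel-miR-1-3p MIMAT0000003 Caenorhabditis elegans miR-1-3p",
    "UGGAAUGUAAAGAAGUAUGUA",
    ">cel-miR-34-5p MIMAT0000005 Caenorhabditis elegans miR-34-5p",
    "AGGCAGUGUGGUUAGCUGGUUG",
]
records = list(filter_fasta(sample, {"miR-1-5p", "miR-34-5p", "miR156"}))
for name, seq in records:
    print(">" + name)
    print(seq)
```

On real files this would be called as `filter_fasta(open("mature.fa"), read_ids("id.txt"))`. Note that in the full miRBase file the same short name can occur in several species, so one ID may return more than one record.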


Or with grep in the shell:

grep -A 1 -f IDs.txt miRBase.fa

With this one-liner I am able to fetch only one sequence. I got this as output:

jinu@server:~$ grep -A 1 -f id.txt mature.fa
>tae-miR9774 MIMAT0036985 Triticum aestivum miR9774
CAAGAUAUUGGGUAUUUCUGUC
jinu@server:~$


I am not really good at programming. I tried, but it ended with an error. Also, I have 272 entries in my ID list. This is the result obtained:

jinu@server:~$ python 1.py mature.fa > 1_res
  File "1.py", line 1
    my_list = [miR1171, miR1436, miR1439, miR1446, miR1446a, miR156 , miR156a, miR156a-3p, miR156a-5p, miR156ad, miR156b, miR156b-3p, miR156c, miR156c-3p, miR156d-3p, miR156e, miR156e-5p, miR156f, miR156f-3p, miR156f-5p, miR156g, miR156g-5p, miR156h, miR156i, miR156i-3p, miR156j, miR156j-3p, miR156k, miR156k-3p, miR156k-5p, miR156l, miR156l-5p, miR156q, miR156r, miR157a-5p, miR157b-3p, miR157d, miR157d-3p, miR159 , miR159a, miR159a-3p, miR159a-5p, miR159a.1 , miR159b, miR159b-3p, miR159c, miR159d, miR159e, miR159e-5p, miR159f, miR159h-3p, miR160 , miR160a-3p, miR160a-5p, miR160b, miR160c, miR160d, miR160e-5p, miR160g, miR160h, miR162 , miR162-3p , miR162a-3p, miR164 , miR164a, miR164b, miR164c, miR164c-5p, miR164d, miR164e, miR164e-5p, miR164g-3p, miR164h-5p, miR165a-3p, miR166 , miR166a, miR166a-3p, miR166b, miR166b-5p, miR166c-5p, miR166d-5p, miR166e, miR166e-3p, miR166e-5p, miR166g-3p, miR166g-5p, miR166h-3p, miR166h-5p, miR166i, miR166i-3p, miR166j, miR166j-3p, miR166k, miR166m, miR166n, miR166p, miR166u, miR167 , miR167a, miR167a-5p, miR167b, miR167b-3p, miR167b-5p, miR167c, miR167c-3p, miR167c-5p, miR167d, miR167d-3p, miR167d-5p, miR167f-3p, miR167f-5p, miR167h, miR167h-5p, miR167k, miR168 , miR168a, miR168a-3p, miR168a-5p, miR168b, miR169 , miR169a, miR169a-5p, miR169b-5p, miR169c, miR169d, miR169d-5p, miR169e, miR169e-5p, miR169f, miR169f.1 , miR169g, miR169h, miR169k, miR169l, miR169m, miR169o, miR169s, miR169u, miR169v, miR170-3p , miR171a, miR171a-3p, miR171b, miR171b-3p, miR171c, miR171c-3p, miR171d, miR171e, miR171f, miR171f-3p, miR171g, miR171h, miR171i, miR171k, miR172a, miR172a-3p, miR172a-5p, miR172b, miR172b-5p, miR172c, miR172c-3p, miR172c-5p, miR172d, miR172d-5p, miR172e-3p, miR172f, miR172g-3p, miR172j, miR172k, miR2118, miR319 , miR319a, miR319a-3p, miR319b, miR319c, miR319c-5p, miR319e, miR319g, miR319i, miR319p, miR319q, miR3630-3p, miR3711, miR390 , miR390.1, miR390a-3p, miR390a-5p, miR390b-3p, miR390c-5p, miR390d-3p, miR390e, miR393 , miR393-5p , miR393a, miR393a-3p, miR393a-5p, miR393b-5p, miR393h, miR394a, miR395a, miR395a-3p, miR395b, miR395c-3p, miR395d, miR395g, miR395h, miR395k, miR395t, miR396 , miR396-3p , miR396a, miR396a-3p, miR396a-5p, miR396b, miR396b-3p, miR396b-5p, miR396c, miR396c-3p, miR396d, miR396e, miR396f, miR396g-3p, miR396g-5p, miR396h, miR397 , miR397-5p , miR397a, miR397b, miR398 , miR398a-3p, miR398a-5p, miR398b, miR398b-3p, miR398f, miR399 , miR399a, miR399b, miR399c, miR399d, miR399e, miR399f, miR399g, miR399g-3p, miR399g-5p, miR399i, miR399j, miR403-3p , miR403-5p , miR403a, miR408 , miR408-3p , miR408-5p , miR408a, miR408a-3p, miR408b, miR408d, miR408e, miR473 , miR477a, miR477a-5p, miR477b, miR477d, miR477e, miR477h, miR477i, miR479 , miR482b-3p, miR482c, miR5072, miR5141, miR5174d-3p, miR530 , miR530-5p , miR530a, miR5368, miR5532, miR5538, miR6167, miR6173, miR6478, miR8175, miR827, miR827-3p, miR827a, miR838-3p, miR845, miR858, miR858a, miR858b, miR894, miR9773, miR9774]
    ^
SyntaxError: invalid syntax
jinu@server:~$
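
The SyntaxError reported above is what Python gives when the IDs are written as bare words: a name such as miR156a-3p is read as `miR156a` minus `3p`, and `3p` is not a valid token. A minimal sketch of the difference, assuming nothing beyond the standard library (`parses` and `quote_ids` are hypothetical helper names used only for illustration):

```python
def parses(source):
    """Return True if the source compiles as Python, False on SyntaxError."""
    try:
        compile(source, "<id list>", "exec")
    except SyntaxError:
        return False
    return True


def quote_ids(text):
    """Turn a comma-separated run of bare IDs into a list of clean strings."""
    return [part.strip() for part in text.split(",") if part.strip()]


bare = "my_list = [miR1171, miR156a-3p]"
quoted = 'my_list = ["miR1171", "miR156a-3p"]'

print(parses(bare))    # the bare form does not compile
print(parses(quoted))  # the quoted form does

# repr() of a list of strings is itself a valid, quoted Python list.
ids = quote_ids("miR1171, miR156 , miR156a-3p")
print("my_list = " + repr(ids))
```

Quoting each ID (or reading the IDs from a file) avoids the error; the stray spaces in entries such as `miR156 ,` also need stripping, which `quote_ids` does here.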

6.6 years ago

command:

$ grep --no-group-separator -A 1 -f ids.txt sequences.txt | sed '/^>/ s/\(>\).*\(miR.*\)$/\1\2/'

output:

>miR-1-5p
CAUACUUCCUUACAUGCCCAUA
>miR-34-5p
AGGCAGUGUGGUUAGCUGGUUG
>miR-35-3p
UCACCGGGUGGAAACUAGCAGU

Input (copy/pasted from OP):

$ cat ids.txt 
miR1171
miR1436
miR1439
miR1446
miR1446a
miR156
miR156a
miR-34-5p
miR156a-3p
miR156a-5p
miR156ad
miR156b
miR156b-3p
miR156c
miR156c-3p
miR156d-3p
miR156e
miR156e-5p
miR156f
miR-1-5p
miR156f-3p
miR156f-5p
miR156g
miR156g-5p
miR156h
miR-35-3p
miR156i
miR156i-3p
miR156

and

$ cat sequences.txt 
>cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
UGAGGUAGUAGGUUGUAUAGUU
>cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
CUAUGCAAUUUUCUACCUUACC
>cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
UCCCUGAGACCUCAAGUGUGA
>cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
ACACCUGGGCUCUCCGGGUACC
>cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p
CAUACUUCCUUACAUGCCCAUA
>cel-miR-1-3p MIMAT0000003 Caenorhabditis elegans miR-1-3p
UGGAAUGUAAAGAAGUAUGUA
>cel-miR-2-5p MIMAT0020302 Caenorhabditis elegans miR-2-5p
CAUCAAAGCGGUGGUUGAUGUG
>cel-miR-2-3p MIMAT0000004 Caenorhabditis elegans miR-2-3p
UAUCACAGCCAGCUUUGAUGUGC
>cel-miR-34-5p MIMAT0000005 Caenorhabditis elegans miR-34-5p
AGGCAGUGUGGUUAGCUGGUUG
>cel-miR-34-3p MIMAT0015093 Caenorhabditis elegans miR-34-3p
ACGGCUACCUUCACUGCCACCC
>cel-miR-35-5p MIMAT0020303 Caenorhabditis elegans miR-35-5p
UGCUGGUUUCUUCCACAGUGGUA
>cel-miR-35-3p MIMAT0000006 Caenorhabditis elegans miR-35-3p
UCACCGGGUGGAAACUAGCAGU
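
For readers who want to see what the grep and sed steps do, here is a rough Python imitation. It is a sketch under assumptions, not a replacement for the command: it assumes every record is exactly two lines (header, then sequence), treats each ID as a plain substring rather than a regular expression, and `grep_sed` is a hypothetical function name. The substitution mirrors the sed expression: the greedy `.*` makes the capture start at the last `miR` in the header.

```python
import re


def grep_sed(ids, fasta_lines):
    """Imitate `grep -A 1 -f ids | sed ...` on two-line FASTA records."""
    lines = [line.rstrip("\n") for line in fasta_lines]
    out = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        # Like grep, this is a substring test, not an exact-name test.
        if any(i in header for i in ids):
            # Keep '>' plus everything from the last 'miR' onwards.
            out.append(re.sub(r"^(>).*(miR.*)$", r"\1\2", header))
            out.append(seq)
    return out


sample = [
    ">cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p",
    "CAUACUUCCUUACAUGCCCAUA",
    ">cel-miR-1-3p MIMAT0000003 Caenorhabditis elegans miR-1-3p",
    "UGGAAUGUAAAGAAGUAUGUA",
    ">cel-miR-2-5p MIMAT0020302 Caenorhabditis elegans miR-2-5p",
    "CAUCAAAGCGGUGGUUGAUGUG",
]

exact = grep_sed(["miR-1-5p"], sample)
# "miR-1" is a made-up short ID, used only to show substring over-matching.
loose = grep_sed(["miR-1"], sample)
print("\n".join(exact))
print("\n".join(loose))
```

The second call returns both miR-1-5p and miR-1-3p. The same thing would happen with the list in the question, where a short ID such as miR156 is a substring of miR156a, miR156b-3p and so on.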

Thank you so much for your prompt response. But I am still able to fetch only one sequence as the result. I don't know what's wrong :(

Result obtained :

jinu@server:~$ grep --no-group-separator -A 1 -f id.txt mature.txt | sed '/^>/ s/(>).(miR.)$/\1\2/'

>miR9774
CAAGAUAUUGGGUAUUUCUGUC


Could you please host part of your data elsewhere and share a link with the forum? About 10 lines should suffice to show the original format of your files.


jinu@server:~$ cat id.txt miR1171 miR1436 miR1439 miR1446 miR1446a miR156 miR156a miR156a-3p miR156a-5p miR156ad miR156b miR156b-3p miR156c miR156c-3p miR156d-3p miR156e miR156e-5p miR156f miR156f-3p miR156f-5p miR156g miR156g-5p miR156h miR156i miR156i-3p miR156j miR156j-3p miR156k miR156k-3p miR156k-5p miR156l miR156l-5p miR156q miR156r miR157a-5p miR157b-3p miR157d miR157d-3p miR159 miR159a miR159a-3p miR159a-5p miR159a.1 miR159b

cat mature.txt

ame-miR-9882 MIMAT0037286 Apis mellifera miR-9882 AGCGAUGAGACUAGAUCUUGGC ame-miR-9883 MIMAT0037287 Apis mellifera miR-9883 UUCGGGCGGGCUCGGGCGAGA ame-miR-9884 MIMAT0037288 Apis mellifera miR-9884 UCGGUCGGUGACGAAGCUCCC ame-miR-9885 MIMAT0037289 Apis mellifera miR-9885 UCGGCAAUGAUCGGACGUGGUC ame-miR-9886 MIMAT0037290 Apis mellifera miR-9886 UAGGCGUCACGUUGUGGAACG ame-miR-9887 MIMAT0037291 Apis mellifera miR-9887 AAGGGCUGGGAAGGGCGGAG ame-miR-2b MIMAT0037292 Apis mellifera miR-2b UCAUCAAAGCUGGCUGUGAUAUGA ame-miR-9888 MIMAT0037293 Apis mellifera miR-9888 UGGUGGUCAAGCAAGUAGAACGUU ame-miR-9889 MIMAT0037294 Apis mellifera miR-9889 AGUGUCGAGCCGAAGAAACGCGC ame-miR-9890 MIMAT0037295 Apis mellifera miR-9890 UUCGGAAGAAUGUAGAGAAAAAG ame-miR-9891 MIMAT0037296 Apis mellifera miR-9891 UCGGCUUCGUCCUCGUCGUCG ame-miR-9892 MIMAT0037297 Apis mellifera miR-9892 UGACGCGAUUGUGGAAAUCG ame-miR-9893 MIMAT0037298 Apis mellifera miR-9893 UUAUGAUCUGGAAUACUAGG ame-miR-9894 MIMAT0037299 Apis mellifera miR-9894 GAGGGCGAGGAGAGGAGGAA ame-miR-9895 MIMAT0037300 Apis mellifera miR-9895 UCGUGUCCGUUUCUCGUUUCGA ame-miR-9896 MIMAT0037301 Apis mellifera miR-9896 ACAAUAAUCGGACACAAUCGGC ame-miR-3478 MIMAT0037302 Apis mellifera miR-3478 CACACCGGACGAGAUUUCAU cre-miR9897-5p MIMAT0037303 Chlamydomonas reinhardtii miR9897-5p UACCGGGCGUGGGGAGGGCAGG cre-miR9897-3p MIMAT0037304 Chlamydomonas reinhardtii miR9897-3p UUACGGCUCCUUCUUAUCGGC


Please post the data in its original format, or format the original post properly.


enter link description here

result of cat mature.txt


result of cat id.txt enter link description here


Not sure why it is not working on your machine. From the output you posted, I reconstructed the IDs and sequences:

$ cat ids.txt 
miR-9887
miR9897-5p

$ cat test.txt 
>ame-miR-9882 MIMAT0037286 Apis mellifera miR-9882 
AGCGAUGAGACUAGAUCUUGGC 
>ame-miR-9883 MIMAT0037287 Apis mellifera miR-9883 
UUCGGGCGGGCUCGGGCGAGA 
>ame-miR-9884 MIMAT0037288 Apis mellifera miR-9884 
UCGGUCGGUGACGAAGCUCCC 
>ame-miR-9885 MIMAT0037289 Apis mellifera miR-9885 
UCGGCAAUGAUCGGACGUGGUC 
>ame-miR-9886 MIMAT0037290 Apis mellifera miR-9886 
UAGGCGUCACGUUGUGGAACG 
>ame-miR-9887 MIMAT0037291 Apis mellifera miR-9887 
AAGGGCUGGGAAGGGCGGAG 
>ame-miR-2b MIMAT0037292 Apis mellifera miR-2b 
UCAUCAAAGCUGGCUGUGAUAUGA 
>ame-miR-9888 MIMAT0037293 Apis mellifera miR-9888 
UGGUGGUCAAGCAAGUAGAACGUU 
>ame-miR-9889 MIMAT0037294 Apis mellifera miR-9889 
AGUGUCGAGCCGAAGAAACGCGC 
>ame-miR-9890 MIMAT0037295 Apis mellifera miR-9890 
UUCGGAAGAAUGUAGAGAAAAAG 
>ame-miR-9891 MIMAT0037296 Apis mellifera miR-9891 
UCGGCUUCGUCCUCGUCGUCG 
>ame-miR-9892 MIMAT0037297 Apis mellifera miR-9892 
UGACGCGAUUGUGGAAAUCG 
>ame-miR-9893 MIMAT0037298 Apis mellifera miR-9893 
UUAUGAUCUGGAAUACUAGG 
>ame-miR-9894 MIMAT0037299 Apis mellifera miR-9894 
GAGGGCGAGGAGAGGAGGAA 
>ame-miR-9895 MIMAT0037300 Apis mellifera miR-9895 
UCGUGUCCGUUUCUCGUUUCGA 
>ame-miR-9896 MIMAT0037301 Apis mellifera miR-9896 
ACAAUAAUCGGACACAAUCGGC 
>ame-miR-3478 MIMAT0037302 Apis mellifera miR-3478 
CACACCGGACGAGAUUUCAU 
>cre-miR9897-5p MIMAT0037303 Chlamydomonas reinhardtii miR9897-5p 
UACCGGGCGUGGGGAGGGCAGG 
>cre-miR9897-3p MIMAT0037304 Chlamydomonas reinhardtii miR9897-3p 
UUACGGCUCCUUCUUAUCGGC

output:

$ grep --no-group-separator -A 1 -f ids.txt test.txt | sed '/^>/ s/\(>\).*\(miR.*\)$/\1\2/'
>miR-9887 
AAGGGCUGGGAAGGGCGGAG 
>miR9897-5p 
UACCGGGCGUGGGGAGGGCAGG

In addition, your code contains sed '/^>/ s/(>).(miR.)$/\1\2/'. It should be sed '/^>/ s/\(>\).*\(miR.*\)$/\1\2/'. Is it a copy/paste mistake, or is that the code you actually ran?


I am sorry... It's just a copy/paste mistake :(
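
The thread does not record what finally fixed this. One thing worth ruling out, offered purely as an assumption: the only ID that ever matched, miR9774, is also the last one in the list, which is what typically happens when the ID file has Windows (CRLF) line endings and no final newline. Every pattern except the last then carries an invisible carriage return and cannot match. A small sketch of that effect, with hypothetical helper names and a made-up three-line ID file:

```python
def ids_as_grep_reads_them(raw_text):
    """Split on newlines only, so any carriage return stays in the pattern."""
    return [line for line in raw_text.split("\n") if line]


def clean_ids(raw_text):
    """Split into IDs and strip carriage returns, spaces and blank lines."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def matching(ids, header):
    """Return the IDs that occur as substrings of the header."""
    return [i for i in ids if i in header]


# Hypothetical ID file saved with CRLF line endings and no final newline.
raw = "miR-1-5p\r\nmiR-34-5p\r\nmiR-35-3p"

first_header = ">cel-miR-1-5p MIMAT0020301 Caenorhabditis elegans miR-1-5p"
last_header = ">cel-miR-35-3p MIMAT0000006 Caenorhabditis elegans miR-35-3p"

dirty = ids_as_grep_reads_them(raw)
clean = clean_ids(raw)

print(matching(dirty, first_header))  # nothing: the pattern ends in '\r'
print(matching(dirty, last_header))   # only the last ID still matches
print(matching(clean, first_header))  # matches once '\r' is stripped
```

If this is the cause, stripping the carriage returns from id.txt before running grep should bring back the other matches. If it is not, the ID file and the FASTA file would need to be inspected in their original format.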
