Question: Extract longest ORF
waqaskhokhar999 wrote (6 months ago):

I have an input file containing multiple nucleotide sequences in FASTA format. I am running the standalone Linux version of ORFfinder with the following command:

/Path/ORFfinder -in /Path_to_input_file/input_file.fasta -s 0 -out output_fasta -outfmt 0

This generates an output file containing all ORFs, but I am only interested in the longest ORF (the one with the maximum length). For example, one of the nucleotide sequences in my input file is:

BnaA03g18710D ACCAACATCTATTTTCCATCTTTTCCGATCAAAATCTCTCTCTCTCTCTCAGCTTTTTGTGTGACGCAACACTCGTGGGGAAATGGCCGCCGCAGTTTCCACCGTCGGTGCCATCAACAGAGCTCCGTTGAGCTTGAACGGGTCAGGAGCAGGAGCTGCTTCAGTCCCAGCTACGACCTTCTTGGGAAAGAAAGTTGTAACCGCGTCGAGATTCACACAGAGCAACAACAAGAAGAGCAACGGATCATTCAAAGTGGTCGCTGTCAAAGAAGACAAACAAACCGATGGAGACAGATGGAGGGGACTTGCCTACGACACGTCTGATGATCAACAAGACATCACCAGAGGCAAAGGTATGGTTGACTCTGTCTTCCAAGCTCCCATGGGAACCGGAACTCACAATGCCGTTCTTAGCTCCTATGAGTACATTAGCCAAGGTCTTAAGCAGTACAACTTGGACAACATGATGGATGGGCTTTACATTGCTCCTGCATTCATGGACAAGCTTGTTGTTCACATCACCAAGAACTTCTTGACTTTACCTAACATCAAGGTTCCACTTATTTTGGGTATTTGGGGAGGCAAAGGTCAAGGTAAATCCTTCCAGTGTGAGCTTGTCATGGCCAAGATGGGCATTAACCCAATCATGATGAGTGCTGGAGAGCTTGAGAGTGGAAACGCAGGAGAACCAGCCAAGCTGATCCGTCAAAGGTACCGTGAAGCAGCAGACATGATCAAAAAGGGAAAAATGTGTTGTCTATTCATCAACGATCTCGACGCTGGTGCTGGTCGTATGGGTGGTACTACTCAGTACACAGTCAACAACCAGATGGTTAACGCAACCCTCATGAACATTGCTGATAACCCAACCAACGTCCAGCTCCCGGGAATGTACAACAAGGAAGAAAACGCACGTGTCCCCATCATCGTCACCGGTAACGATTTCTCCACTCTCTACGCACCTCTCATCCGTGACGGGCGTATGGAGAAATTCTACTGGGCACCCACACGTGAGGACCGTATTGGTGTCTGCAAGGGTATCTTCAGGACTGATAACGTTAAGGATGAAGACATTGTCACGCTTGTTGACCAGTTCCCTGGACAATCTATCGATTTCTTTGGTGCATTGAGGGCGAGAGTGTACGATGATGAAGTGAGGAAGTTCGTTGAGGGACTTGGAGTTGAGAAGATAGGAAAGAGGCTGGTGAACTCTAGGGAAGGTCCTCCAGTGTTCGAGCAGCCAGCGATGACTCTTGAGAAGCTTATGGAGTACGGAAACATGCTTGTGATGGAACAAGAGAACGTCAAGAGAGTCCAACTTGCTGACCAATACCTTAACGAGGCTGCCTTGGGAGACGCAAACGCGGACGCCATTGGCCGCGGAACTTTCTATGGGAAAGCAGCACAGCAAGTGAACCTCCCTGTTCCAGAAGGGTGTACTGATCCTCAAGCAGACAACTTTGATCCAACAGCTAGAAGTGATGATGGAACTTGTGTCTACAACTTTTGAGTTTCCCCTTTGTTAAGTTGCTGTGTTTCTACTACTGTCTCTTTTTTTTGTTGCCTTTTGTGTAATTTTGGATTGCTTCATGTACTCTCTTTTTTTGTGATCATGTGCAAACATTAATATTGTAAGATTCCCTTGTCATAAACCATTTCTCAACTTTTTGTTTGCTTTATTAAGTAGATGGCATTCCAACTATAGTTCTTTGGCCATAGTCTCGGAA

This generates the following output, containing 5 ORFs:

lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product

MAAAVSTVGAINRAPLSLNGSGAGAASVPATTFLGKKVVTASRFTQSNNKKSNGSFKVVAVKEDKQTDGD RWRGLAYDTSDDQQDITRGKGMVDSVFQAPMGTGTHNAVLSSYEYISQGLKQYNLDNMMDGLYIAPAFMD KLVVHITKNFLTLPNIKVPLILGIWGGKGQGKSFQCELVMAKMGINPIMMSAGELESGNAGEPAKLIRQR YREAADMIKKGKMCCLFINDLDAGAGRMGGTTQYTVNNQMVNATLMNIADNPTNVQLPGMYNKEENARVP IIVTGNDFSTLYAPLIRDGRMEKFYWAPTREDRIGVCKGIFRTDNVKDEDIVTLVDQFPGQSIDFFGALR ARVYDDEVRKFVEGLGVEKIGKRLVNSREGPPVFEQPAMTLEKLMEYGNMLVMEQENVKRVQLADQYLNE AALGDANADAIGRGTFYGKAAQQVNLPVPEGCTDPQADNFDPTARSDDGTCVYNF

lcl|ORF2_BnaA03g18710D:284:466 unnamed protein product

METDGGDLPTTRLMINKTSPEAKVWLTLSSKLPWEPELTMPFLAPMSTLAKVLSSTTWTT

lcl|ORF3_BnaA03g18710D:1623:1522 unnamed protein product

MFAHDHKKREYMKQSKITQKATKKRDSSRNTAT

lcl|ORF4_BnaA03g18710D:1373:993 unnamed protein product

MASAFASPKAASLRYWSASWTLLTFSCSITSMFPYSISFSRVIAGCSNTGGPSLEFTSLFPIFSTPSPST NFLTSSSYTLALNAPKKSIDCPGNWSTSVTMSSSLTLSVLKIPLQTPIRSSRVGAQ

lcl|ORF5_BnaA03g18710D:926:807 unnamed protein product

MMGTRAFSSLLYIPGSWTLVGLSAMFMRVALTIWLLTVY
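Each header in this output follows the pattern lcl|ORF<n>_<sequence ID>:<start>:<end>, and a start larger than the end (as in ORF3, 1623:1522) indicates an ORF on the reverse strand. The span |end - start| + 1 gives the nucleotide length; judging by ORF1 (1509 - 82 + 1 = 1428 nt for a 475-residue protein, i.e. 476 codons), the span includes the stop codon. Below is a minimal sketch that recovers these values. It assumes every header follows that pattern; the names parse_orf_header and OrfCoords are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed ORFfinder header layout: "lcl|ORF<n>_<seq_id>:<start>:<end> <description>"
# The greedy (.+) backtracks so that the last two ":<digits>" fields are the coordinates.
HEADER_RE = re.compile(r"lcl\|ORF(\d+)_(.+):(\d+):(\d+)")


class OrfCoords(NamedTuple):
    orf_number: int
    seq_id: str
    start: int
    end: int
    strand: str
    nt_length: int


def parse_orf_header(header: str) -> OrfCoords:
    """Parse one ORFfinder FASTA header (with or without the leading '>')."""
    token = header.lstrip(">").split()[0]  # drop "unnamed protein product"
    m = HEADER_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"unexpected header format: {header!r}")
    number, seq_id, start_s, end_s = m.groups()
    start, end = int(start_s), int(end_s)
    strand = "+" if start <= end else "-"  # start > end means reverse strand
    return OrfCoords(int(number), seq_id, start, end, strand, abs(end - start) + 1)


if __name__ == "__main__":
    h = parse_orf_header("lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product")
    print(h.seq_id, h.strand, h.nt_length)  # BnaA03g18710D + 1428
```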

But I am only interested in the longest ORF, which is:

lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product

MAAAVSTVGAINRAPLSLNGSGAGAASVPATTFLGKKVVTASRFTQSNNKKSNGSFKVVAVKEDKQTDGD RWRGLAYDTSDDQQDITRGKGMVDSVFQAPMGTGTHNAVLSSYEYISQGLKQYNLDNMMDGLYIAPAFMD KLVVHITKNFLTLPNIKVPLILGIWGGKGQGKSFQCELVMAKMGINPIMMSAGELESGNAGEPAKLIRQR YREAADMIKKGKMCCLFINDLDAGAGRMGGTTQYTVNNQMVNATLMNIADNPTNVQLPGMYNKEENARVP IIVTGNDFSTLYAPLIRDGRMEKFYWAPTREDRIGVCKGIFRTDNVKDEDIVTLVDQFPGQSIDFFGALR ARVYDDEVRKFVEGLGVEKIGKRLVNSREGPPVFEQPAMTLEKLMEYGNMLVMEQENVKRVQLADQYLNE AALGDANADAIGRGTFYGKAAQQVNLPVPEGCTDPQADNFDPTARSDDGTCVYNF

How can I output or extract only the longest ORF?
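Below is a minimal sketch that uses only the Python standard library. It reads the FASTA file written by the command above (here -outfmt 0 gives protein sequences, as in the example output) and keeps the longest ORF for each input sequence. It assumes every header follows the lcl|ORF<n>_<seq_id>:<start>:<end> pattern shown above. The script name longest_orf.py and the functions read_fasta, longest_orf_per_sequence and write_fasta are hypothetical. Length is measured as protein length, and on a tie the ORF listed first is kept.

```python
import re
import sys
from typing import Dict, Iterable, Iterator, Optional, TextIO, Tuple

# Assumed header layout: "lcl|ORF<n>_<seq_id>:<start>:<end> ..."
SEQ_ID_RE = re.compile(r"lcl\|ORF\d+_(.+):\d+:\d+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append("".join(line.split()))  # drop any internal whitespace
    if header is not None:
        yield header, "".join(chunks)


def longest_orf_per_sequence(records) -> Dict[str, Tuple[str, str]]:
    """Map each source sequence ID to its longest (header, protein) record."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, protein in records:
        token = (header.split() or [""])[0]
        m = SEQ_ID_RE.fullmatch(token)
        seq_id = m.group(1) if m else token  # fall back to the raw first word
        # Strict ">" keeps the first ORF when two have equal length.
        if seq_id not in best or len(protein) > len(best[seq_id][1]):
            best[seq_id] = (header, protein)
    return best


def write_fasta(best: Dict[str, Tuple[str, str]], out: TextIO, width: int = 70) -> None:
    """Write the selected records as FASTA, wrapping sequences at `width`."""
    for header, protein in best.values():
        out.write(f">{header}\n")
        for i in range(0, len(protein), width):
            out.write(protein[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python longest_orf.py output_fasta > longest_orfs.fasta
    with open(sys.argv[1]) as handle:
        write_fasta(longest_orf_per_sequence(read_fasta(handle)), sys.stdout)
```

Because Python dictionaries preserve insertion order, the output lists sequences in the order they first appear. Ranking by protein length gives the same result as ranking by the coordinate span for ORFs that end in a stop codon, as all five here do, since each span is three nucleotides per residue plus the stop codon.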

Any help will be highly appreciated.

Fatima (United States) wrote (6 months ago):

These might help:

C: Trouble Finding ORFs in DNA Sequence

How to extract the longest orf?

You might be able to modify this script to keep only the longest ORF among the records whose headers share the same source sequence ID (e.g. BnaA03g18710D in lcl|ORF1_BnaA03g18710D):

https://stackoverflow.com/questions/29953448/python-finding-longest-sequence-from-fasta-file
