Is there any way to extract longest ORF from blastx output?
7.5 years ago
sohra ▴ 40

Hi all,

I read in a paper that the longest ORF in the reading frame indicated by the blastx analysis was determined, and that the resulting CDS was then extracted and the UTR regions removed. Could anybody please tell me how to determine the longest ORF from blastx results and how to identify the CDS and UTRs in the sequences?

Thanks

CDS blast alignment sequence UTR • 2.2k views

What is the reference of that paper?


You can find it here

7.4 years ago
x.jack.min ▴ 20

http://proteomics.ysu.edu/tools/OrfPredictor.html

will do the work for you

7.4 years ago
5heikki 11k

I don't think it's possible to detect the longest possible ORF from blastx output alone, only the longest aligned region (although in most cases the latter is probably part of the former). The command below assumes tabular output: it 1) sorts by query id and then by alignment length in descending order (column 4 in the default tabular format), and 2) keeps only the first line per query, i.e. the longest alignment. Note that only the longest hit per contig is kept, so this strategy is not sensible for all data (e.g. contigs that are expected to include introns and/or intergenic regions between CDSs). If you're fine with this, you can output the translated region as an extra column (check blastx -help) and then parse it from there.

export LC_ALL=C; sort -k1,1 -k4,4gr tabularBlastxOutput | sort -u -k1,1 --merge > longestAlignedRegions
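
If the translated region is written out as an extra column, the parsing step could look roughly like the sketch below. It is only an illustration under assumptions: the column layout (qseqid sseqid pident length qframe qseq), the file names in the commented blastx call, and the function name longest_hit_per_query are all made up for this example, so adjust the column numbers to whatever -outfmt string you actually use.

```shell
# Assumed blastx call (query file, database and output names are placeholders):
#   blastx -query contigs.fasta -db proteins \
#     -outfmt "6 qseqid sseqid pident length qframe qseq" > hits.tsv

# For each query, print the frame and the gap-stripped translated region
# of its longest alignment, as: qseqid <TAB> qframe <TAB> peptide
longest_hit_per_query() {
    awk -F'\t' -v OFS='\t' '
        # Keep the row with the greatest alignment length (column 4) per query.
        !($1 in best) || $4 + 0 > best[$1] + 0 {
            best[$1] = $4; frame[$1] = $5; pep[$1] = $6
        }
        END {
            for (q in best) {
                gsub(/-/, "", pep[q])   # drop alignment gap characters
                print q, frame[q], pep[q]
            }
        }
    ' "$1" | sort -k1,1
}
```

Calling longest_hit_per_query hits.tsv would then give one line per contig. As with the sort approach, this recovers only the aligned part of the translation in the reported frame, not the full ORF, so the contig would still have to be extended to the flanking start and stop codons in that frame to delimit the CDS and UTRs.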