Find the best hit of heterologous protein in blastx results
9.8 years ago
arronar ▴ 280

Hello!

Let's start from the beginning.

I am running blastx for the query (contig) below, restricted to the Oryza taxa and with an E-value threshold of 3.

Contig:

>Contig375
CGGGGATCTGAATGGACTTCTCTCATTTCTACCAGCATGCTGGTGGGAATCTTGTATATATAGAGATTTG
ACAATCAAGTAAGAAGTTTAAATAATTTGTAGCTTTCTTTTGTAATGCATACTTTTATCGATACCTAGAA
AAAATTACGTTTAGATCACTTATTAGAGTGACATTGTTGTCATACATTGGATGTTTATAAACCTGATGAT
CTGTTTGCATATTCCTGAACCAATGCCCCAAAGAGTGAGGGCTTCTCAATCAAACGTGAAGGCTTGTCAA
ATTCTTTTGCATACCCTGCATCAATGACTAAAACCCGATCACAGTCCATGACAGTAGGTATCCTATGAGC
TATGCTAACGATGGTA

As you can see (if you run the same job), it returns numerous hits, and the first one is the hit with the smallest E-value.

What I want instead is for the first result to be the sequence with the following characteristics:

  1. Its length should be as great as possible. E.g., in our example the first hit has a length of 251 amino acids, while the second one has a length of 1278.

  2. It should start as close as possible to the 5' end, i.e. to the first amino acid (methionine) of the protein. E.g., in our example some hits start at the 20th amino acid while others start at the 1200th.

In a nutshell, I want to filter the blastx results so that they return the longest possible protein whose match also starts at, or close to, the beginning of the protein.

So is there any way to filter the results like this? Or is there perhaps another database, besides NCBI's, to search for more complete protein sequences?
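To make the two criteria concrete, here is a minimal sketch of the selection I have in mind. The `Hit` record, the `best_hit` function and the `max_start` cutoff of 50 residues are hypothetical names and values chosen only for illustration; the lengths and start positions reuse the numbers mentioned above, and the hit names are made up.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    name: str       # hypothetical label for the subject protein
    length: int     # full length of the subject protein (amino acids)
    hit_from: int   # first residue of the subject covered by the alignment


def best_hit(hits: Sequence[Hit], max_start: int = 50) -> Optional[Hit]:
    """Return the longest hit whose alignment starts within max_start residues."""
    near_start = [h for h in hits if h.hit_from <= max_start]
    if not near_start:
        return None
    # Longest protein first; ties are broken by the earliest start position.
    return max(near_start, key=lambda h: (h.length, -h.hit_from))


hits = [
    Hit("short_protein", 251, 20),
    Hit("long_protein_late_start", 1278, 1200),
    Hit("long_protein_early_start", 1278, 20),
]
print(best_hit(hits))
```

The cutoff is a judgment call: a stricter value favours hits that cover the start of the protein, a looser one favours length.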

Thank you.

hit blastx heterologous-proteins • 2.2k views
9.8 years ago

The following XSLT stylesheet sorts the XML output of blastx on Hit/Hit_len and then on Hsp/Hsp_hit-from.

Usage:

xsltproc --novalid blastsort.xsl blastx.xml
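For readers without xsltproc, the same ordering can be sketched in Python with the standard library. This is not the stylesheet from this answer, only an approximation under stated assumptions: it assumes the standard BLAST XML element names (`Hit`, `Hit_def`, `Hit_len`, `Hsp`, `Hsp_hit-from`), sorts by descending hit length and then by ascending start position, and uses a tiny made-up `SAMPLE` document. The function name `sort_hits` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed stand-in for a real blastx XML report.
SAMPLE = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_def>short_protein</Hit_def><Hit_len>251</Hit_len>
  <Hit_hsps><Hsp><Hsp_hit-from>20</Hsp_hit-from></Hsp></Hit_hsps></Hit>
<Hit><Hit_def>long_late</Hit_def><Hit_len>1278</Hit_len>
  <Hit_hsps><Hsp><Hsp_hit-from>1200</Hsp_hit-from></Hsp></Hit_hsps></Hit>
<Hit><Hit_def>long_early</Hit_def><Hit_len>1278</Hit_len>
  <Hit_hsps><Hsp><Hsp_hit-from>20</Hsp_hit-from></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def sort_hits(xml_text):
    """Return (definition, length, earliest start) tuples, longest hit first."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("Hit"):
        starts = [int(h.findtext("Hsp_hit-from")) for h in hit.iter("Hsp")]
        if not starts:
            continue  # skip hits that report no HSP
        rows.append((hit.findtext("Hit_def"), int(hit.findtext("Hit_len")), min(starts)))
    # Descending length, then ascending start position on the subject protein.
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows


for row in sort_hits(SAMPLE):
    print(row)
```

For a real report, `SAMPLE` would be replaced by the contents of the blastx XML file.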

Are you sure that this works? I ran it and it returns the results in the same order.

Here is my initial XML file.


Ah yes, sorry, I forgot the attribute data-type="number". I updated the code.

xsltproc --novalid stylesheet.xsl blastx.xml | grep -E '(Hit_len|Hsp_query\-from)'

  <Hit_len>1489</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
      <Hsp_query-from>17</Hsp_query-from>
  <Hit_len>1468</Hit_len>
      <Hsp_query-from>2</Hsp_query-from>
      <Hsp_query-from>23</Hsp_query-from>
  <Hit_len>1451</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1444</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1441</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1441</Hit_len>
      <Hsp_query-from>5</Hsp_query-from>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>1356</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
      <Hsp_query-from>17</Hsp_query-from>
  <Hit_len>1199</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
      <Hsp_query-from>293</Hsp_query-from>
  <Hit_len>763</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
  <Hit_len>517</Hit_len>
      <Hsp_query-from>8</Hsp_query-from>
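As a rough illustration of why the data-type="number" attribute matters: without it the values are compared as text, so a four-digit length such as 1489 can end up after 763. The Python sketch below only mimics that behaviour with plain string comparison (an assumption; XSLT text sorting is collation-based, but digits are ordered character by character in a similar way). The lengths are taken from the output above.

```python
# Hit lengths from the output above, kept as strings like XML text nodes.
hit_lengths = ["1489", "763", "517", "1199"]

# Text comparison: strings are ordered character by character.
as_text = sorted(hit_lengths, reverse=True)

# Numeric comparison: what data-type="number" is meant to achieve.
as_numbers = sorted(hit_lengths, key=int, reverse=True)

print(as_text)     # ['763', '517', '1489', '1199']
print(as_numbers)  # ['1489', '1199', '763', '517']
```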