Question: Merging Blastx Hits From Overlapping Bacterial Genome Segments
Darked89 (Barcelona, Spain) wrote, 10.6 years ago:

I ran blastx on a 1 Mbp bacterial genome fragment against the NCBI nr database. Beforehand, I used EMBOSS splitter to cut it into 2000 bp fragments with 500 bp overlaps, written to a single multi-FASTA file:

splitter -sequence my_contig.fa  -size 2000 -overlap 500
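A minimal Python sketch of the windowing follows. It assumes each fragment is `size` bases long and consecutive fragments share `overlap` bases, so fragment starts are `size - overlap` apart. The function name `split_windows` is hypothetical, and EMBOSS splitter's exact treatment of the final fragment may differ, so check it against the actual splitter output. Knowing each fragment's start position is what later allows hits to be converted to absolute contig coordinates.

```python
def split_windows(seq_len, size=2000, overlap=500):
    """Return 1-based, inclusive (start, end) pairs for overlapping windows.

    Assumption: windows are `size` bases long, start `size - overlap` bases
    apart, and the last window is truncated at the end of the sequence.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    windows = []
    start = 1
    while True:
        end = min(start + size - 1, seq_len)
        windows.append((start, end))
        if end == seq_len:  # this window reaches the end of the contig
            break
        start += step
    return windows


print(split_windows(5000))  # small example: three overlapping windows
```

Under these assumptions, a 1 Mbp contig split with the settings above gives 667 fragments.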

As the output format I chose tabular BLAST output with comment lines (-m 9).

The next step was converting the blastx output into GFF3. That part works, with absolute positions (i.e. positions in the intact contig).
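One way to do that conversion, sketched in Python under stated assumptions: rows follow the 12-column legacy tabular layout of -m 8/-m 9 (query, subject, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, e-value, bit score), and `offsets` is a hypothetical dict mapping each fragment ID to the contig position of its first base (for example, built from the window starts). The seqid `my_contig`, the feature type `protein_match` and the `evalue` attribute are illustrative choices, not the original poster's script.

```python
def blast_row_to_gff3(line, offsets, seqid="my_contig", source="blastx"):
    """Convert one tabular BLAST row to a GFF3 line in contig coordinates.

    `offsets` (hypothetical) maps fragment id -> 1-based contig position of
    the fragment's first base. Returns None for '#' comment lines from -m 9.
    """
    if line.startswith("#") or not line.strip():
        return None
    # strip each field: legacy tabular output may pad numbers with spaces
    f = [x.strip() for x in line.rstrip("\n").split("\t")]
    qid, sid = f[0], f[1]
    qstart, qend = int(f[6]), int(f[7])
    sstart, send = f[8], f[9]
    evalue, bits = f[10], f[11]
    # blastx reports hits in reverse frames with qstart > qend
    strand = "+" if qstart <= qend else "-"
    lo, hi = sorted((qstart, qend))
    shift = offsets[qid] - 1  # fragment-local -> absolute contig position
    attrs = f"Target={sid} {sstart} {send};evalue={evalue}"
    return "\t".join([seqid, source, "protein_match", str(lo + shift),
                      str(hi + shift), bits, strand, ".", attrs])


row = "frag2\tgi|123\t45.0\t100\t50\t2\t10\t309\t5\t104\t1e-20\t 80.5"
print(blast_row_to_gff3(row, {"frag2": 1501}))
```

The bit score goes into the GFF3 score column and the e-value into column 9; swapping them is equally valid.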

It seems that one ORF/predicted gene is often covered by 2-3 BLAST hits to the same protein. These hits may or may not overlap. Hence my questions:

  1. What fragment sizes/overlaps are typically used for blastx in this situation?
  2. Is there any advantage in improving the BLAST hits, say by merging overlapping segments (the e-values would then be invalid), or by using blast2 in blastx mode to compare the DNA sequence spanning overlapping/almost-touching hits against the protein already detected?
(modified 2.0 years ago by RamRS)
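The merging option in question 2 can be sketched as interval merging per protein and strand. This is a minimal Python sketch, not an established tool: `hits` are hypothetical (subject, strand, start, end, evalue) tuples in contig coordinates, `max_gap` is an arbitrary tolerance for "almost-touching" hits, and the smallest e-value is carried along only as a label since, as noted above, the e-value of a merged segment is not valid.

```python
from itertools import groupby


def merge_hits(hits, max_gap=0):
    """Merge hits to the same protein on the same strand.

    `hits` are (subject, strand, start, end, evalue) tuples with start <= end.
    Hits that overlap, touch, or are separated by at most `max_gap` bases
    are merged into a single record.
    """
    ordered = sorted(hits, key=lambda h: (h[0], h[1], h[2]))
    merged = []
    for (subject, strand), group in groupby(ordered, key=lambda h: (h[0], h[1])):
        current = None
        for _, _, start, end, evalue in group:
            if current is not None and start <= current[3] + max_gap + 1:
                # extend the current block; the best e-value is kept as a label only
                current[3] = max(current[3], end)
                current[4] = min(current[4], evalue)
            else:
                if current is not None:
                    merged.append(tuple(current))
                current = [subject, strand, start, end, evalue]
        merged.append(tuple(current))
    return merged


hits = [("P1", "+", 1510, 1809, 1e-20), ("P1", "+", 1700, 2300, 1e-30),
        ("P1", "+", 2350, 2600, 1e-5), ("P2", "-", 100, 400, 1e-10)]
print(merge_hits(hits, max_gap=100))
```

A small `max_gap` is the safer choice: two adjacent paralogs hitting the same nr protein would otherwise be fused into one "gene".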
Istvan Albert (University Park, USA) wrote, 10.6 years ago:

Isn't it the size of the protein that causes multiple hits? No matter what fragment size or overlap you choose, if two or more fragments cover different sections of the same protein, you'll get multiple hits.

If your fragments are too large you'll miss regions; if they are too small you'll get multiple hits. The latter problem does not seem to preclude any downstream analysis, so it may not be worth trying to optimize away.


It seems that I am missing hits for some fragments, so I will have to reduce the fragment size and increase the proportion of overlap. The average predicted gene size is 274 aa, so next I will try 1 kb fragments with 500 bp overlaps.

(reply by Darked89, 10.6 years ago)
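The trade-off discussed above can be checked with a little arithmetic, sketched here in Python under the same windowing assumption (fragments of `size` bases, starts `size - overlap` apart). In the worst case a region starts one base before the next fragment begins, leaving only `size - (step - 1) = overlap + 1` bases in the fragment that holds its first base, so only regions up to `overlap + 1` bp are guaranteed to lie entirely inside one fragment, whatever the fragment size. The helper name `fragment_stats` is hypothetical.

```python
def fragment_stats(seq_len, size, overlap):
    """Return (number of fragments, longest region guaranteed to fit in one).

    Assumes fragments of `size` bases whose starts are `size - overlap` apart,
    with the last fragment truncated at the end of the sequence.
    """
    if seq_len <= size:
        return 1, seq_len
    step = size - overlap
    # ceiling division: fragments needed after the first one
    n_fragments = 1 + -(-(seq_len - size) // step)
    # worst case: region starts step - 1 bases after a fragment start
    guaranteed = overlap + 1
    return n_fragments, guaranteed


avg_gene_bp = 274 * 3 + 3  # 274 codons plus a stop codon
for size, overlap in [(2000, 500), (1000, 500)]:
    n, longest = fragment_stats(1_000_000, size, overlap)
    print(f"{size}/{overlap}: {n} fragments; regions up to {longest} bp always "
          f"fit in one fragment (average gene: {avg_gene_bp} bp)")
```

Under these assumptions, a 1 Mbp contig yields 667 fragments at 2000/500 and 1999 at 1000/500, yet in both cases only regions up to 501 bp are guaranteed to fit in a single fragment, shorter than an average 825 bp gene. Genes split across several hits are therefore expected with either setting, while the smaller step roughly triples the number of queries.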


Powered by Biostar version 2.3.0