Question: How To Concatenate Blast Results (M8) Via Setting Threshold Of Distance Between Two Query Hits
6.1 years ago by
Hi all,

I ran blastn (-m 8) with a query file containing many sequences, and for each query sequence the output contains many significant, fragmented hits.

However, these hits do not overlap, and interestingly most of the gaps between them are < 300 bp (much shorter than the full length of the query sequence).

So, how can I concatenate these closely spaced hits into one by setting a distance threshold (e.g. 300 bp) when the hits match different regions of the same subject? This would also reduce the number of output hits per query.

for example:

Are there any scripts or tools for this purpose?

All replies are welcome!

blast • 1.9k views
modified 2.4 years ago by Lhl730 • written 6.1 years ago by xiongtl201340
6.1 years ago by
You can use Biopython to parse the BLAST output and then concatenate the hits that match your criteria. I would not recommend parsing the tabular output, though; instead, re-run BLAST and get the results in XML format, since that is easier to parse with a script.

In the Biopython tutorial, the chapters you would be interested in are 3, 4, 5, and 7.
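
For what it's worth, the concatenation step itself is fairly small once the hits are parsed. Below is a minimal sketch, not a tested tool: it groups hits by (query, subject) pair and merges neighbours whose gap on the query is at most a threshold. The names `parse_m8`, `merge_hits`, and `max_gap` are made up for this example. It assumes the standard 12-column tabular layout (query start and end in columns 7 and 8), and it ignores subject coordinates and strand, which you may well want to check too. `merge_hits` only needs (query, subject, qstart, qend) tuples, so it works the same if you get them from the XML output instead; `parse_m8` is only there to keep the example free of dependencies.

```python
from collections import defaultdict


def parse_m8(lines):
    """Yield (query, subject, qstart, qend) from BLAST tabular (-m 8) lines.

    Assumes the standard 12-column layout, with query start/end in
    columns 7 and 8 (0-based indices 6 and 7).
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        yield fields[0], fields[1], qstart, qend


def merge_hits(hits, max_gap=300):
    """Merge hits of the same query/subject pair that lie close together.

    Two neighbouring hits are joined when the number of unaligned query
    bases between them is <= max_gap. Subject coordinates and strand are
    NOT checked here (a simplification for this sketch).
    """
    grouped = defaultdict(list)
    for query, subject, qstart, qend in hits:
        grouped[(query, subject)].append((qstart, qend))

    merged = []
    for (query, subject), intervals in grouped.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start - cur_end - 1 <= max_gap:
                # Close enough (or overlapping): extend the current block.
                cur_end = max(cur_end, end)
            else:
                merged.append((query, subject, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((query, subject, cur_start, cur_end))
    return merged
```

You would call it along the lines of `merge_hits(parse_m8(open("blast.m8")), max_gap=300)`, where `blast.m8` is a placeholder file name. Each returned tuple is one concatenated region per query/subject pair.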

written 6.1 years ago by jgibbons150
2.4 years ago by
Have you tried genBlastA/G? See She et al. (2011), "genBlastG: using BLAST searches to build homologous gene models", Bioinformatics.

written 2.4 years ago by Lhl730
