Is it possible to isolate and merge overlapping hits from a BLAST search Aligned Fasta file?
Entering edit mode
2.4 years ago
Leah ▴ 10

Hello all,

Hope you're well. I started a PhD in Jan this year and frankly I am struggling. I'm coming into the second week of trying to figure out how to attempt this problem and it's honestly starting to get to me a bit - the current task I'm working on is not my main project and every day I don't make progress is a delay on my actual project. So! A bit worried to say the least (!)

Here is the issue: I ran some sequences using BLASTn. I downloaded the Aligned Fasta file for each. I have sorted reads by if they are a single read (ie the accession number only appears once) or multiple reads of the same entry (example below, edited for brevity):





This is so I can use the seqkit to combine the multiple hit entries. Once I've done that, I'll be filtering based on query sequence length (using bioawk which I know how to do) and carrying on with the pipeline (alignment, analysis, ect.).

The problem is, some of these multiple hits are overlapping and I have no idea how to treat them! For example:



My two questions:

  1. Is there a way I could isolate these overlapping sequences from the nonoveralpping multiple hits? I just can't fathom how I can achieve that right now with my current skill set.

  2. Aside from using the EMBOSS merger command, is there another way for me to merge these overlapping files together? I did give it a go and it had wonderfully merged the above Elephant example correctly. But I am going to be dealing with thousands of hits, is it possible to write a python script to run merger through the overlapping files? I realise this is a tall order. I'd try to configure this script to use seqkit to merge the non-overlapping multiple hits too.

Thank you kindly for reading - I have trawled these forums the last few months and frankly I only think I've managed to get where I am thanks to your answers. Any advice, suggestions or critiques are gratefully received.

GenBank Fasta Linux Python BLAST • 1.5k views
Entering edit mode

Not sure if I understand what are you trying to do. But, if you get your blastn results as a table (outfmt 6) it would be easier to analyze them. You will be able to classify/filter blastn hits by evalue, score, etc. and also, to identify querys matching multiple subjects.

Entering edit mode

Thank you for the response Buffo - I used the online BLAST instead of command line. I did suggest to my supervisor about going about it the command line way but we went with online. I am kicking myself slightly as it does seem to be more commonly used.

It does sound like it would help but not necessarily solve the issue I am having which is how to treat these overlapping hits. While I'm reticent to recollect my data and set myself back a few weeks, if it means making this issue easier to solve, I'll give it a go. Thank you for taking the time out to answer

Entering edit mode
2.4 years ago
Jiyao Wang ▴ 360

I am not sure whether this will help. When you get the result page, a link near the top left allows you to download the search result in a JSON blob to be parsed.

Entering edit mode

Thank you kindly for your response Jiyao Wang - I've never come across JSON before but it looks pretty interesting. I am going to attempt to have a go another way but thank you for the suggestion and I appreciate you taking the time out to answer


Login before adding your answer.

Traffic: 1858 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6