Hello everybody, I hope you are all doing well.
After months of searching and writing frustrating Python scripts, I'm here asking for help!
From a TBLASTN output, I've been trying to select the best hit (defined as the one with the lowest e-value) for each genomic region within a scaffold, contig, etc., but doing it manually takes a lot of time!
Does anyone know a way to automate this manual process?
P.S. The output has over 300,000 hits. I pass it through an awk command that keeps only the hits longer than 750 nucleotides and sorts them by increasing e-value, which makes selecting the best hits easier.
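For reference, the filtering step could be written in Python roughly as follows. This is only a sketch. It assumes the default 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It also assumes the 750-nucleotide threshold applies to the genomic span on the subject (`sstart` to `send`) rather than to the `length` column. The names `Hit`, `parse_outfmt6` and `filter_and_sort` are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Selected columns of one BLAST -outfmt 6 row (default 12 columns assumed)."""
    qseqid: str
    sseqid: str
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST lines, skipping blank and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[8]), int(f[9]), float(f[10]), float(f[11])))
    return hits


def filter_and_sort(hits: List[Hit], min_span: int = 750) -> List[Hit]:
    """Keep hits whose subject span exceeds min_span nt (either strand),
    then sort by increasing e-value, breaking ties by higher bitscore."""
    kept = [h for h in hits if abs(h.send - h.sstart) + 1 > min_span]
    return sorted(kept, key=lambda h: (h.evalue, -h.bitscore))
```

The span uses `abs(send - sstart) + 1` because TBLASTN reports minus-strand hits with `sstart > send`.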
P.S. For a given subject acc.ver (i.e., the scaffold or contig ID), there are many hits per genomic region. I'm not trying to select a single best hit per scaffold or contig. I'm trying to select the best hit for every non-overlapping genomic region within each subject acc.ver.
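One common way to express "best hit per non-overlapping region" is a greedy pass, sketched below under stated assumptions. For each subject, go through the hits from lowest to highest e-value and accept a hit only if it does not overlap any hit already accepted on that subject. The sketch assumes the same `-outfmt 6` column layout as above, ignores strand, and treats any shared base as an overlap. The function names `read_hits` and `best_hit_per_region` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (qseqid, sseqid, start, end, evalue, bitscore), with start <= end on the subject
Hit = Tuple[str, str, int, int, float, float]


def read_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse -outfmt 6 lines; normalise coordinates so minus-strand hits have start <= end."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        s, e = int(f[8]), int(f[9])
        hits.append((f[0], f[1], min(s, e), max(s, e), float(f[10]), float(f[11])))
    return hits


def best_hit_per_region(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Greedy selection: per subject, accept hits in order of increasing e-value
    (ties: higher bitscore first) unless they overlap an already accepted hit."""
    by_subject: Dict[str, List[Hit]] = defaultdict(list)
    for h in hits:
        by_subject[h[1]].append(h)

    selected: Dict[str, List[Hit]] = {}
    for subject, group in by_subject.items():
        group.sort(key=lambda h: (h[4], -h[5]))
        chosen: List[Hit] = []
        for h in group:
            # [a, b] and [c, d] are disjoint iff b < c or a > d
            if all(h[3] < c[2] or h[2] > c[3] for c in chosen):
                chosen.append(h)
        selected[subject] = sorted(chosen, key=lambda h: h[2])  # report in genomic order
    return selected
```

Usage would be something like `best_hit_per_region(read_hits(open("tblastn.tsv")))`, where the file name is a placeholder. This greedy definition differs from first merging all overlapping hits into clusters and then taking the best hit per cluster (for example, after `bedtools merge`). The two approaches can disagree when overlaps form a chain across several hits.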
I read an old post by goubert.clement (BLAT: how to select best hit at one genomic position? (queries are repeats)), where Amitm suggested using Bowtie for his particular problem with repetitive sequences, but I don't know whether it would solve mine...
Thank you in advance :)