Hello. I need some advice about how I can handle some BLAST results.
I have a set of Illumina reads obtained from a metagenomic sample. I need to obtain from these reads the ones belonging to a certain organism, while eliminating others. I prepared a BLAST database from the genomes (mitogenomes) of the organism I am interested in, plus 3 others I need to "screen-out". The query for the BLAST were the Illumina reads.
I have the results (in tabular output) and I need to keep the reads that are a match to my problem organism, and eliminate the ones that are also a (better) match to the other 3.
Any ideas on how can I achieve this? Thanks for your help.
Assuming you have BLAST output files in the standard order, the identifier of the queries (names of illumina reads) that matched the sequences in your database should be in the first column. What I would do is write a script (e.g. using perl or python) to run through your BLAST table(s) and keep track of the query identifiers that match your mitogenomes, check if these queries have matches with a higher bitscore (last column of the BLAST output) to the the genomes you wanna screen out, and finally go through the file with all the illumina reads and filter only the ones you are interested in. You also should decide what you wanna do with the reads that match none of the genomes in the database. Hope this helps.