BLAST results analysis
7.7 years ago
sukesh1411 ▴ 30

Hi

How can I find the number of contigs that do not match any sequence in the nucleotide database after a BLAST search? Sorry for the incomplete question. The output is a tab-delimited text (.txt) file. I could not find an answer by searching online.

Thanks

blast • 2.7k views

This is another example of a question on Biostars that does not contain enough information to get an answer the first time around. At a minimum, you should state which format your BLAST output is in, since there are so many. Have you made any effort, such as a simple web search, to see whether a solution is already available?

I am going to close this question until you add this information to your original post (use the edit option on the original post). We will open the question back up once you do that.


Hello sukesh1411!

Please provide complete description/additional information.

For this reason we have closed your question.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!


Many BLAST output formats are tab-delimited text. Can you post a snippet of example output (or tell us which -outfmt number you used)?

7.7 years ago
Sej Modha 5.3k

That is simple. Depending on the format of your BLAST output, extract the headers of the sequences that have at least one hit in the BLAST nucleotide database. Then extract all headers from the contig FASTA file and use an inverted grep (grep -v) to find the sequences that do not match anything.

If file1 contains the headers taken from the BLAST output and file2 contains all contig headers, you can run:

grep -vf file1 file2
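
One caveat worth sketching: plain `grep -vf` treats each line of file1 as a regular expression and matches substrings, so a hit on contig_1 would also hide contig_10. The snippet below is a minimal sketch, not part of the original answer, that adds the standard `-F` (fixed strings) and `-x` (whole-line match) options. The function name `unmatched_ids` and the contig IDs mentioned here are hypothetical.

```shell
# Hypothetical helper: print every ID from all_ids that is absent from hit_ids.
# Both files are assumed to hold exactly one ID per line.
unmatched_ids() {
    hit_ids=$1   # IDs of contigs with at least one BLAST hit (file1 above)
    all_ids=$2   # IDs of all contigs (file2 above)
    if [ -s "$hit_ids" ]; then
        # -v invert the match, -F fixed strings, -x whole line, -f patterns from file
        grep -vFxf "$hit_ids" "$all_ids"
    else
        # No hits at all: every contig is unmatched.
        cat "$all_ids"
    fi
}
```

Calling `unmatched_ids file1 file2 | wc -l` would then give the number of contigs without a hit, which is what the question asks for.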

Also, assuming the OP is on Linux and is only interested in whether each sequence has a hit or not (the query ID is column 1 of '-outfmt 6'): pull out the query IDs, keeping each one only once, with

cut -f 1 blastoutput | sort | uniq > contighits.txt

Then pull out the contig headers and strip the leading '>':

grep -e '>' contigs.fa > contigids.txt && sed -i 's/>//g' contigids.txt

Then you can use @Sej Modha's grep line to get the contigs without a hit. The file of hits is the pattern file and goes first:

grep -vf contighits.txt contigids.txt
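
Putting the steps together, here is a minimal sketch that prints the count directly. It assumes tabular output ('-outfmt 6', no comment lines) and that the BLAST query ID is the first whitespace-delimited token of each FASTA header, which is the usual default. It also swaps the inverted grep for `comm -23` on sorted ID lists, which compares whole lines exactly. The function name `count_no_hit_contigs` is hypothetical.

```shell
# Hypothetical helper: count contigs in a FASTA file that have no hit
# in a BLAST tabular (-outfmt 6) result file.
count_no_hit_contigs() {
    blast_tsv=$1    # BLAST output, tab-delimited, query ID in column 1
    contigs_fa=$2   # FASTA file that was used as the query
    workdir=$(mktemp -d)

    # Unique query IDs that produced at least one hit.
    cut -f 1 "$blast_tsv" | sort -u > "$workdir/hits.txt"

    # All contig IDs: header lines, minus '>' and any description after the ID.
    grep '^>' "$contigs_fa" | sed 's/^>//' | awk '{print $1}' | sort -u > "$workdir/all.txt"

    # comm -23 keeps lines found only in the first (sorted) file; count them.
    comm -23 "$workdir/all.txt" "$workdir/hits.txt" | wc -l | tr -d ' '

    rm -rf "$workdir"
}
```

Usage would look like `count_no_hit_contigs blastoutput contigs.fa`. Dropping the `wc -l` stage would list the IDs instead of counting them.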
