BLASTn: extract unmatched regions only
2.4 years ago
A_heath

Hi all,

I ran BLASTn with a custom database against a bacterial genome, and I would like to extract only the unmatched regions.

Is there a command-line tool or another way to do this?

Thanks very much for your valuable help!

Answer by shenwei356, 2.4 years ago
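  1. Output the BLAST results in tabular format (-outfmt 6).
  2. Convert the hits to BED/GTF/GFF.
  3. Run bedtools complement to get the regions not covered by any hit.
  4. Run bedtools getfasta to extract the sequences of those regions.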
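Steps 1 and 2 can be sketched in shell as below. This is a minimal sketch assuming that the contigs of interest are the BLAST subjects (the custom database sequences) and that the default 12-column -outfmt 6 layout is used, so the hit coordinates are sstart and send in columns 9 and 10. The function name and the file and database names in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# Step 1 (hypothetical file/database names): tabular BLAST output, e.g.
#   blastn -query genome.fasta -db custom_db -outfmt 6 -out hits.tsv
#
# Step 2: convert default -outfmt 6 rows to BED on the subject (database) contigs.
# Column 2 = sseqid, columns 9/10 = sstart/send (1-based, inclusive).
# On minus-strand hits sstart > send, so the two are swapped first.
# BED is 0-based and half-open: start = smaller coordinate - 1, end = larger coordinate.
blast6_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                          # skip comment lines, if any
       { s = $9 + 0; e = $10 + 0              # force numeric comparison
         if (s > e) { t = s; s = e; e = t }   # normalise minus-strand hits
         print $2, s - 1, e }' "$@"
}
```

For example, blast6_to_bed hits.tsv > hits.bed produces the BED file for step 3. Minus-strand hits are normalised because BED requires start to be smaller than end.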

Thank you very much for your reply, shenwei356.

So, if I understand correctly, bedtools complement will output the coordinates of the regions that are not covered by any hit, and bedtools getfasta will then extract their FASTA sequences?
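In my understanding, yes: bedtools complement reports the parts of each sequence in the genome file that no BED interval covers, and bedtools getfasta then pulls those intervals out of the FASTA file. To make the complement step concrete, here is a pure-awk illustration of the same idea on toy data. It is not bedtools and not a replacement for it; it assumes a BED file sorted by contig and start plus a genome file of contig<TAB>length lines, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: computes the gaps between BED intervals on each contig,
# i.e. what "bedtools complement" reports, so the idea can be checked on toy data.
# Usage: complement_sketch sorted.bed genome.txt
complement_sketch() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { len[$1] = $2 + 0; next }   # 1st file read: genome (contig, length)
       {                                      # 2nd file read: BED (contig, start, end)
         c = $1
         if (!(c in cov)) cov[c] = 0          # end of the covered region so far
         if ($2 + 0 > cov[c]) print c, cov[c], $2 + 0   # uncovered gap before this hit
         if ($3 + 0 > cov[c]) cov[c] = $3 + 0           # extend coverage (handles overlaps)
       }
       END {                                  # tail gap, or whole contig if it had no hits
         for (c in len) {
           s = (c in cov) ? cov[c] : 0
           if (s < len[c]) print c, s, len[c]
         }
       }' "$2" "$1" | LC_ALL=C sort -k1,1 -k2,2n
}
```

For a 1,000 bp contig with hits at 100-200, 150-250 and 500-600 (BED coordinates), the uncovered regions are 0-100, 250-500 and 600-1000, and a contig with no hits at all is reported over its full length.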

Edit: I managed to convert my BLASTn result file into a BED file. Now I'm stuck with bedtools complement, because it requires a genome file (-g). I downloaded many contigs into my custom database (117,136 contigs, to be exact), and the genome file must be a two-column file listing each contig name alongside its size in bp. Is there a way to build a genome file for bedtools with this many contigs?


Many contigs?

In one file or in many files?

For one or a few contig files:

seqkit fx2tab -l -i *.fasta | cut -f 1,4 > genome.txt

For many contig files:

seqkit fx2tab -l -i --infile-list <(find . -name "*.fasta") | cut -f 1,4 > genome.txt
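If seqkit is not installed, the same two-column genome file can be produced with plain awk. This is a minimal sketch assuming ordinary FASTA headers; like -i, it keeps only the first word of each header as the contig name. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "contig<TAB>length" for every record in the given FASTA files.
# Contig name = first word of the header (text after ">" up to the first whitespace).
fasta_lengths() {
  awk '/^>/ {
         if (id != "") print id "\t" len      # flush the previous record
         id = substr($1, 2); len = 0
         next
       }
       { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray spaces, tabs and CRs
       END { if (id != "") print id "\t" len }' "$@"
}
```

For example, fasta_lengths *.fasta > genome.txt. awk streams the input and keeps only the current record in memory, so a large number of contigs is not a problem.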

Thank you very much, it worked: I now have a proper two-column genome file!

However, I now have another issue: bedtools complement returns an error saying that the BED file contains contigs out of order. I thought about using bedtools sort, but is it appropriate for this type of file?
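Sorting is appropriate here: bedtools sort orders records by contig and then by start, the same order as sort -k1,1 -k2,2n. As far as I understand, complement also expects the genome file to list contigs in the same order as the BED file, so one safe option is to sort both files with the same collation. The sketch below does that with coreutils sort under LC_ALL=C (plain byte order). It also adds a bedtools merge step, which is not in the original list, to collapse overlapping hits, and then runs complement and getfasta. All function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sort a BED file by contig (byte order), then by numeric start,
# i.e. the same order as "sort -k1,1 -k2,2n" or "bedtools sort".
sort_bed() {
  LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Sort the genome file by contig name with the same collation, so both files agree.
sort_genome() {
  LC_ALL=C sort -k1,1 "$1"
}

# Steps 3-4 on consistently sorted inputs (requires bedtools on PATH).
# Arguments: hits BED, genome file, contig FASTA, output FASTA (hypothetical names).
unmatched_fasta() {
  local bed=$1 genome=$2 fasta=$3 out=$4
  sort_bed "$bed" > hits.sorted.bed
  sort_genome "$genome" > genome.sorted.txt
  bedtools merge -i hits.sorted.bed > hits.merged.bed        # collapse overlapping hits
  bedtools complement -i hits.merged.bed -g genome.sorted.txt > unmatched.bed
  bedtools getfasta -fi "$fasta" -bed unmatched.bed -fo "$out"
}
```

For example, unmatched_fasta hits.bed genome.txt contigs.fasta unmatched.fasta, where contigs.fasta holds the database contigs whose names match those in the BED and genome files.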
