BLASTn: extract unmatched regions only
1
1
16 months ago
A_heath ▴ 120

Hi all,

I have a custom database and I ran BLASTn with it against a bacterial genome. I would like to extract only the unmatched regions.

Is there a command-line tool or another way to do this?

Thanks very much for your precious help!

3
16 months ago
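1. output the BLAST results in tabular format (-outfmt 6)
2. convert the hits to BED/GTF/GFF
3. bedtools complement, to get the regions not covered by any hit
4. bedtools getfasta, to extract the sequences of those regions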
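Steps 1 and 2 can be sketched with plain awk. This is a minimal sketch, not a definitive recipe: it assumes the default 12-column -outfmt 6 layout and that the regions of interest are on the subject sequences (columns 2, 9 and 10); for the query side, columns 1, 7 and 8 would be used instead. The function name `blast6_to_bed` is made up for illustration.

```shell
# Hypothetical helper: convert BLAST tabular output (-outfmt 6, default
# columns) to 3-column BED intervals on the subject sequences.
blast6_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             s = $9 + 0; e = $10 + 0
             # Minus-strand hits are reported with sstart > send, so swap.
             if (s > e) { t = s; s = e; e = t }
             # BLAST is 1-based inclusive; BED is 0-based, half-open.
             print $2, s - 1, e
         }' "$1"
}
```

Usage would look like `blast6_to_bed hits.tsv > hits.bed`. Steps 3 and 4 then correspond to something like `bedtools complement -i hits.bed -g genome.txt > unmatched.bed` followed by `bedtools getfasta -fi genome.fasta -bed unmatched.bed -fo unmatched.fasta`, where all file names are placeholders.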
1
So if I understood correctly, bedtools complement will output the coordinates of the regions that are not covered by a hit and bedtools getfasta will extract the fasta sequences?

edit: I managed to convert my BLASTn result file into a .bed file. Now I'm stuck with bedtools complement, as I need to provide a genome file (-g). My custom database contains many contigs (117,136 to be exact). The genome file is required to be a 2-column file with the name of each contig alongside its size in bp. Is there a way to build a genome file for bedtools with this many contigs?

2
many contigs?

In one file or many files?

For one or a few contig files:

seqkit fx2tab -l -i *.fasta | cut -f 1,4 > genome.txt


For many contig files:

seqkit fx2tab -l -i --infile-list <(find . -name "*.fasta") | cut -f 1,4 > genome.txt
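If seqkit is not available, the same 2-column genome file can be approximated with awk alone. This is only a sketch of a swapped-in technique, not the seqkit method above: it assumes plain (uncompressed) FASTA with Unix line endings, and takes the contig name as everything between ">" and the first whitespace. The function name `fasta_to_genome` is hypothetical.

```shell
# Hypothetical helper: print "name<TAB>length" for every record in one or
# more FASTA files, summing the length over wrapped sequence lines.
fasta_to_genome() {
    awk 'BEGIN { OFS = "\t" }
         /^>/ {
             if (name != "") print name, len   # flush the previous record
             name = substr($1, 2); len = 0     # ID = header up to first space
             next
         }
         { len += length($0) }
         END { if (name != "") print name, len }' "$@"
}
```

Usage would look like `fasta_to_genome *.fasta > genome.txt`.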

1
Thank you very much, it worked great to obtain a proper genome file with 2 columns!

However, now I have another issue with bedtools complement as it returns an error saying that the .bed file contains contigs out of order. I thought about using bedtools sort but is this appropriate for this type of file?
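One way to approach the ordering error, sketched under the assumption that it comes from the .bed file and the genome file listing contigs in different orders: sort both files by contig name with the same collation, and sort the .bed file by start position within each contig. The helper names below are made up for illustration, and both files are assumed to be tab-separated.

```shell
TAB=$(printf '\t')

# Hypothetical helper: sort a BED file by contig name, then numerically by start.
sort_bed() {
    LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n "$1"
}

# Hypothetical helper: sort the 2-column genome file by contig name, using the
# same collation (LC_ALL=C) so both files list contigs in the same order.
sort_genome() {
    LC_ALL=C sort -t "$TAB" -k1,1 "$1"
}
```

Usage would look like `sort_bed hits.bed > hits.sorted.bed` and `sort_genome genome.txt > genome.sorted.txt`, followed by `bedtools complement -i hits.sorted.bed -g genome.sorted.txt > unmatched.bed`. As far as I know, bedtools sort can also take the genome file via -g to impose its contig order on the .bed file, which would be an alternative to re-sorting the genome file.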