Find genes of interest in large unannotated genome files
3 months ago

I'm studying a gene across all fish genomes, where it may be multi-copy, a pseudogene, or absent. BLAST solves part of the problem, but the hits often do not align well at their ends. Since I have a large number of genomes, I want to extend each BLAST hit by some flanking bases so that I can judge the results manually. Can you tell me how to do this, or what software to use? I would appreciate any suggestions.

bioinformatics • 376 views
3 months ago

You could convert tab-delimited output from BLAST (-outfmt 6) to a BED file and pad the intervals with bedops --range in BEDOPS, e.g.:

$ bedops --range 10000 --everything in.bed > out.bed

Thank you very much for your suggestion. I now use the BLAST output in format 6 and get the extended start and end positions from bedops, but the sequence in the input BED file is not modified. Does bedops support modifying the sequence? Or do I need to use the output BED file to look up the extended sequence in the FASTA file? I would be very grateful if you could answer.

Modify the BED intervals with bedops --range etc., and then follow this up by using samtools faidx (search on those two keywords) and an indexed FASTA to retrieve the sequence over the modified intervals. I have a convenience script that uses indexed FASTA and samtools to convert BED lines to FASTA records. Example usage:

$ perl bed2faidxsta.pl --options... < in.bed > out.fa


Replace --options ... with command-line options, as needed.
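
For anyone who wants to see the steps end to end, here is a minimal sketch, not a tested pipeline. It assumes the genome was the BLAST subject (database) and the gene was the query, and that the hits use the default 12-column -outfmt 6 layout. The function names blast6_to_bed and bed_to_regions and the file names hits.tsv and genome.fa are made up for illustration. The two awk helpers need only awk; the commented commands additionally need BEDOPS and samtools on your PATH.

```shell
# Convert BLAST tabular hits (-outfmt 6, default columns) to BED6.
# Columns used: 1=qseqid, 2=sseqid, 9=sstart, 10=send, 12=bitscore.
# BLAST coordinates are 1-based and inclusive; BED is 0-based, half-open.
# Minus-strand hits are reported by BLAST with sstart > send.
blast6_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        if ($9 + 0 <= $10 + 0) { start = $9 - 1;  end = $10; strand = "+" }
        else                   { start = $10 - 1; end = $9;  strand = "-" }
        print $2, start, end, $1, $12, strand
    }'
}

# Convert BED lines to samtools-style region strings (1-based, inclusive).
bed_to_regions() {
    awk 'BEGIN { FS = "\t" } { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Assumed usage (hypothetical file names; requires BEDOPS and samtools):
#   blast6_to_bed < hits.tsv | sort-bed - > in.bed
#   bedops --range 10000 --everything in.bed > out.bed
#   samtools faidx genome.fa                      # build genome.fa.fai once
#   bed_to_regions < out.bed > regions.txt
#   samtools faidx genome.fa -r regions.txt > out.fa
```

Note that sequences retrieved this way are all on the forward strand of the genome; the strand column in the BED file tells you which hits would still need reverse-complementing before manual inspection.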