Find genes of interest in large unannotated genome files

I'm studying one gene across all fish genomes (depending on the species it may be multi-copy, a pseudogene, or absent). BLAST solves part of the problem, but the alignments often don't extend cleanly to the ends of the gene. Because I have a large number of genomes, I'd like to extend each BLAST hit by some number of bases on both sides so I can inspect those regions manually. How can I do this, or what software should I use? I would appreciate any help.

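You could convert the tab-delimited output from BLAST (-outfmt 6) to BED and pad the intervals with bedops --range in BEDOPS. BLAST subject coordinates are 1-based and inclusive, and sstart is greater than send for minus-strand hits, so each hit becomes the BED interval (sseqid, min(sstart, send) - 1, max(sstart, send)). Sort the result with sort-bed before running bedops, e.g.: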

$ bedops --range 10000 --everything in.bed > out.bed

Ref.: https://github.com/bedops/bedops
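As a minimal sketch of the conversion step, assuming the default 12-column -outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the hypothetical shell function blast6_to_bed below writes one 6-column BED line per hit on the subject (genome) sequence, keeping the query name, bit score, and strand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: BLAST -outfmt 6 (default 12 columns) -> 6-column BED.
# Column order assumed:
#   1 qseqid  2 sseqid  3 pident  4 length  5 mismatch  6 gapopen
#   7 qstart  8 qend    9 sstart 10 send   11 evalue   12 bitscore
# BLAST subject coordinates are 1-based and inclusive; sstart > send
# means the hit is on the minus strand of the subject.
blast6_to_bed() {
  awk 'BEGIN { OFS = "\t" }
       !/^#/ && NF >= 12 {
         fwd    = ($9 <= $10)
         strand = fwd ? "+" : "-"
         lo     = fwd ? $9  : $10
         hi     = fwd ? $10 : $9
         # BED is 0-based, half-open: only the start moves down by 1
         print $2, lo - 1, hi, $1, $12, strand
       }'
}
```

With placeholder file names, the whole padding step would then look like:

$ blast6_to_bed < hits.tsv > unsorted.bed
$ sort-bed unsorted.bed > in.bed
$ bedops --range 10000 --everything in.bed > out.bed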

Thank you very much for your suggestion. I'm now using the BLAST output in -outfmt 6 format, and bedops gives me the extended start and end positions, but that only changes the coordinates; the sequences themselves are not extended. Can bedops modify the sequence, or do I need to use the output BED file to pull the extended sequences from the FASTA file? I would be very grateful for an answer.

bedops works on intervals, not sequence. Modify the BED intervals with bedops --range (or similar), then use samtools faidx with an indexed FASTA file to retrieve the sequence over the modified intervals. Keep in mind that samtools faidx regions are 1-based and inclusive, so each BED start needs 1 added to it. I have a convenience script here: that uses indexed FASTA and samtools to convert BED lines to FASTA records.

Example usage:

$ perl bed2faidxsta.pl --options... < in.bed > out.fa

Replace --options... with the script's command-line options, as needed.
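If you would rather call samtools directly, here is a minimal sketch (not the bed2faidxsta.pl script above). The hypothetical function bed_to_regions reads the .fai index that samtools faidx writes, converts each 0-based, half-open BED interval into a 1-based, inclusive name:start-end region, and clamps it to the contig length so 10 kb of padding cannot run past either end of a scaffold. Lines whose contig is missing from the .fai are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: BED on stdin -> samtools region strings on stdout.
# $1 is the FASTA index (genome.fa.fai): tab-separated name, length, ...
bed_to_regions() {
  local fai=$1
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { len[$1] = $2; next }   # first file: contig lengths from .fai
       ($1 in len) {                       # skip contigs absent from the index
         start = $2 + 1                    # 0-based BED start -> 1-based
         end   = $3                        # half-open end == inclusive 1-based end
         if (start < 1) start = 1          # clamp padding at the contig start
         if (end > len[$1]) end = len[$1]  # clamp padding at the contig end
         if (start <= end) print $1 ":" start "-" end
       }' "$fai" -
}
```

With genome.fa and out.bed as placeholder names:

$ samtools faidx genome.fa
$ bed_to_regions genome.fa.fai < out.bed > regions.txt
$ samtools faidx -r regions.txt genome.fa > out.fa

This returns every region from the forward strand of the genome. For minus-strand hits (column 6 is "-"), you may want to split those lines out and use the reverse-complement option of samtools faidx (-i in recent releases) so the extracted sequence reads in the same orientation as your query.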
