Question

Find genes of interest in large unannotated genome files

0

2.1 years ago

雨 ▴ 20

I'm studying a gene across all fish genomes, where it may be multi-copy, a pseudogene, or absent. BLAST solves part of the problem, but the hits sometimes don't align well at their ends. For a large number of genomes, I want to extend the sequence on both sides of each BLAST hit so that I can judge the results manually. How can I do this, or what software should I use? I would appreciate any suggestions.

genes • 860 views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 2.1 years ago by 雨 ▴ 20

score 1 · Answer 1 · 2022-03-22

1

2.1 years ago

Alex Reynolds 35k

You could convert the tab-delimited output from BLAST (-outfmt 6) to a sorted BED file (subject sequence ID, start, and end as the first three columns) and pad the intervals with bedops --range in BEDOPS, e.g.:

$ bedops --range 10000 --everything in.bed > out.bed

Ref.: https://github.com/bedops/bedops
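BLAST tabular output is not BED as written: subject coordinates are 1-based and inclusive, and minus-strand hits are reported with sstart greater than send. The following is a minimal sketch of one way to do the conversion, assuming the default 12-column -outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name blast6_to_bed and the choice to carry the query ID and bit score into BED columns 4 and 5 are illustrative assumptions, not part of BLAST or BEDOPS.

```shell
# Convert default 12-column BLAST -outfmt 6 lines (stdin) to BED6 (stdout).
# Assumed columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
blast6_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Subject coordinates are 1-based, inclusive; BED is 0-based, half-open.
    # sstart > send indicates a hit on the minus strand of the subject.
    if ($9 <= $10) { lo = $9 - 1;  hi = $10; strand = "+" }
    else           { lo = $10 - 1; hi = $9;  strand = "-" }
    # BED6: chrom, start, end, name (query ID), score (bit score), strand
    print $2, lo, hi, $1, $12, strand
  }'
}
```

Assuming the hits are in a file called hits.tsv (a placeholder name), something like `blast6_to_bed < hits.tsv | sort-bed - > in.bed` should give a sorted BED file that can be passed to the bedops --range command above.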

ADD COMMENT • link 2.1 years ago by Alex Reynolds 35k

0

Thank you very much for your suggestion. I now have the BLAST output in -outfmt 6 format and used bedops to get the extended start and end positions, but the sequence in the in.bed file is not extended. Does bedops support modifying the sequence? Or do I need to use the output BED file to find my extended sequence in the FASTA file? I would be very grateful if you could answer.

ADD REPLY • link 2.1 years ago by 雨 ▴ 20

0

Modify the BED intervals with bedops --range etc., then use samtools faidx (search on those two keywords) with an indexed FASTA file to retrieve the sequence over the modified intervals. I have a convenience script that uses indexed FASTA and samtools to convert BED lines to FASTA records.

Example usage:

$ perl bed2faidxsta.pl --options... < in.bed > out.fa

Replace --options... with command-line options, as needed.
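If you would rather call samtools directly, the padded BED intervals first need to be turned into region strings, which samtools faidx reads as 1-based, inclusive chrom:start-end. Here is a rough sketch of that step; the function name bed_to_regions and the file names below are placeholders, and clamping a negative start to zero is a defensive assumption in case padding ran past the start of a contig.

```shell
# Convert BED intervals (stdin; 0-based, half-open) to samtools-style
# region strings (stdout; 1-based, inclusive), one per line.
bed_to_regions() {
  awk 'BEGIN { FS = "\t" }
  {
    # Guard against a negative start coordinate after padding.
    start = ($2 < 0 ? 0 : $2)
    # BED start s, end e  ->  region chrom:(s+1)-e
    printf "%s:%d-%d\n", $1, start + 1, $3
  }'
}
```

With a genome in genome.fa, running `bed_to_regions < out.bed > regions.txt` followed by `samtools faidx genome.fa -r regions.txt > out.fa` should write one FASTA record per padded interval (the -r region-file option is available in recent samtools releases). Intervals padded past the end of a contig, and hits on the minus strand, may still need to be checked by hand.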

ADD REPLY • link 2.1 years ago by Alex Reynolds 35k