Question: Extract flanking region of -500 nt upstream and downstream of BLAST result on genome using perl scripts
0
gravatar for adhirajnath14
6 months ago by
adhirajnath1420 wrote:

I have performed BLAST on a set of 380 sequences on a genome using BLAST+. I need to extract the aligned region along with (-/+)500 nt flanks for each hit from the genome. Can anyone provide me a script in perl or guide a way to do it. I know it can be done but have not found any reliable sources. I am quite new to this field.

ADD COMMENTlink modified 6 months ago by jean.elbers1.3k • written 6 months ago by adhirajnath1420
1

Get the coordinates in BED format, then use bedtools slop to extend by 500bp followed by (if you need that) bedtools getfasta to get the sequence.

ADD REPLYlink written 6 months ago by ATpoint29k
1
gravatar for jean.elbers
6 months ago by
jean.elbers1.3k
jean.elbers1.3k wrote:

Not tested

makeblastdb -in reference.fa -dbtype nucl
samtools faidx reference.fa
cut -f 1,2 reference.fa.fai > chrom.sizes
blastn -db reference -query query.fasta -outfmt 6 -out query.fasta-against-reference.blast
cut -f 2,9,10 query.fasta-against-reference.blast | awk '{if($2 > $3) {t = $2; $2 = $3; $3 = t; print;} else if($2 < $3) print; }' OFS='\t' |awk '{a=$2-1;print $1,a,$3;}' OFS='\t'|bedtools sort > query.fasta-against-reference.blast.bed
bedtools slop -i query.fasta-against-reference.blast.bed -g chrom.sizes -b 500 > query.fasta-against-reference.blast.500.bed
bedtools getfasta -fi reference.fa -bed query.fasta-against-reference.blast.500.bed -fo query.fasta-against-reference.blast.500.fasta

explanation

makeblastdb -in reference.fa -dbtype nucl # make a BLAST database
samtools faidx reference.fa # make a FASTA index
cut -f 1,2 reference.fa.fai > chrom.sizes # get the chromosome sizes
blastn -db reference -query query.fasta -outfmt 6 -out query.fasta-against-reference.blast # perform BLAST (note use your desired settings)
cut -f 2,9,10 query.fasta-against-reference.blast | awk '{if($2 > $3) {t = $2; $2 = $3; $3 = t; print;} else if($2 < $3) print; }' OFS='\t' |awk '{a=$2-1;print $1,a,$3;}' OFS='\t'|bedtools sort > query.fasta-against-reference.blast.bed # convert BLAST 6 format to BED
bedtools slop -i query.fasta-against-reference.blast.bed -g chrom.sizes -b 500 > query.fasta-against-reference.blast.500.bed # add on 500 bases upstream and downstream
bedtools getfasta -fi reference.fa -bed query.fasta-against-reference.blast.500.bed -fo query.fasta-against-reference.blast.500.fasta # get the accompanying bases for BLAST coordinates but 500 bases upstream and downstream
ADD COMMENTlink modified 6 months ago • written 6 months ago by jean.elbers1.3k

Thank you so much. I'll use it and let you know if any problems

ADD REPLYlink written 6 months ago by adhirajnath1420
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1499 users visited in the last hour