Question: Extract flanking region of -500 nt upstream and downstream of BLAST result on genome using perl scripts
0
gravatar for adhirajnath14
6 days ago by
adhirajnath140 wrote:

I have performed BLAST on a set of 380 sequences on a genome using BLAST+. I need to extract the aligned region along with (-/+)500 nt flanks for each hit from the genome. Can anyone provide me a script in perl or guide a way to do it. I know it can be done but have not found any reliable sources. I am quite new to this field.

ADD COMMENTlink modified 6 days ago by jean.elbers1.2k • written 6 days ago by adhirajnath140
1

Get the coordinates in BED format, then use bedtools slop to extend by 500bp followed by (if you need that) bedtools getfasta to get the sequence.

ADD REPLYlink written 6 days ago by ATpoint21k
1
gravatar for jean.elbers
6 days ago by
jean.elbers1.2k
jean.elbers1.2k wrote:

Not tested

makeblastdb -in reference.fa -dbtype nucl
samtools faidx reference.fa
cut -f 1,2 reference.fa.fai > chrom.sizes
blastn -db reference -query query.fasta -outfmt 6 -out query.fasta-against-reference.blast
cut -f 2,9,10 query.fasta-against-reference.blast | awk '{if($2 > $3) {t = $2; $2 = $3; $3 = t; print;} else if($2 < $3) print; }' OFS='\t' |awk '{a=$2-1;print $1,a,$3;}' OFS='\t'|bedtools sort > query.fasta-against-reference.blast.bed
bedtools slop -i query.fasta-against-reference.blast.bed -g chrom.sizes -b 500 > query.fasta-against-reference.blast.500.bed
bedtools getfasta -fi reference.fa -bed query.fasta-against-reference.blast.500.bed -fo query.fasta-against-reference.blast.500.fasta

explanation

makeblastdb -in reference.fa -dbtype nucl # make a BLAST database
samtools faidx reference.fa # make a FASTA index
cut -f 1,2 reference.fa.fai > chrom.sizes # get the chromosome sizes
blastn -db reference -query query.fasta -outfmt 6 -out query.fasta-against-reference.blast # perform BLAST (note use your desired settings)
cut -f 2,9,10 query.fasta-against-reference.blast | awk '{if($2 > $3) {t = $2; $2 = $3; $3 = t; print;} else if($2 < $3) print; }' OFS='\t' |awk '{a=$2-1;print $1,a,$3;}' OFS='\t'|bedtools sort > query.fasta-against-reference.blast.bed # convert BLAST 6 format to BED
bedtools slop -i query.fasta-against-reference.blast.bed -g chrom.sizes -b 500 > query.fasta-against-reference.blast.500.bed # add on 500 bases upstream and downstream
bedtools getfasta -fi reference.fa -bed query.fasta-against-reference.blast.500.bed -fo query.fasta-against-reference.blast.500.fasta # get the accompanying bases for BLAST coordinates but 500 bases upstream and downstream
ADD COMMENTlink modified 6 days ago • written 6 days ago by jean.elbers1.2k

Thank you so much. I'll use it and let you know if any problems

ADD REPLYlink written 4 days ago by adhirajnath140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1105 users visited in the last hour