I have the output from BLASTN searches and want to extract 2500 bases upstream and downstream of each BLASTN hit from an assembled genome.
I have generated FASTA files containing each BLASTN hit sequence, and I have a FASTA file for the assembled genome.
I have been trying to use pcregrep for this:
pcregrep -i -A0 -B0 -M -f Blastn_hit.fna Assembled_genome.fna > Blastn_hit_+_bases.fna
However, there is no output.
I believe this is because the lines in Blastn_hit.fna are longer than those in Assembled_genome.fna, so each hit spans several lines of the genome file and I would have to mark every line break with (\n|.) in the BLASTN file.
The problem is that I don't know where the line breaks fall, so I don't know where to insert (\n|.) in Blastn_hit.fna.
Is there a way to use pcregrep without indicating where the line breaks are, or is there an alternative tool or script that will find each BLASTN hit and print the 2500 bases upstream and downstream of it?
I am very new to this and have very limited knowledge, so answers with more of a ‘for dummies’ approach would be appreciated.
(I know that -A and -B print lines, not characters, but I can work out how many characters there are per line and so how many lines need to be printed.)
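
To make the request concrete, here is a minimal Python sketch of the kind of script that would sidestep the line-break problem. It joins each genome record into one string before searching, so wrapped lines no longer matter. It rests on several assumptions: each sequence in Blastn_hit.fna occurs exactly (no gaps or mismatches) somewhere in Assembled_genome.fna, the files are small enough to hold in memory, and the function names (read_fasta, reverse_complement, find_with_flanks, main) are purely illustrative. Hits on the reverse strand are located by searching for the reverse complement, and the region printed is always the forward-strand genome sequence.

```python
import sys


def read_fasta(path):
    """Return {header: sequence} with the line breaks inside each sequence removed."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    # Upper-case everything so the search is case-insensitive.
    return {name: "".join(parts).upper() for name, parts in records.items()}


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (other symbols are left as-is)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_with_flanks(hit, contig, flank=2500):
    """Yield (start, end, strand, region) for each exact occurrence of hit.

    start and end are 0-based, end-exclusive, and clipped at the contig ends.
    region is the forward-strand contig sequence, even for '-' strand matches.
    """
    queries = [("+", hit)]
    rc = reverse_complement(hit)
    if rc != hit:  # avoid reporting a palindromic hit twice
        queries.append(("-", rc))
    for strand, query in queries:
        pos = contig.find(query)
        while pos != -1:
            start = max(0, pos - flank)
            end = min(len(contig), pos + len(query) + flank)
            yield start, end, strand, contig[start:end]
            pos = contig.find(query, pos + 1)


def main(hits_path, genome_path, flank=2500):
    hits = read_fasta(hits_path)
    genome = read_fasta(genome_path)
    for hit_name, hit_seq in hits.items():
        for contig_name, contig_seq in genome.items():
            for start, end, strand, region in find_with_flanks(hit_seq, contig_seq, flank):
                # Report 1-based inclusive coordinates in the FASTA header.
                print(f">{hit_name} {contig_name}:{start + 1}-{end} ({strand})")
                print(region)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as extract_flanks.py, it would be run as: python extract_flanks.py Blastn_hit.fna Assembled_genome.fna > Blastn_hit_+_bases.fna
If the hit FASTA holds the query sequences rather than the matched genome sequences, an exact search like this can fail. In that case the subject start and end coordinates from the BLASTN tabular output would be a more reliable starting point.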