I have a FASTA file with 100+ viral genomes. I am only interested in the sequence of one particular gene, for which I have the coordinates. I have tried bedtools getfasta as follows:
    head UL48.bed
    JQ673480.1 103538 105080

    head ref.fa
    >KT425109.1
    ATAAACCAACGAAAAGCGCGGGAACGGGG....

    bedtools getfasta -fi ref.fa -bed UL48.bed
I realize the problem is that the chromosome name in the BED file does not match the sequence names in ref.fa. Each genome has its own unique identifier, so the names will never match, and building a BED file by hand with every identifier would be extremely time-consuming. I don't want to edit ref.fa to use uniform identifiers either, because I need that information for downstream processes. Is there any way to do this with grep or a similar command-line tool? In other words, for each entry in ref.fa I need the header line (the one beginning with >) and roughly characters 103538 to 105080 of its sequence.
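A minimal sketch of one way to do this without touching ref.fa. It assumes only a POSIX awk, and it assumes the gene sits at roughly the same coordinates in every genome, as the question implies. Because the genomes are not aligned, that is worth checking. The function name `extract_region` is hypothetical. The function reads each record, joins wrapped sequence lines, and prints the original header followed by the requested slice. START and END are interpreted the way a BED file interprets them (0-based start, end-exclusive), so the slice should match what bedtools getfasta returns for the same interval.

```shell
# extract_region FASTA START END   (hypothetical helper, not part of any tool)
# For every record in FASTA, prints the unchanged header line and then the
# bases in the BED-style interval START..END (0-based, end-exclusive), i.e.
# 1-based positions START+1 through END.
extract_region() {
    awk -v start="$2" -v end="$3" '
        # Print the record collected so far: header, then the requested slice.
        function flush() {
            if (header != "") {
                print header
                print substr(seq, start + 1, end - start)
            }
        }
        # A new header closes the previous record and starts a fresh one.
        /^>/ { flush(); header = $0; seq = ""; next }
        # Sequence line: drop spaces, tabs and CRs, then append to the record.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        # Emit the last record once the file ends.
        END { flush() }
    ' "$1"
}
```

For example, `extract_region ref.fa 103538 105080 > UL48.fa` would write one region per genome, each under its original header. If a genome is shorter than START, its slice is simply an empty line. If you would rather keep using bedtools, you can generate a BED file with one line per genome from the headers instead of typing it by hand, e.g. `grep '^>' ref.fa | cut -c2- | awk -v OFS='\t' '{print $1, 103538, 105080}' > all_UL48.bed`. You can then pass it to `bedtools getfasta -fi ref.fa -bed all_UL48.bed`. This takes the first whitespace-delimited word of each header as the sequence name, which is how bedtools getfasta names FASTA records by default.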