Renaming .fa file genes back to their RepeatMasker file origin
3.2 years ago
I had a RepeatMasker BED file that I used to extract particular sections from a genome I am working with, using bedtools. However, my output file (.fa) named the individual segments "unspecified" followed by a number, so I cannot relate the genes back to their coordinates from the RepeatMasker file. How can I edit the output file so that it corresponds to the original BED file? Or is there a way I can redo this so that it gives me a more specific output? Thank you.

To help debug or reproduce the problem, could you post the exact commands you used and an excerpt of your data?

I used:

bedtools getfasta -name -s -fi file.fa -bed rmsk.bed -fo GeneResults.fa


and an example record from the output file was:

>rnd-4_family-798_Unspecified(-)
TTGATATCAGAAGGTTTTTCTATGCCTGTTTTGTAAAGGTGGATTTAATCTTGATATTAAAGGATTGTTTTTGCAAGCTTAAATGCGAACCTCATAGTGCGTAAACTGCATAGTTAAATTTACCATTGTTTCAAGCTTTCATGGTTTA
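
One possible way to redo the extraction so that each record can be traced back to its coordinates is to write the coordinates into the name column of the BED file before running getfasta again. With -name, the FASTA header is taken from column 4 of the BED file, which is why only the repeat family name shows up. The sketch below is a suggestion, not a tested fix for this data: the helper add_coords_to_name and the file name rmsk.named.bed are made up for illustration, and it assumes rmsk.bed is tab-separated, has at least four columns, and contains no track or browser header lines.

```shell
# Hypothetical helper: append "::chrom:start-end" to the BED name column (column 4).
# Assumes tab-separated BED input with at least 4 columns and no header lines.
# Reads the files given as arguments, or standard input if none are given.
add_coords_to_name() {
    awk 'BEGIN { FS = OFS = "\t" } { $4 = $4 "::" $1 ":" $2 "-" $3; print }' "$@"
}

# Intended usage (file.fa and rmsk.bed are from the command above;
# rmsk.named.bed is an assumed output name):
#   add_coords_to_name rmsk.bed > rmsk.named.bed
#   bedtools getfasta -name -s -fi file.fa -bed rmsk.named.bed -fo GeneResults.fa
```

Each header would then carry the original name together with the chromosome, start, and end from the BED line. Alternatively, running getfasta without -name writes the coordinates as the header instead of the name, and recent bedtools releases document a -name+ option that combines both; whether it is available depends on the installed version, so check bedtools getfasta -h first.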