I have a file containing all the protein fastas from my genome of interest (downloaded from NCBI):
I've been running some analyses and have a list of genes I'm interested in. However, I don't have the accession numbers, only a list of partial gene IDs assigned by the people who did the annotation:
Is there a way to search the protein FASTA file for a list of proteins using information in the protein headers? I tried
grep -f genes_of_interest.txt protein_fastas_file.faa > output.fa
in an attempt to at least pull out the headers so I could get the accession numbers, but it returned every single protein record (i.e. output.fa was identical to protein_fastas_file.faa). I'm assuming this is because the sequence doesn't actually sit on its own line, or something like that?
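To make the goal concrete, here is a minimal sketch of the header-based extraction described above, written in Python rather than grep. It rests on several assumptions: genes_of_interest.txt holds one partial gene ID per line, each ID appears verbatim somewhere in the matching header line, and headers start with ">". The function name extract_records is hypothetical. Blank lines in the ID file are skipped, because grep -f treats an empty line as a pattern that matches every line; that is one possible cause of getting the whole file back, though not a confirmed one.

```python
def extract_records(fasta_path, ids_path, out_path):
    """Write every FASTA record whose header contains one of the IDs.

    Hypothetical helper: assumes one partial gene ID per line in ids_path
    and that each ID appears verbatim in the matching header line.
    """
    with open(ids_path) as handle:
        # strip() drops newlines and stray carriage returns;
        # blank lines are skipped so they cannot match every header.
        ids = [line.strip() for line in handle if line.strip()]

    kept = 0
    keep = False  # True while inside a record whose header matched
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in fasta:
            if line.startswith(">"):
                # Plain substring match against the whole header line.
                keep = any(gene_id in line for gene_id in ids)
                if keep:
                    kept += 1
            if keep:
                # Copies the header and every sequence line that follows it.
                out.write(line)
    return kept
```

Called as extract_records("protein_fastas_file.faa", "genes_of_interest.txt", "output.fa"), it returns the number of records written. Because the match is a plain substring test, a partial ID such as one ending in "_1" would also match headers containing "_10", so the IDs may need a delimiter appended if that matters.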
Thanks in advance for any help anyone can give!