We are trying to extract protein sequences from a FASTA file downloaded from NCBI. However, the IDs we have are transcript IDs rather than gene IDs. Is there an easy way to get the gene IDs from the FASTA file using the transcript IDs that we have?
For example, I have this protein sequence in FASTA format:
and I have these transcript IDs: ENSGALG00000042750.1, ENSGALG00000032142.1
However, we need to get the gene ID from the FASTA file using the transcript ID.
Example:
ENSGALG00000042750.1 >> ENSGALP00000056694.1
ENSGALG00000032142.1 >> ENSGALP00000046506.1
So far, we have been looking up each transcript ID by hand and typing the matching gene ID into a text file. There are almost 4,000 IDs, so doing this manually means entering almost 4,000 gene IDs one at a time.
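For reference, the kind of lookup described above could be scripted roughly as follows. This is only a sketch under assumptions: it supposes each FASTA header starts with the ENSGALP ID and also carries the ENSGALG ID in a `gene:` field, as Ensembl-style peptide headers usually do. The actual header layout of the file is not shown here, so the field name may need adjusting. In Ensembl naming, the ENSGALG prefix denotes a gene and ENSGALP a protein, which is why the sketch searches the `gene:` field. The function names, the sample headers, and the placeholder sequences are made up for illustration; only the four IDs come from the example above.

```python
import re

def build_id_map(fasta_lines, key_field="gene"):
    """Map the ID in the `key_field:` part of each header to the record IDs.

    Assumes headers look like:
        >ENSGALP... pep gene:ENSGALG... transcript:...
    One key can map to several records, so values are lists.
    """
    pattern = re.compile(rf"\b{re.escape(key_field)}:(\S+)")
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        record_id = line[1:].split()[0]  # first token after '>'
        match = pattern.search(line)
        if match:
            mapping.setdefault(match.group(1), []).append(record_id)
    return mapping

def lookup_ids(query_ids, mapping):
    """Return the record IDs for each query ID (empty list if not found)."""
    return {qid: mapping.get(qid, []) for qid in query_ids}

# Made-up sample headers; the sequences are placeholders, not real data.
sample = [
    ">ENSGALP00000056694.1 pep gene:ENSGALG00000042750.1 transcript:HYPOTHETICAL_T1",
    "MPLACEHOLDER",
    ">ENSGALP00000046506.1 pep gene:ENSGALG00000032142.1 transcript:HYPOTHETICAL_T2",
    "MPLACEHOLDER",
]

id_map = build_id_map(sample)
found = lookup_ids(["ENSGALG00000042750.1", "ENSGALG00000032142.1"], id_map)
for query, hits in found.items():
    print(query, ">>", ", ".join(hits))
```

For a real file, the list `sample` would be replaced by an open file handle (for example `open("proteins.fa")`, a hypothetical filename), and the query IDs would be read from the text file holding the roughly 4,000 IDs. If the headers carry the ID in a different field, `key_field` can be changed accordingly, for example to `"transcript"`.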