Extracting nucleotide sequences associated with specific GenPept annotated regions
11 months ago
alopex • 0

I'm looking for a way to extract the nucleotide sequences from NCBI GenBank records that correspond to specific annotated regions in the associated NCBI GenPept records, either manually or, ideally, programmatically with the R package rentrez, with the output in FASTA format.

For example, this spike protein sequence has two annotated regions, corresponding to the S1 and S2 glycoproteins, that can easily be highlighted or isolated. But the corresponding GenBank nucleotide entry doesn't carry that region annotation; it gives only the nucleotide sequence coding for the whole protein. Is there a way to cross-reference the two records so that only the relevant sequence is isolated?

GenBank sequence GenPept rentrez ncbi • 337 views

I don't think you can retrieve the nucleotide sequence for just those regions directly. They are annotated as Region features on the protein record and, AFAIK, you can only retrieve the nucleotide sequence of the entire CDS.

$ esearch -db protein -query "QBP43268" | efetch -format ft
>Feature gb|QBP43268.1|
1       1352    Protein
                product     S protein
234     721     Region
                region      Corona_S1
                note        Coronavirus S1 glycoprotein
                db_xref     CDD:279880
729     1351    Region
                region      Corona_S2
                note        Coronavirus S2 glycoprotein
                db_xref     CDD:279881
1       1352    CDS
                product     S protein
                protein_id  gb|QBP43268.1|
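
That said, since the Region features are given in amino-acid coordinates, one possible workaround is to convert them to nucleotide offsets within the CDS yourself (residue n covers CDS bases 3(n-1)+1 to 3n) and then slice the CDS sequence once you have it. Below is a rough Python sketch of that arithmetic, not a tested pipeline. The function names (parse_regions, aa_to_cds_nt, slice_region) are made up for illustration, and it assumes the CDS is a single contiguous, unspliced interval with no frameshift and that the protein starts at the first codon of the CDS. You would still need to fetch the CDS nucleotide sequence separately; the resulting offsets are relative to the CDS start, not to the start of the GenBank record. The same arithmetic carries over directly to R.

```python
# Sketch only: maps protein Region coordinates to offsets within the CDS.
# Assumes a contiguous, unspliced CDS whose first codon encodes residue 1.

FEATURE_TABLE = """\
>Feature gb|QBP43268.1|
1 1352 Protein
   product S protein
234 721 Region
   region Corona_S1
   note Coronavirus S1 glycoprotein
729 1351 Region
   region Corona_S2
   note Coronavirus S2 glycoprotein
1 1352 CDS
   product S protein
"""


def parse_regions(table):
    """Collect Region features (1-based amino-acid start/end and name).

    Partial coordinates such as '<1' are not handled and are skipped.
    """
    regions = []
    current = None
    for line in table.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            # A feature line: start, end, feature key.
            current = None
            if fields[2] == "Region":
                current = {"aa_start": int(fields[0]),
                           "aa_end": int(fields[1]),
                           "name": None}
                regions.append(current)
        elif current is not None and len(fields) >= 2 and fields[0] == "region":
            # Qualifier line naming the current Region feature.
            current["name"] = fields[1]
    return regions


def aa_to_cds_nt(aa_start, aa_end):
    """Convert 1-based amino-acid coordinates to 1-based CDS nucleotide coordinates."""
    return 3 * (aa_start - 1) + 1, 3 * aa_end


def slice_region(cds_seq, aa_start, aa_end):
    """Return the part of a CDS nucleotide string that encodes the given residues."""
    nt_start, nt_end = aa_to_cds_nt(aa_start, aa_end)
    return cds_seq[nt_start - 1:nt_end]


if __name__ == "__main__":
    for region in parse_regions(FEATURE_TABLE):
        nt_start, nt_end = aa_to_cds_nt(region["aa_start"], region["aa_end"])
        print(region["name"], nt_start, nt_end)
```

With the coordinates from the feature table above, Corona_S1 (residues 234 to 721) maps to CDS bases 700 to 2163, and Corona_S2 (residues 729 to 1351) maps to CDS bases 2185 to 4053. To get positions on the nucleotide record itself, add the CDS start position minus one.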
