In theory you could download this file from the COG FTP site:
    wget -O cog.txt "ftp://ftp.ncbi.nih.gov/pub/COG/COG/myva=gb"
    head -5 cog.txt
    # APE0180 14600509
    # APE0225 14600543
    # APE0277 14600591
    # APE0307 14600619
    # APE0324 14600631
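As a minimal sketch of reading that file, the Python below pulls the (name, protein GI) pairs out of whitespace-separated lines. The function name `read_protein_gis` is hypothetical, and the code assumes the layout shown above: two columns, with the GI in the second. It also strips a leading `#` in case those characters are really in the file rather than just marking output.

```python
def read_protein_gis(lines):
    """Yield (name, protein_gi) pairs from whitespace-separated lines.

    Assumes two columns with a numeric GI in the second; lines that do
    not match that shape are skipped rather than raising an error.
    """
    for line in lines:
        fields = line.lstrip("#").split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        yield fields[0], fields[1]


# Sample lines copied from the head -5 output above.
sample = """APE0180 14600509
# APE0225 14600543

APE0277 14600591
"""
pairs = list(read_protein_gis(sample.splitlines()))
print(pairs)
```

For the real file, something like `with open("cog.txt") as fh: pairs = list(read_protein_gis(fh))` should work the same way.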
The second column contains protein GIs. You could then write a script using NCBI EUtils to link each protein GI to, for example, its Gene ID:
    curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=protein&db=gene&id=19076072"

    <eLinkResult>
      <LinkSet>
        <DbFrom>protein</DbFrom>
        <IdList>
          <Id>19076072</Id>
        </IdList>
        <LinkSetDb>
          <DbTo>gene</DbTo>
          <LinkName>protein_gene</LinkName>
          <Link>
            <Id>2539562</Id>
          </Link>
        </LinkSetDb>
      </LinkSet>
    </eLinkResult>
Parse the output, get the list of Gene IDs and submit it to Batch Entrez to retrieve the nucleotide sequences.
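The parsing step could be sketched in Python with the standard library's `xml.etree.ElementTree`. The function name `gene_ids_from_elink` is hypothetical, and the code assumes the response has the same element layout as the eLink output shown in this answer. It makes no network calls and works on the XML text you have already fetched.

```python
import xml.etree.ElementTree as ET


def gene_ids_from_elink(xml_text):
    """Return the linked Gene IDs found in an eLink XML response.

    Only <Link><Id> values under a <LinkSetDb> whose <DbTo> is "gene"
    are collected, so the query GI in <IdList> is not included.
    """
    root = ET.fromstring(xml_text)
    ids = []
    for linkset_db in root.iter("LinkSetDb"):
        if linkset_db.findtext("DbTo") != "gene":
            continue
        for link in linkset_db.findall("Link"):
            ids.append(link.findtext("Id"))
    return ids


# The eLink response quoted in this answer, for protein GI 19076072.
SAMPLE = """<eLinkResult><LinkSet>
<DbFrom>protein</DbFrom>
<IdList><Id>19076072</Id></IdList>
<LinkSetDb><DbTo>gene</DbTo><LinkName>protein_gene</LinkName>
<Link><Id>2539562</Id></Link></LinkSetDb>
</LinkSet></eLinkResult>"""

gene_ids = gene_ids_from_elink(SAMPLE)
print(gene_ids)
```

A response with no `LinkSetDb` element gives an empty list, so you can detect the GIs that did not link to anything and handle them separately.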
The problem: COG has not been updated in years, so many of the protein GIs are now retired. You will have to either work around that or perhaps not use COG at all, since it is very outdated and no longer maintained.