Hi all,
I've been using entrez_fetch in the R package rentrez v1.2.2 to extract nucleotide sequences in FASTA format for a large number of GIDs. For a small minority, entrez_fetch simply returns a string containing only a newline character (example below).
> entrez_fetch(db = "nuccore", id = "108597802", rettype="fasta_cds_na")
[1] "\n"
I get the same result using the accession rather than the GID.
> entrez_fetch(db = "nuccore", id = "DQ640652.1", rettype="fasta_cds_na")
[1] "\n"
The exact same call works for most other GIDs/accessions I feed it, and it also works if I request alternative rettypes, e.g.
> entrez_fetch(db = "nuccore", id = "108597802", rettype="gb")
[1] "LOCUS DQ640652 29746 bp RNA linear VRL 12-JUN-2006\nDEFINITION SARS coronavirus GDH-BJH01, complete genome.\nACCESSION DQ640652\nVERSION DQ640652.1\nKEYWORDS .\nSOURCE SARS coronavirus GDH-BJH01\n ORGANISM SARS coronavirus GDH-BJH01\n Viruses; Riboviria; Nidovirales; Cornidovirineae; Coronaviridae;\n Orthocoronavirinae; Betacoronavirus; Sarbecovirus.\nREFERENCE 1 (bases 1 to 29746)\n AUTHORS Cai,J.-P., Hei,A.-L., Hu,J.-H., Wang,S.-K., Zhang,C.-B., Dai,D.-P.,\n Shen,Z.-Y., Guo,J., Li,M., Wu,Y.-S., Cheng,G., He,Y.-S. and Hou,M.\n TITLE Direct Submission\n JOURNAL Submitted (14-MAY-2006) National Center for Clinical Laboratory,\n Beijing Hospital, 1 Da Hua Road, Dong Dan, Beijing 100730, China\nFEATURES Location/Qualifiers\n source 1..29746\n /organism=\"SARS coronavirus GDH-BJH01\"\n /mol_type=\"genomic RNA\"\n /strain=\"GDH-BJH01\"\n /isolation_source=\"Homo sapiens lung\"\n /host=\"Homo sapiens\"\n /db_xref=\"taxon:388737\"\n /country=\"China\"\nORIGIN \n 1 ggcttccagg aaaagccaac
Curiously, though, using the API through a browser also returns a blank file: example.
If anyone is able to shed some light on why these sequences aren't being returned in FASTA format properly, I'd be very grateful!
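To flag the affected IDs in a larger batch, one option is to test whether the returned string is empty once whitespace is stripped. This is only a minimal sketch: is_blank_record is a hypothetical helper name (not part of rentrez), and ids in the usage comment is an assumed character vector of GIDs or accessions.

```r
# Hypothetical helper (not part of rentrez): TRUE when a fetched record
# contains nothing but whitespace, such as the "\n" shown above.
is_blank_record <- function(x) {
  !nzchar(trimws(x))
}

is_blank_record("\n")                    # TRUE
is_blank_record(">lcl|example\nATG\n")   # FALSE

# Usage sketch (needs network access and the rentrez package);
# `ids` is an assumed character vector of GIDs/accessions:
# res <- vapply(ids, function(i)
#   rentrez::entrez_fetch(db = "nuccore", id = i, rettype = "fasta_cds_na"),
#   character(1))
# ids[is_blank_record(res)]
```

Fetching one ID per call keeps the mapping between IDs and blank results unambiguous, at the cost of more requests.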
DQ640652 is the genome of a SARS virus. It does not look like there are any annotations included in the GenBank file. Perhaps that is the reason for not getting anything back when you ask for CDS sequences.
Ah, you're right! I hadn't noticed that - many thanks! Is there any metadata field that can help me identify and filter out the accessions without CDS annotations?
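A possible workaround, rather than a metadata field: fetch the GenBank flat file (rettype="gb", as used above) and check whether its feature table contains any CDS feature. The sketch below assumes that a CDS feature appears as an indented line starting with the key CDS; has_cds_feature is a hypothetical helper name, and the two example strings are toy stand-ins, not real records.

```r
# Hypothetical helper: TRUE when GenBank flat-file text contains at least one
# indented line whose first token is the feature key "CDS".
has_cds_feature <- function(gb_text) {
  grepl("(^|\n)[ \t]+CDS[ \t]+", gb_text)
}

# Toy flat-file fragments (made up for illustration only).
no_cds   <- "FEATURES             Location/Qualifiers\n     source          1..29746\nORIGIN\n"
with_cds <- "FEATURES             Location/Qualifiers\n     source          1..100\n     CDS             10..90\nORIGIN\n"

has_cds_feature(no_cds)    # FALSE
has_cds_feature(with_cds)  # TRUE

# Usage sketch (needs network access and the rentrez package):
# gb <- rentrez::entrez_fetch(db = "nuccore", id = "108597802", rettype = "gb")
# has_cds_feature(gb)
```

This pattern match is a heuristic on the text layout, so it is worth spot-checking a few records by eye before relying on it to filter a large set.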