Question

Extract the country information of a sequence record from NCBI using rentrez

3.4 years ago

kelvinfrog75 ▴ 10

I want to extract the country of origin for a bunch of sequences in NCBI using rentrez. I have their accession numbers, but I am having trouble getting the country info. For example, for accession number MH939154 I need to extract "Romania" from the source feature below using rentrez.

 source          1..10976
                 /organism="West Nile virus"
                 /mol_type="genomic RNA"
                 /strain="DD84c"
                 /host="Culex pipiens s.l."
                 /db_xref="taxon:11082"
                 /country="Romania"
                 /collection_date="2014"
                 /note="lineage 2"

I have tried the code below, but it seems to extract only the countries related to the publication. Is there any way to get the country from the source feature instead?

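library(rentrez)
library(xml2)
id <- "MH939154.1"
rec <- entrez_fetch(db = "pubmed", id = id, rettype = "xml")  # fetch as XML from PubMed
xml <- read_xml(rec)
recs <- xml_find_all(xml, "//Country")  # every <Country> node in the result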

R NCBI rentrez location • 1.3k views

updated 3.4 years ago by GenoMax 141k • written 3.4 years ago by kelvinfrog75 ▴ 10

3.4 years ago

GenoMax 141k

This can also be obtained by using Entrez Direct:

$ esearch -db nuccore -query "MH939154" | esummary | xtract -pattern DocumentSummary -element SubName
DD84c|Culex pipiens s.l.|WNV|Romania|2014|lineage 2

The 4th field is the country. I will leave extracting it to you.
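Because the SubName values are positional, the 4th field is the country only when a record carries the same qualifiers in the same order. A minimal sketch, assuming you also ask xtract for the SubType element (which lists the qualifier names in the same order as SubName) and for Caption (the accession). It pairs names with values and keeps the one labelled country. The function name pick_country is hypothetical, not part of Entrez Direct.

```shell
#!/bin/sh
# Hypothetical helper (not part of Entrez Direct).
# Reads tab-separated lines "Caption<TAB>SubType<TAB>SubName" on stdin,
# where SubType and SubName are pipe-delimited lists in matching order,
# and prints "Caption<TAB>country" (NA when there is no country qualifier).
pick_country() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  {
    n = split($2, types, "[|]")   # qualifier names, e.g. strain, host, country
    split($3, values, "[|]")      # qualifier values in the same order
    country = "NA"
    for (i = 1; i <= n; i++)
      if (types[i] == "country") country = values[i]
    print $1, country
  }'
}
```

Usage (needs network access, so not run here): change the xtract call above to "-element Caption SubType SubName" and pipe its output into pick_country. Each input line is handled on its own, so the same pipeline works for several records at once.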

updated 3.4 years ago by rpolicastro 13k • written 3.4 years ago by GenoMax 141k

score 2 · Accepted Answer · 2020-12-04

3.4 years ago

Pierre Lindenbaum 161k

I don't know R + XML, so here is an XPath expression with xmllint instead:

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=MH939154&rettype=gb&retmode=xml"  | xmllint  --xpath '//GBQualifier[GBQualifier_name="country"]/GBQualifier_value/text()' - && echo

Romania

3.4 years ago by Pierre Lindenbaum 161k

This works fine, and I can run this command inside R. I just wonder: how did you come up with this link "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=MH939154&rettype=gb&retmode=xml"? Thanks.

3.4 years ago by kelvinfrog75 ▴ 10

Just wonder how do you get this link

https://www.ncbi.nlm.nih.gov/books/NBK25500/
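The link is an EFetch call built from the E-utilities parameters described in that book: db=nuccore selects the Nucleotide database, id is the accession, rettype=gb asks for the GenBank record, and retmode=xml returns it as GBSeq XML, which is where the GBQualifier elements used in the XPath come from. A minimal sketch for a whole list of accessions, assuming wget and xmllint are installed as in the command above. The function names efetch_url and countries_for_accessions are hypothetical, and the one-second pause is a conservative choice to stay under NCBI's limit of three requests per second without an API key.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the EFetch URL and XPath shown above.

# Build the EFetch URL for one nucleotide accession.
efetch_url() {
  base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  printf '%s?db=nuccore&id=%s&rettype=gb&retmode=xml\n' "$base" "$1"
}

# Read accessions (one per line) on stdin and print "accession<TAB>country".
# Prints NA when the record has no /country qualifier or the request fails.
countries_for_accessions() {
  while IFS= read -r acc; do
    [ -n "$acc" ] || continue            # skip blank lines
    country=$(wget -q -O - "$(efetch_url "$acc")" \
      | xmllint --xpath '//GBQualifier[GBQualifier_name="country"]/GBQualifier_value/text()' - 2>/dev/null)
    [ -n "$country" ] || country=NA
    printf '%s\t%s\n' "$acc" "$country"
    sleep 1                              # stay well below 3 requests/second
  done
}
```

For example, printf 'MH939154\n' | countries_for_accessions should print MH939154 and Romania separated by a tab, matching the output above. One request per accession keeps the XPath result unambiguous; batching several ids into one EFetch call is faster but requires matching each country back to its record.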

3.4 years ago by Pierre Lindenbaum 161k

Great. I was able to integrate the command into my script and get the country info. Thanks!

3.4 years ago by kelvinfrog75 ▴ 10