Question: Specify Assembly for NCBI Entrez Query
5.4 years ago by paulparsons, Canada/London/Western
paulparsons wrote:

Does anyone know whether it is possible to specify which assembly to use when constructing an Entrez query?

For example, if I do such a query with EDirect:

esearch -db gene -query "brca1 [ALL]human[ORGN]" -sort "relevance" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element Name MapLocation Description OtherAliases Id \
  -block GenomicInfo -element ChrLoc ChrAccVer ChrStart ChrStop

 

the GenomicInfo coordinates that I get appear to be based on the GRCh38 assembly. The application that I'm developing needs GRCh37 coordinates, however.

 

Any help is much appreciated.

5.4 years ago by paulparsons, Canada/London/Western
paulparsons wrote:

Actually, it turns out that this can be done. I wrote to NCBI and got a response.

The key is to look in the LocationHistType block for:

- a specific annotation release (see here and here for some explanation of annotation releases). For example, NCBI's annotation of GRCh37.p13 is Annotation Release 105.

- the corresponding assembly accession, which is a RefSeq Assembly ID. For GRCh37.p13 it is GCF_000001405.25 (see here for more information). 

The EDirect commands should look something like this:

   esearch -db gene -query "brca1 [ALL] AND human [ORGN]" | \
   efetch -format docsum | \
   xtract -pattern DocumentSummary \
     -element Name MapLocation Description OtherAliases Id ChrLoc \
     -block LocationHistType \
       -match "AnnotationRelease:105" -and "AssemblyAccVer:GCF_000001405.25" \
         -element ChrAccVer ChrStart ChrStop
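
For anyone who would rather do the same filtering in a script than in xtract, here is a minimal Python sketch of the idea: walk the LocationHistType blocks of a docsum and keep only those whose AnnotationRelease and AssemblyAccVer match the assembly you want. The element names are the ones used in the command above. The helper name `locations_for_assembly` and the embedded XML are my own illustration. The fragment is hand-written and heavily simplified, and its coordinates are placeholders, not real NCBI output.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified docsum fragment for illustration only.
# The ChrStart/ChrStop values are placeholders, NOT real coordinates.
SAMPLE_DOCSUM = """\
<DocumentSummarySet>
  <DocumentSummary>
    <Name>BRCA1</Name>
    <LocationHist>
      <LocationHistType>
        <AnnotationRelease>109</AnnotationRelease>
        <AssemblyAccVer>GCF_000001405.38</AssemblyAccVer>
        <ChrAccVer>NC_000017.11</ChrAccVer>
        <ChrStart>2000</ChrStart>
        <ChrStop>1000</ChrStop>
      </LocationHistType>
      <LocationHistType>
        <AnnotationRelease>105</AnnotationRelease>
        <AssemblyAccVer>GCF_000001405.25</AssemblyAccVer>
        <ChrAccVer>NC_000017.10</ChrAccVer>
        <ChrStart>200</ChrStart>
        <ChrStop>100</ChrStop>
      </LocationHistType>
    </LocationHist>
  </DocumentSummary>
</DocumentSummarySet>
"""


def locations_for_assembly(docsum_xml, annotation_release, assembly_acc):
    """Return (Name, ChrAccVer, ChrStart, ChrStop) tuples for matching blocks.

    Hypothetical helper: everything is retrieved first and then filtered
    locally by annotation release and assembly accession.
    """
    root = ET.fromstring(docsum_xml)
    hits = []
    for doc in root.iter("DocumentSummary"):
        name = doc.findtext("Name")
        for loc in doc.iter("LocationHistType"):
            if (loc.findtext("AnnotationRelease") == annotation_release
                    and loc.findtext("AssemblyAccVer") == assembly_acc):
                hits.append((
                    name,
                    loc.findtext("ChrAccVer"),
                    loc.findtext("ChrStart"),
                    loc.findtext("ChrStop"),
                ))
    return hits


# Keep only the GRCh37.p13 block (Annotation Release 105).
grch37 = locations_for_assembly(SAMPLE_DOCSUM, "105", "GCF_000001405.25")
print(grch37)
```

To try it on real data, you would save the output of the efetch step to a file and pass its contents in place of SAMPLE_DOCSUM. I am assuming the real docsum XML has this general shape, so check the element nesting against what you actually get back.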


That's very valuable to know. Thanks for sharing what you found out!

written 5.4 years ago by Istvan Albert

Good to know. Technically, this is "retrieve everything and parse for version" rather than "query using version", but whatever works.

written 5.4 years ago by Neilfws

Good point. In my case (although maybe not in all cases), the difference doesn't really matter, and I can get the information that I need. Cheers.

written 5.4 years ago by paulparsons
5.4 years ago by Neilfws, Sydney, Australia
Neilfws wrote:

I don't have a reference or source, but I'm certain that EUtils uses the current build, with no option to use previous versions. This came up in a previous question, "Ncbi Esearch :Searching For Snp/Genes/... At In A Given Segment Chr:Start-End ?".


Ha, there are some interesting dependencies there that I had never considered before: anyone who relies on EUtils is tied to the current release.

written 5.4 years ago by Istvan Albert