I can't figure out how to use the "-match" syntax even after reading all the documentation I could find. I get these errors:
$ cat xml.txt | xtract -pattern Gene-commentary -match Gene-commentary_type:1
Unrecognized argument '-match' No -element before 'Gene-commentary_type:1'
$ cat xml.txt | xtract -element Gene-commentary -match Gene-commentary_type:1
Unrecognized argument '-match' No -element before 'Gene-commentary_type:1'
What am I doing wrong?
What I am trying to do is pull the accession of the reference sequence and the coordinates of the region for a given entry in NCBI Gene (see https://www.biostars.org/p/122522/), so that I can run $ efetch -format FASTA -seqstart -seqend with those values and get the right sequence.
I could parse the XML in Python, but it really seems like I should be able to do this in "one line" with Entrez Direct, if only I could get "-match" to work :/
Here is what the XML looks like. Say I have a gene record saved as XML:
$ epost -db gene -id 672 | efetch -format xml > xml.txt
According to the outline,
$ cat xml.txt | xtract -outline
<Gene-commentary>
  <Gene-commentary_type value="genomic">1</Gene-commentary_type>
  <Gene-commentary_heading>Reference assembly</Gene-commentary_heading>
  <Gene-commentary_label>RefSeqGene</Gene-commentary_label>
  <Gene-commentary_accession>NG_005905</Gene-commentary_accession>
  <Gene-commentary_version>2</Gene-commentary_version>
  <Gene-commentary_seqs>
    <Seq-loc>
      <Seq-loc_int>
        <Seq-interval>
          <Seq-interval_from>92500</Seq-interval_from>
          <Seq-interval_to>173688</Seq-interval_to>
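For reference, the Python fallback I mentioned would look roughly like the sketch below. It is only a sketch under assumptions: it relies on the element names in the fragment above, it runs on a hand-closed copy of that fragment wrapped in a made-up <root> element (the real xml.txt is much larger), and genomic_regions is just a name I picked. It does not adjust the from/to values in any way before they would be passed to efetch.

```python
import xml.etree.ElementTree as ET

# Hand-closed copy of the fragment above, wrapped in a hypothetical <root>
# element so the example runs on its own. The real efetch output is larger.
SAMPLE = """<root>
  <Gene-commentary>
    <Gene-commentary_type value="genomic">1</Gene-commentary_type>
    <Gene-commentary_heading>Reference assembly</Gene-commentary_heading>
    <Gene-commentary_label>RefSeqGene</Gene-commentary_label>
    <Gene-commentary_accession>NG_005905</Gene-commentary_accession>
    <Gene-commentary_version>2</Gene-commentary_version>
    <Gene-commentary_seqs>
      <Seq-loc>
        <Seq-loc_int>
          <Seq-interval>
            <Seq-interval_from>92500</Seq-interval_from>
            <Seq-interval_to>173688</Seq-interval_to>
          </Seq-interval>
        </Seq-loc_int>
      </Seq-loc>
    </Gene-commentary_seqs>
  </Gene-commentary>
</root>"""

# Path from a Gene-commentary element down to its Seq-interval,
# taken from the nesting shown in the fragment.
INTERVAL = "Gene-commentary_seqs/Seq-loc/Seq-loc_int/Seq-interval"


def genomic_regions(xml_text):
    """Yield (accession[.version], from, to) for each Gene-commentary of type 1."""
    root = ET.fromstring(xml_text)
    for gc in root.iter("Gene-commentary"):
        # findtext() with a bare tag only looks at direct children, so values
        # from nested Gene-commentary elements are not picked up by mistake.
        if gc.findtext("Gene-commentary_type") != "1":
            continue
        accession = gc.findtext("Gene-commentary_accession")
        version = gc.findtext("Gene-commentary_version")
        start = gc.findtext(INTERVAL + "/Seq-interval_from")
        stop = gc.findtext(INTERVAL + "/Seq-interval_to")
        # Skip entries that lack an accession or an interval.
        if accession is None or start is None or stop is None:
            continue
        name = accession if version is None else accession + "." + version
        yield name, int(start), int(stop)


if __name__ == "__main__":
    for name, start, stop in genomic_regions(SAMPLE):
        print(name, start, stop)
```

On the real file I would call genomic_regions(open("xml.txt").read()) and probably also filter on Gene-commentary_label or Gene-commentary_heading, since more than one genomic entry may match.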
I have read:
http://www.ncbi.nlm.nih.gov/books/NBK179288/ (I followed these instructions to install it, so $ which epost returns ~/edirect/epost)