Question: entrez utilities: snp
DanielC wrote:

Dear All,

I am trying to get the 3' and 5' UTRs of the BRCA1 and BRCA2 mRNAs. I learned that the Entrez utilities can be used for this:

Source: https://www.ncbi.nlm.nih.gov/books/NBK179288/

The code is:

ThreePrimeUTRs() {
    # For each CDS feature, print one line: accession, end of the last CDS
    # interval, full sequence, and product name.
    xtract -pattern INSDSeq -ACC INSDSeq_accession-version -SEQ INSDSeq_sequence \
      -group INSDFeature -if INSDFeature_key -equals CDS -PRD "(-)" \
        -block INSDQualifier -if INSDQualifier_name \
          -equals product -PRD INSDQualifier_value \
        -block INSDFeature -pfc "\n" -element "&ACC" -rst \
          -last INSDInterval_to -element "&SEQ" "&PRD" |
    while read acc pos seq prd
    do
      # Everything after the CDS end is the 3' UTR.
      if [ $pos -lt ${#seq} ]
      then
        echo -e ">$acc 3'UTR: $((pos+1))..${#seq} $prd"
        echo "${seq:$pos}" | fold -w 50
      elif [ $pos -ge ${#seq} ]
      then
        echo -e ">$acc NO 3'UTR"
      fi
    done
}

# Search RefSeq mRNAs by EC number, fetch INSDSeq XML, and extract the 3' UTRs.
esearch -db nuccore -query "3.6.4.12 [ECNO]" |
efilter -molecule mrna -source refseq |
efetch -format gbc | ThreePrimeUTRs

When I run this, I keep getting the following error:

Unrecognized argument '-if'
No -element before 'INSDFeature_key'
Unrecognized argument '-equals'
No -element before 'CDS'

Can someone please help me understand what is going wrong? Can I also get the 5' UTR with the same code? Finally, I also want to get the SNPs in the 3' and 5' UTRs.

Thank you so much! DK

modified 2.5 years ago by Pierre Lindenbaum • written 2.5 years ago by DanielC

If you put the code in a file, make it executable and run it, it produces a result.

>XM_005708748.1 3'UTR: 1918..1941 ATP-dependent DNA helicase RecQ
gtgtggttttcaacaagttttaca
>XM_005707168.1 3'UTR: 3700..3734 ATP-dependent DNA helicase RecQ
gttgctttgggtttcacaaggtaaatttatgacaa
>XM_005706277.1 3'UTR: 1799..1836 ATP-dependent DNA helicase 2 subunit 1 isoform 1
gaacggccagtatacaacacccagatcagccaaatcaa
>XM_005706276.1 3'UTR: 1409..1481 ATP-dependent DNA helicase 2 subunit 1 isoform 2
tccgtcaaaatattcggatcctgatattcaacgatattataacggattac
aagctctggctctgaatcaaacc
>XM_005705233.1 3'UTR: 1632..1709 ATP-dependent DNA helicase RecQ
ctttattgtatgagaattttctgaatttctttgcagacatttctttcgca
tgtatcttataaacaactataagattgt
>NM_001278454.1 3'UTR: 6111..8976 chromodomain-helicase-DNA-binding protein 2
agcgactgagaaggggggggggaaacacgtcttgaaagacttggatgcaa
caaccagaaactctgaacatgctgctatcatcttgctgggtcaaggagga
ttttggaggagcaggtggaggaagactcagttctaatttgggttcccatt
ttgtttccccccctttctctcgttgaacattggaaccagacttgcctcgt
tctttttctttggtttgttttccccaatccaacggacacgtggagaattt
tcctcagccacagtgtttccccaaaaccgagaaggcggatcaatgctgct

truncated for brevity.

You will need to change your query (e.g. -query "BRCA") to get what you need.
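
As a rough sketch of that change, assuming EDirect is installed and the ThreePrimeUTRs function from the question is already defined in the same shell. The [GENE] and [ORGN] field tags and the helper names build_query and brca_utrs are illustrative choices, not part of the original answer:

```bash
#!/bin/bash
# Hypothetical helper: build an Entrez query for one human gene symbol.
build_query() {
  printf '%s [GENE] AND Homo sapiens [ORGN]' "$1"
}

# Hypothetical wrapper: the same pipeline as in the question, with the query swapped.
# Assumes ThreePrimeUTRs is already defined in this shell.
brca_utrs() {
  esearch -db nuccore -query "$(build_query "$1")" |
  efilter -molecule mrna -source refseq |
  efetch -format gbc |
  ThreePrimeUTRs
}
```

Calling brca_utrs BRCA1 and then brca_utrs BRCA2 would run the search once per gene. A bare "BRCA" query is likely to match many unrelated records, which is why the sketch restricts the search to the gene field and to human.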

modified 2.5 years ago • written 2.5 years ago by genomax

Thanks! But when I run it, I keep getting the same error as above:

Unrecognized argument '-if'
No -element before 'INSDFeature_key'
Unrecognized argument '-equals'
No -element before 'CDS'

written 2.5 years ago by DanielC

Are you using the bash shell? If not, issue the command bash and then run the file at the new system prompt that should show up.
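
A minimal sketch of that check, assuming a Unix-like system. The helper name current_shell is hypothetical:

```bash
# Report whether the interpreter running this code is bash.
# BASH_VERSION is set only when bash itself is executing the code.
current_shell() {
  if [ -n "${BASH_VERSION:-}" ]; then
    echo "bash ${BASH_VERSION}"
  else
    echo "not bash"
  fi
}

current_shell
```

If this prints "not bash", starting bash first or putting #!/bin/bash on the first line of the script file should force the right interpreter. The script in the question relies on bash features such as the ${seq:$pos} substring expansion and echo -e, which a plain POSIX sh may not support.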

modified 2.5 years ago • written 2.5 years ago by genomax

OK, Thanks. I will try and update here.

written 2.5 years ago by DanielC

Isn't this question the same as "SNPs; entrez utilities"?

written 2.5 years ago by Sej Modha

Not exactly. In that question I had no idea how to approach the problem. Here I have found a way, but I am getting errors and still have no way to get the SNPs.

written 2.5 years ago by DanielC

I have it on good information that if you update your installation of eutils, this should work fine.
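
A small sketch for checking what is currently installed before updating. The helper name show_xtract is hypothetical, and the -version flag is assumed to be supported by the installed xtract release:

```bash
# Hypothetical helper: show which xtract binary is found on PATH and,
# if the installed release supports it, its version string.
show_xtract() {
  if command -v xtract >/dev/null 2>&1; then
    echo "xtract found at: $(command -v xtract)"
    xtract -version
  else
    echo "xtract not found on PATH"
  fi
}

show_xtract
```

An old copy of xtract that appears earlier on the PATH than a newer one could plausibly explain why arguments such as -if and -equals are reported as unrecognized.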

written 2.5 years ago by DCGenomics

Pierre Lindenbaum wrote:

Using XSLT, assuming it's mRNA, 5'->3', with the correct annotation. I'm extracting the positions of the left and right ends of the CDS:
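
As a rough illustration of that idea, written in bash rather than XSLT and not taken from the author's stylesheet, the sketch below slices both UTRs out of a sequence given the 1-based start and end of the CDS. The function name slice_utrs and the tab-separated input layout are assumptions made for this example:

```bash
#!/bin/bash
# Hypothetical helper: read lines of
#   accession<TAB>cds_from<TAB>cds_to<TAB>sequence
# (1-based, inclusive CDS coordinates on a 5'->3' mRNA)
# and print the 5' and 3' UTRs in FASTA format.
slice_utrs() {
  while IFS=$'\t' read -r acc from to seq
  do
    len=${#seq}
    # Bases before the CDS start form the 5' UTR.
    if [ "$from" -gt 1 ]; then
      echo ">$acc 5'UTR: 1..$((from-1))"
      echo "${seq:0:from-1}" | fold -w 60
    else
      echo ">$acc NO 5'UTR"
    fi
    # Bases after the CDS end form the 3' UTR.
    if [ "$to" -lt "$len" ]; then
      echo ">$acc 3'UTR: $((to+1))..$len"
      echo "${seq:to}" | fold -w 60
    else
      echo ">$acc NO 3'UTR"
    fi
  done
}
```

A toy call such as printf 'X1\t4\t9\taaaATGTAAggg\n' | slice_utrs exercises both branches. Records without a CDS feature have no coordinates to pass in, so they would need to be skipped upstream.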

written 2.5 years ago by Pierre Lindenbaum

Hi Pierre, thank you so much! XSLT seems like a promising tool. Please help me with a few questions:

a) "using XSLT , assuming it's mRNA, 5'->3', with the correct annotation. I'm extracting the position of the left and right CDS:"

I have never actually thought about whether an mRNA is given 5' -> 3' or vice versa. I thought the file from which the UTRs are to be extracted was well annotated with all the necessary information. Do I need to be careful when extracting such information from Entrez?

b) I ran your script and it worked like a charm. However, could you briefly explain what the script "transform.xsl" is doing? That would help me get a clear idea.

c) Please let me know how you got the RefSeq IDs used in this command:

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NM_007299.3&id=NR_027676.1&id=NM_007299.3&id=NM_007298.3&retmode=xml" | xsltproc --novalid transform.xsl - | fold -w 60

d) After I have the 5' and 3' UTRs, I need to extract the SNPs and their positions. Could you please share how this could be done using XSLT?

Thanks much!

written 2.5 years ago by DanielC

a)

I thought the file from which the UTRs are to be extracted are well annotated

They're not: there is no CDS in NR_027676.1.

b) I'm going to add some comments; please refresh the page in a few minutes.

c) how you got the refseqids

I just picked a few random mRNA accessions using the Entrez search "mRNA BRCA1".

d) No, because that's not your original question. Ask a new question.

written 2.5 years ago by Pierre Lindenbaum

OK, thanks for the explanation. For the SNPs in the 3' and 5' UTRs and their positions, I have asked a new question; please share the solution there. Thank you :-)

SNPs in UTR

written 2.5 years ago by DanielC