entrez utilities: snp
6.6 years ago
DanielC ▴ 170

Dear All,

I am trying to get the 3' and 5' UTRs of the BRCA1 and BRCA2 mRNAs. I learned that the Entrez utilities can do this, using code like the following.

Source: https://www.ncbi.nlm.nih.gov/books/NBK179288/

The code is:

ThreePrimeUTRs() {
  xtract -pattern INSDSeq -ACC INSDSeq_accession-version -SEQ INSDSeq_sequence \
    -group INSDFeature -if INSDFeature_key -equals CDS -PRD "(-)" \
      -block INSDQualifier -if INSDQualifier_name \
        -equals product -PRD INSDQualifier_value \
      -block INSDFeature -pfc "\n" -element "&ACC" -rst \
        -last INSDInterval_to -element "&SEQ" "&PRD" |
  while read acc pos seq prd
  do
    if [ $pos -lt ${#seq} ]
    then
      echo -e ">$acc 3'UTR: $((pos+1))..${#seq} $prd"
      echo "${seq:$pos}" | fold -w 50
    elif [ $pos -ge ${#seq} ]
    then
      echo -e ">$acc NO 3'UTR"
    fi
  done
}

esearch -db nuccore -query "3.6.4.12 [ECNO]" |
efilter -molecule mrna -source refseq |
efetch -format gbc | ThreePrimeUTRs

When I run this, I keep getting the following error:

**Unrecognized argument '-if'
No -element before 'INSDFeature_key'
Unrecognized argument '-equals'
No -element before 'CDS'**

Can someone please help me understand what is going wrong? Can I get the 5' UTR with the same kind of code? Finally, I also want to get the SNPs in the 3' and 5' UTRs.

Thank you so much! DK

If you put the code in a file, make it executable and run it, it produces a result.

>XM_005708748.1 3'UTR: 1918..1941 ATP-dependent DNA helicase RecQ
gtgtggttttcaacaagttttaca
>XM_005707168.1 3'UTR: 3700..3734 ATP-dependent DNA helicase RecQ
gttgctttgggtttcacaaggtaaatttatgacaa
>XM_005706277.1 3'UTR: 1799..1836 ATP-dependent DNA helicase 2 subunit 1 isoform 1
gaacggccagtatacaacacccagatcagccaaatcaa
>XM_005706276.1 3'UTR: 1409..1481 ATP-dependent DNA helicase 2 subunit 1 isoform 2
tccgtcaaaatattcggatcctgatattcaacgatattataacggattac
aagctctggctctgaatcaaacc
>XM_005705233.1 3'UTR: 1632..1709 ATP-dependent DNA helicase RecQ
ctttattgtatgagaattttctgaatttctttgcagacatttctttcgca
tgtatcttataaacaactataagattgt
>NM_001278454.1 3'UTR: 6111..8976 chromodomain-helicase-DNA-binding protein 2
agcgactgagaaggggggggggaaacacgtcttgaaagacttggatgcaa
caaccagaaactctgaacatgctgctatcatcttgctgggtcaaggagga
ttttggaggagcaggtggaggaagactcagttctaatttgggttcccatt
ttgtttccccccctttctctcgttgaacattggaaccagacttgcctcgt
tctttttctttggtttgttttccccaatccaacggacacgtggagaattt
tcctcagccacagtgtttccccaaaaccgagaaggcggatcaatgctgct

truncated for brevity.

You will need to change your query (e.g. -query "BRCA") to get what you need.
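
As a rough sketch of what such a query might look like for the genes in the question: the helper name `gene_query` is made up for illustration, and the `[GENE]` and `[ORGN]` field tags plus the restriction to human are assumptions about what is wanted, not something stated in this thread.

```bash
# Hypothetical helper (not part of EDirect): build a nuccore query for one gene.
# The restriction to human is an assumption; drop or change it as needed.
gene_query() {
  printf '%s [GENE] AND Homo sapiens [ORGN]\n' "$1"
}

gene_query BRCA1
gene_query BRCA2
```

The resulting string could then stand in for the EC-number query, for example `esearch -db nuccore -query "$(gene_query BRCA1)"`, with the rest of the pipeline left unchanged.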

Thanks! But when I run it, I keep getting the same error as above:

Unrecognized argument '-if'
No -element before 'INSDFeature_key'
Unrecognized argument '-equals'
No -element before 'CDS'

Are you using the bash shell? If not, run the command `bash` and then run the file at the new prompt that appears.
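
One quick way to check is to look at `BASH_VERSION`, which bash sets and most other shells leave empty. This is only a sketch, and the function name `current_shell` is invented here. It matters because the function in the question relies on bash-specific syntax such as `${seq:$pos}`.

```bash
# Hypothetical helper: report whether the shell running this code is bash.
current_shell() {
  if [ -n "${BASH_VERSION:-}" ]; then
    echo "bash $BASH_VERSION"
  else
    echo "not bash"
  fi
}

current_shell
```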

OK, Thanks. I will try and update here.

Isn't this question the same as "SNPs; entrez utilities"?

Not exactly. In that question I had no idea what approach to take. Here I have found an approach, but I am getting errors, and it does not yet cover getting the SNPs.

I have it on good authority that if you update your installation of the Entrez utilities (the package that provides xtract), this should work fine.
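
To see which version is currently installed, something like the following may help. It is a sketch: the function name `report_xtract` is made up, and it assumes your copy of xtract accepts the `-version` flag.

```bash
# Hypothetical helper: print the installed xtract version, or a note if it is missing.
report_xtract() {
  if command -v xtract >/dev/null 2>&1; then
    xtract -version   # assumed flag; prints the version string
  else
    echo "xtract not found on PATH"
  fi
}

report_xtract
```

An old version would be consistent with the "Unrecognized argument '-if'" message, since that message is printed by xtract itself rather than by the shell.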

6.6 years ago

Using XSLT, assuming it's mRNA, 5'->3', with the correct annotation. I'm extracting the positions of the left and right ends of the CDS:
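
The stylesheet itself is not reproduced here. Purely as an illustration of the arithmetic, once the CDS start and end are known, both UTRs are simple slices of the sequence. The sketch below is not the XSLT from this answer. The function name `split_utrs` and the toy sequence are invented, and it assumes 1-based inclusive coordinates on a 5'->3' mRNA.

```bash
# Hypothetical helper: split an mRNA into 5' and 3' UTRs around its CDS.
# Usage: split_utrs CDS_START CDS_END SEQUENCE  (1-based, inclusive coordinates)
# The body runs in a subshell, so its variables do not leak into the caller.
split_utrs() (
  start=$1
  end=$2
  seq=$3
  len=${#seq}

  # Everything before the CDS start is the 5' UTR.
  if [ "$start" -gt 1 ]; then
    utr5=$(printf '%s\n' "$seq" | cut -c1-"$((start - 1))")
    printf "5'UTR 1..%d %s\n" "$((start - 1))" "$utr5"
  else
    printf "NO 5'UTR\n"
  fi

  # Everything after the CDS end is the 3' UTR.
  if [ "$end" -lt "$len" ]; then
    utr3=$(printf '%s\n' "$seq" | cut -c"$((end + 1))"-)
    printf "3'UTR %d..%d %s\n" "$((end + 1))" "$len" "$utr3"
  else
    printf "NO 3'UTR\n"
  fi
)

# Toy example: "aaa" + CDS "atgccctaa" (positions 4..12) + "ggg"
split_utrs 4 12 aaaatgccctaaggg
```

For the toy input this prints a 5' UTR of `aaa` (1..3) and a 3' UTR of `ggg` (13..15). Records without a CDS feature cannot be handled this way.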

Hi Pierre, thank you so much! XSLT seems like a promising tool. Could you please help me with a few questions:

a) "Using XSLT, assuming it's mRNA, 5'->3', with the correct annotation. I'm extracting the positions of the left and right ends of the CDS:"

I had never actually thought about whether an mRNA record runs 5' -> 3' or the other way around. I assumed the records from which the UTRs are extracted were well annotated with all the necessary information. Do I need to be careful when extracting such information from Entrez?

b) I ran your script and it worked like a charm. Could you briefly explain what the stylesheet "transform.xsl" is doing? That would help me get a clear picture.

c) Please let me know how you got the RefSeq IDs used in this search:

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NM_007299.3&id=NR_027676.1&id=NM_007299.3&id=NM_007298.3&retmode=xml" | xsltproc --novalid transform.xsl - | fold -w 60

d) Once I have the 5' and 3' UTRs, I need to extract the SNPs and their positions. Could you please share how this could be done using XSLT?

Thanks much!

a) "I thought the file from which the UTRs are to be extracted are well annotated"

They're not: there is no CDS in NR_027676.1.

b) I'm going to add some comments; please refresh the page in a few minutes.

c) "how you got the refseqids"

I just picked a few random mRNA accessions by searching Entrez for "mRNA BRCA1".

d) No, because that's not your original question. Ask a new question.

OK, thanks for the explanation. For the SNPs in the 3' and 5' UTRs and their positions, I have asked a new question, "SNPs in UTR". Please share the solution there. Thank you :-)
