Question: qPCR: How, and with what software, do you get all transcripts of one gene?
9.6 years ago, Cheng Zhongshan wrote:

Dear all, I want to design primers for all transcripts of one gene, such as CD55 (GeneID: 1604). Unfortunately, I cannot find a simple way to use Perl to parse the exon and intron positions of all the transcripts from NCBI. What software do you use, and how, to extract all the transcripts of CD55 with simple Perl code, or with code in another language such as Python or R? Thanks a lot!

9.6 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

I wouldn't use GenBank to retrieve the positions of the exons, as some records can be poorly annotated.

You can query the public UCSC MySQL database to get the genomic positions of the exons, and then retrieve the DNA sequences using fastacmd or the UCSC DAS DNA service:

    mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19 -P 3306 -e 'select exonStarts,exonEnds from knownGene as K,kgXref as X where X.geneSymbol="CD55" and K.name=X.kgID'
    | exonStarts                                                                                                     | exonEnds                                                                                                       |
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207513735,207532890, | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207513853,207534309, | 
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207532890,           | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207534309,           | 
    | 207494816,207495726,207498966,207500096,207504452,207510037,207510673,207512741,207532890,                     | 207495210,207495912,207499066,207500182,207504641,207510163,207510754,207512762,207534309,                     | 
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207513735,           | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207514081,           | 
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207527351,207532890, | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207527444,207534309, | 
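A minimal Python sketch of the next step, assuming the usual UCSC conventions for these columns: comma-separated lists with a trailing comma, and 0-based, half-open genomic coordinates. Introns are simply the gaps between consecutive exons, and a transcript's spliced sequence is the concatenation of its exon slices from a genomic region you have already retrieved (for example with fastacmd). The function names are hypothetical, and the strand is not in the query above; selecting K.strand as well would provide it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), the UCSC table convention

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_coord_list(field: str) -> List[int]:
    """Split a UCSC comma list such as '100,200,' into integers; the trailing comma is ignored."""
    return [int(x) for x in field.split(",") if x]


def exons_and_introns(exon_starts: str, exon_ends: str) -> Tuple[List[Interval], List[Interval]]:
    """Pair exonStarts with exonEnds; each intron is the gap between two consecutive exons."""
    starts = parse_coord_list(exon_starts)
    ends = parse_coord_list(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    exons = list(zip(starts, ends))
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    return exons, introns


def spliced_sequence(region: str, region_start: int, exons: List[Interval], strand: str = "+") -> str:
    """Join the exon slices of `region`, a genomic string whose first base is at 0-based `region_start`.

    A '-' strand transcript is reverse-complemented so the result reads 5' to 3'.
    """
    seq = "".join(region[s - region_start:e - region_start] for s, e in exons)
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    # First row of the query output above.
    starts = ("207494816,207495726,207497903,207498966,207500096,207504452,"
              "207510037,207510673,207512741,207513735,207532890,")
    ends = ("207495210,207495912,207498095,207499066,207500182,207504641,"
            "207510163,207510754,207512762,207513853,207534309,")
    exons, introns = exons_and_introns(starts, ends)
    print(len(exons), "exons,", len(introns), "introns; first intron:", introns[0])
```

Run as a script, this prints `11 exons, 10 introns; first intron: (207495210, 207495726)` for the first CD55 row.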

That's great! You really gave me good suggestions. I will use Perl to run the MySQL query and get all my target transcripts.

(Reply by Cheng Zhongshan, 9.6 years ago)
9.6 years ago, Ryan W. (United States) wrote:

Funny, I was just working on something related to this today. I use the NCBI E-utilities (I know, awesome name, right?). Specifically, I was requesting FASTA-formatted mRNA transcripts in my Java application using their "EFetch" utility at the following URL:

The great thing about this is that it's language-agnostic, since you're just issuing HTTP requests and parsing responses. Once I make a request and get the data back, I just parse the sequence out of the FASTA-formatted response and I'm off to the races.
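Because EFetch is plain HTTP, the same workflow can be sketched in Python (Ryan used Java, and his URL is not preserved). The base URL and the db, id, rettype and retmode parameters are the standard EFetch ones; the function names are hypothetical, and only fetch_fasta touches the network.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_fasta_url(ids: Iterable[str]) -> str:
    """Build an EFetch URL asking the nuccore database for FASTA text for the given IDs."""
    params = {"db": "nuccore", "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header line (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_fasta(ids: Iterable[str]) -> Dict[str, str]:
    """Network call: download the FASTA records and parse them (needs internet access)."""
    with urlopen(efetch_fasta_url(ids)) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

Keeping URL building and parsing separate from the download makes both easy to test offline.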

I'm not exactly sure how to query a list of all transcripts for a given gene, but I imagine you can use their "ESearch" utility to get that data. I just download their one big gene2accession file, and that tells me all the transcripts available for all genes.
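A minimal sketch of the gene2accession lookup, assuming the file's usual layout: tab-separated, a '#'-prefixed header line that includes 'GeneID' and 'RNA_nucleotide_accession.version' columns, and '-' for missing values. Columns are looked up by header name so the code does not depend on their order; the helper names are hypothetical.

```python
import csv
import gzip
from typing import Iterable, Set


def transcripts_for_gene(lines: Iterable[str], gene_id: str) -> Set[str]:
    """Collect the RNA accessions listed for `gene_id` in gene2accession-style lines."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # the first column name carries a leading '#'
    gene_col = header.index("GeneID")
    rna_col = header.index("RNA_nucleotide_accession.version")
    found: Set[str] = set()
    for row in reader:
        if len(row) > rna_col and row[gene_col] == gene_id and row[rna_col] != "-":
            found.add(row[rna_col])
    return found


def transcripts_from_file(path: str, gene_id: str) -> Set[str]:
    """Stream a local, optionally gzipped, gene2accession file line by line (it is large)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return transcripts_for_gene(handle, gene_id)
```

For CD55 this would be called as transcripts_from_file(path, "1604"), with path pointing at your local copy.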

Documentation for NCBI E-Utilities:


The ID Cheng used is not a GI but a GeneID, so you cannot use NCBI EFetch with it to retrieve the sequence. Moreover, GenBank records can be poorly annotated and won't always contain the positions of the exons.

(Reply by Pierre Lindenbaum, 9.6 years ago)

Yeah, that's why I said to use the ESearch utility. You can use the GeneID as a parameter in a query to get a list of transcripts. As for the quality of the annotations, I'll defer to you.
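The ESearch route can be sketched in Python as well. The esearch.fcgi endpoint and its db, term and retmax parameters are standard, and the XML response lists the matching UIDs under IdList. The query term in the usage note (gene symbol, organism, mRNA and RefSeq filters) is an assumption standing in for a GeneID-based query, so check the Entrez field tags before relying on it; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 100) -> str:
    """Build an ESearch URL for an Entrez query string."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def parse_id_list(xml_text: str) -> List[str]:
    """Return the UIDs found under <IdList> in an ESearch XML response, in order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id") if el.text]


def search_ids(term: str) -> List[str]:
    """Network call: run the search and return the UIDs (needs internet access)."""
    with urlopen(esearch_url(term)) as resp:
        return parse_id_list(resp.read().decode("utf-8"))
```

For example, search_ids('CD55[gene] AND Homo sapiens[orgn] AND biomol_mrna[prop] AND refseq[filter]') would return nuccore UIDs that could then be passed to EFetch.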

(Reply by Ryan W., 9.6 years ago)

Yeah, the Perl module MultiPride actually uses ESearch to fetch the DNA sequence of a gene by GeneID and to extract information such as the exon and intron positions from the GenBank file. Unfortunately, only some of the exon and intron positions are given in the GenBank file. That's why I want to change the MultiPride code to first get all the transcripts of a gene and then design qPCR primers in a pipeline.

(Reply by Cheng Zhongshan, 9.6 years ago)
9.6 years ago, Cheng Zhongshan wrote:

Thanks! Your suggestions are impressive. Actually, I want to design qPCR primers that distinguish all the transcripts, i.e., I need to know all the exon and intron positions for every transcript and calculate the right primer pairs for each transcript.

I know the Perl module MultiPride can design qPCR primers for many genes in a pipeline. It uses Perl to parse NCBI gene information, including the gene's DNA location and the mRNA exon starts and ends. However, I find the Perl script difficult to maintain, especially when a gene has multiple transcripts, because NCBI lacks some of the exon and intron information for the transcripts of a gene. For example, CD55 has more than 6 transcripts, but only 3 are recorded in its GenBank file (GeneID: 1604). So how can I use Perl to get the start and stop positions of the exons and introns on the corresponding DNA for all the transcripts? Thanks again!
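One way to approach "primers that distinguish the transcripts" is to look for exon-exon junctions that only one transcript has, since a primer spanning such a junction is a common way to target a single isoform. A minimal Python sketch, assuming the five UCSC rows Pierre posted (labelled row1 to row5 here; these names are hypothetical because his query did not select K.name) and ignoring everything else primer design needs (Tm, GC content, amplicon length).

```python
from typing import Dict, Set, Tuple

Junction = Tuple[int, int]  # (exonEnd of exon i, exonStart of exon i+1), UCSC 0-based coordinates

# (exonStarts, exonEnds) for the rows of the UCSC query output in this thread.
CD55_ROWS: Dict[str, Tuple[str, str]] = {
    "row1": ("207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207513735,207532890,",
             "207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207513853,207534309,"),
    "row2": ("207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207532890,",
             "207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207534309,"),
    "row3": ("207494816,207495726,207498966,207500096,207504452,207510037,207510673,207512741,207532890,",
             "207495210,207495912,207499066,207500182,207504641,207510163,207510754,207512762,207534309,"),
    "row4": ("207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207513735,",
             "207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207514081,"),
    "row5": ("207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207527351,207532890,",
             "207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207527444,207534309,"),
}


def junctions(exon_starts: str, exon_ends: str) -> Set[Junction]:
    """Exon-exon junctions of one transcript: (end of exon i, start of exon i+1)."""
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    return {(ends[i], starts[i + 1]) for i in range(len(starts) - 1)}


def unique_junctions(transcripts: Dict[str, Tuple[str, str]]) -> Dict[str, Set[Junction]]:
    """For each transcript, the junctions that occur in no other transcript of the gene."""
    per_tx = {name: junctions(*coords) for name, coords in transcripts.items()}
    result: Dict[str, Set[Junction]] = {}
    for name, own in per_tx.items():
        others = set().union(*(j for other, j in per_tx.items() if other != name))
        result[name] = own - others
    return result


if __name__ == "__main__":
    for name, found in unique_junctions(CD55_ROWS).items():
        print(name, sorted(found))
```

With these five rows, rows 1, 3 and 5 each have at least one junction of their own, while every junction of rows 2 and 4 also occurs in another row, so those two would need a different strategy.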
