Question: Why Do Some RefSeq Genes Have Several Transcripts on Different Strands?
tflutre wrote (8.8 years ago):

Hello,

Among the RefSeq annotations (hg19, from UCSC), one can find genes with several transcripts on different strands, and also genes with the same identifier on different chromosomes. Here is an example combining the two (gene "OR4F29"):

wget --timestamping ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
zcat refGene.txt.gz | grep "OR4F29"
587    NM_001005221    chr1    +    367658    368597    367658    368597    1    367658,    368597,    0    OR4F29    cmpl    cmpl    0,
589    NM_001005221    chr1    -    621095    622034    621095    622034    1    621095,    622034,    0    OR4F29    cmpl    cmpl    0,
1964    NM_001005221    chr5    +    180794287    180795226    180794287    180795226    1    180794287,    180795226,    0    OR4F29    cmpl    cmpl    0,
  • Are these errors, knowing that Ensembl has only one transcript for this gene?

  • What do you do in such a case? Discard the genes? Use Ensembl instead?
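
To see how common this situation is beyond OR4F29, the same table can be scanned for every gene symbol that occurs on more than one chromosome/strand combination. The following is only a minimal sketch: the function name `multi_locus_genes` is hypothetical, and it assumes the tab-separated refGene.txt layout shown above (column 3 = chromosome, column 4 = strand, column 13 = gene symbol). It will not catch duplicated copies that sit on the same chromosome and the same strand.

```shell
# Hypothetical helper: reads refGene.txt rows (tab-separated) on stdin and
# prints each gene symbol found on more than one chrom/strand combination,
# followed by the number of distinct combinations.
# Assumed columns: $3 = chrom, $4 = strand, $13 = name2 (gene symbol).
multi_locus_genes() {
    awk -F'\t' '
        {
            key = $13 SUBSEP $3 SUBSEP $4
            if (!(key in seen)) {      # count each gene/chrom/strand combination once
                seen[key] = 1
                n[$13]++
            }
        }
        END {
            for (g in n)
                if (n[g] > 1)
                    print g "\t" n[g]
        }
    ' | sort
}

# Usage (assumes refGene.txt.gz was downloaded as shown above):
# zcat refGene.txt.gz | multi_locus_genes | head
```

For the three OR4F29 rows above, this reports the gene with a count of 3 (chr1 +, chr1 -, chr5 +).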


Ensembl has listed those three positions as separate genes - http://asia.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=3;q=OR4F29 - each with a single transcript of the exact same sequence. Personally I find RefSeq more... useful in this case

written 8.8 years ago by Aaron Statham
David Quigley wrote (8.8 years ago):

Grab the sequence from NCBI. BLAT the sequence. Note the results:

   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END SPAN
   ---------------------------------------------------------------------------------------------------
   browser details NM_001005221.2   939     1   939   939 100.0%     1   -     621096    622034    939
   browser details NM_001005221.2   939     1   939   939 100.0%     5   +  180794288 180795226    939
   browser details NM_001005221.2   939     1   939   939 100.0%     1   +     367659    368597    939

(There are many more matches, but these are the three with 100% identity over the full length of the sequence.)

There are multiple locations listed for this sequence because it appears at several loci in the genome.
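
Note that the BLAT start positions are one greater than the txStart values in refGene.txt (for example 367659 versus 367658). UCSC database tables store 0-based, end-exclusive coordinates, while the BLAT results page displays 1-based starts. A minimal sketch for lining the two up follows; the function name `loci_one_based` is hypothetical, and it assumes the refGene.txt columns from the question (column 2 = accession without version suffix, columns 5 and 6 = txStart and txEnd).

```shell
# Hypothetical helper: for every refGene row on stdin whose accession ($2)
# equals the first argument, print chrom, strand, 1-based start, end, span.
# Assumes txStart ($5) is 0-based and txEnd ($6) is end-exclusive.
loci_one_based() {
    awk -F'\t' -v acc="$1" '
        $2 == acc {
            printf "%s\t%s\t%d\t%d\t%d\n", $3, $4, $5 + 1, $6, $6 - $5
        }
    '
}

# Usage (refGene.txt stores the accession without the ".2" version):
# zcat refGene.txt.gz | loci_one_based NM_001005221
```

Applied to the three rows in the question, this gives starts 367659, 621096 and 180794288 with a span of 939 each, which agrees with the BLAT table above.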


David's last sentence is telling - this gene/transcript would be labeled as a likely repeat element, most probably a transcribed portion of an LTR or retro-element. Thus, I would do as Travis suggests and give such a gene some kind of alternate consideration/label.

written 8.8 years ago by Larry_Parnell
Travis wrote (8.8 years ago):

I personally prefer the Ensembl approach in cases like this, i.e., treating them as separate genes. Another option is to consider them one gene with sense and antisense representations. It depends a lot on preference and on your own way of conceptualizing the genome/transcriptome. I would not discard any of the transcripts.
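
If you want the "separate genes" treatment while staying with the RefSeq table, one possible approach is to give each chromosome/strand copy its own gene identifier before handing the file to a downstream tool. This is only a sketch under assumptions: the function name `split_by_locus` and the `_chrN_plus` / `_chrN_minus` suffix scheme are made up for illustration, the input is assumed to be tab-separated refGene.txt rows (column 3 = chromosome, column 4 = strand, column 13 = gene symbol), and copies on the same chromosome and strand are not separated.

```shell
# Hypothetical helper: reads refGene.txt rows on stdin and, for gene symbols
# seen on more than one chrom/strand combination, appends a locus suffix to
# the gene symbol (column 13). All other rows are printed unchanged.
# The whole input is held in memory, which is fine for a file of this size.
split_by_locus() {
    awk -F'\t' '
        {
            row[NR] = $0
            key = $13 SUBSEP $3 SUBSEP $4
            if (!(key in seen)) { seen[key] = 1; n[$13]++ }
        }
        END {
            for (i = 1; i <= NR; i++) {
                m = split(row[i], f, "\t")
                if (n[f[13]] > 1)
                    f[13] = f[13] "_" f[3] "_" (f[4] == "+" ? "plus" : "minus")
                line = f[1]
                for (j = 2; j <= m; j++)
                    line = line "\t" f[j]
                print line
            }
        }
    '
}

# Usage (output file name is arbitrary):
# zcat refGene.txt.gz | split_by_locus > refGene.split.txt
```

With the rows from the question, the gene symbol becomes OR4F29_chr1_plus, OR4F29_chr1_minus and OR4F29_chr5_plus, so each copy carries a single strand.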


@Travis I chose your answer, as I will also use Ensembl for the moment, but mainly because the tool I am using does not allow a gene to have several transcripts on different strands.

written 8.8 years ago by tflutre
Casey Bergman wrote (8.8 years ago):

Trans-splicing can also cause some gene models to (legitimately) be annotated to both strands, such as the Drosophila Mod(mdg4) locus: http://genome.cshlp.org/content/13/10/2220.full

The first indication of a requirement for trans-splicing in the generation of Mod(mdg4) proteins came after the realization that the two DNA strands of the gene have coding capabilities and contain coding sequences present in mature mRNAs that are translated into functional proteins


@Casey thanks for this good reference! For other users, the original paper (Labrador et al., Nature 2001) can be found here: http://www.nature.com/nature/journal/v409/n6823/full/4091000a0.html

written 8.8 years ago by tflutre