Question: Why is the same gene located on different chromosomes?
winter_li wrote (3.6 years ago):

Hi, I downloaded the regions of all human genes from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). I find that the same gene is located on different chromosomes, for example:

   585     NR_106918       **chr1**    -       17368   17436   17436   17436   1       17368,  17436,  0       **MIR6859-1**       unk     unk     -1,

   1367    NR_106918       **chr15**   +       102513726       102513794       102513794       102513794       1       102513726,      102513794,      0       **MIR6859-1**       unk     unk     -1,

The MIR6859-1 gene is on both chr1 and chr15. Why? What happened?

Tags: rna-seq, next-gen, genome, gene

Also on chr16:

at chr16:67052-67119 - (NR_106918)
at chr15:102513727-102513794 - (NR_106918)
at chr1:17369-17436 - (NR_106918)

Remarkable

modified 3.6 years ago • written 3.6 years ago by WouterDeCoster

I have exactly the same question. For example, the OR4F3 gene encodes the olfactory receptor 4F3/4F16/4F29 protein. With the NCBI hg38.gtf it is located only on chr5, but with ucsc.genes.gtf or encode.hg38.gtf it is located on both chr1 and chr5.

grep -w "OR4F3" UCSC/hg38/Annotation/Genes/genes.gtf
chr1    unknown exon    450740  451678  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown stop_codon  450740  450742  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown CDS 450743  451678  .   -   0   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown start_codon 451676  451678  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P26249"; transcript_id "NM_001005224"; tss_id "TSS18830";
chr1    unknown exon    685716  686654  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr1    unknown stop_codon  685716  685718  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr1    unknown CDS 685719  686654  .   -   0   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr1    unknown start_codon 686652  686654  .   -   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P6827"; transcript_id "NM_001005224_1"; tss_id "TSS33796";
chr5    unknown CDS 181367287   181368222   .   +   0   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
chr5    unknown exon    181367287   181368225   .   +   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
chr5    unknown start_codon 181367287   181367289   .   +   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
chr5    unknown stop_codon  181368223   181368225   .   +   .   gene_id "OR4F3"; gene_name "OR4F3"; p_id "P25445"; transcript_id "NM_001005224_2"; tss_id "TSS8523";
modified 20 months ago by finswimmer • written 20 months ago by hudiejie
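
The grep above checks one gene at a time. A minimal Python sketch of the same check across a whole GTF file could look like the following. It assumes standard tab-separated GTF lines with a `gene_id "..."` attribute in the ninth column; the function names and the two shortened sample lines are illustrative only and are not part of any library.

```python
import re
from collections import defaultdict

# Pattern for the gene_id attribute in the ninth GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def chromosomes_per_gene(gtf_lines):
    """Collect the set of chromosomes on which each gene_id is annotated."""
    chroms = defaultdict(set)
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match:
            chroms[match.group(1)].add(fields[0])
    return chroms


def genes_on_multiple_chromosomes(gtf_lines):
    """Return {gene_id: sorted chromosomes} for genes seen on more than one chromosome."""
    return {
        gene: sorted(found)
        for gene, found in chromosomes_per_gene(gtf_lines).items()
        if len(found) > 1
    }


# Two shortened records modelled on the OR4F3 lines above (attributes trimmed).
sample = [
    'chr1\tunknown\texon\t450740\t451678\t.\t-\t.\tgene_id "OR4F3"; transcript_id "NM_001005224";',
    'chr5\tunknown\texon\t181367287\t181368225\t.\t+\t.\tgene_id "OR4F3"; transcript_id "NM_001005224_2";',
]

print(genes_on_multiple_chromosomes(sample))
```

To run it on a real file, pass an open file handle instead of `sample`. Note that this only flags genes annotated on different chromosomes; two copies on the same chromosome, such as the two chr1 loci in the output above, would need a comparison of coordinates as well.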

Hello hudiejie and welcome to biostars,

please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

This reply is better suited as a comment on the original question. Answers in biostars are meant only for (full) solutions to the problem of the OP. This is why I moved your answer to a comment.

Thank you!

fin swimmer

modified 20 months ago • written 20 months ago by finswimmer
Denise - Open Targets (UK, Hinxton, EMBL-EBI) wrote (3.6 years ago):

Entries such as NRs are not genes (loci); they are RNA sequences for a non-coding locus. If I search for MIR6859-1 in UCSC, I get only one entry under known genes.

Satyajeet Khare (Pune, India) wrote (3.6 years ago):

It looks like the sequence is a perfect match (I just tried a BLAT search) on Chr1, Chr15, and Chr16. But the gene IDs are different on Entrez [Mir6859-1 (Chr1), -2 (Chr1), -3 (Chr15) and -4 (Chr16)]. The NR IDs also appear to be different on NCBI (NR_106918, NR_107062, NR_107063, NR_128720). If you are getting only one NR ID, it could be an annotation issue.

tdmurphy wrote (20 months ago):

UCSC has two types of RefSeq tracks. The old "RefSeq Genes" (refGene) track is based on alignments generated by UCSC and can't distinguish between different locations with the same sequence. The newer "NCBI RefSeq" tracks are based on annotation imported from NCBI's RefSeq project, which uses additional information to distinguish ambiguous locations. They also differ in some other ways and include additional features and genes not available in the refGene track. UCSC posted a blog about it: http://genome.ucsc.edu/blog/the-new-ncbi-refseq-tracks-and-you/

For the microRNAs, the four identical locations are assigned separate identifiers by miRBase, HGNC, and NCBI Gene, and each location has a separate RefSeq NR transcript. The same is true for some protein-coding genes, such as CALM1, CALM2, and CALM3.
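
To see how often one accession is placed at several locations in a downloaded table, the rows can be grouped by accession. The sketch below is only an illustration: it assumes the column order of the rows shown in the question (bin, accession, chromosome, strand, txStart, txEnd, ..., with the gene symbol in the 13th column), and the function name `multi_location_accessions` is made up for this example.

```python
from collections import defaultdict


def multi_location_accessions(lines):
    """Group table rows by (accession, gene symbol); keep entries placed more than once."""
    placements = defaultdict(list)
    for line in lines:
        # Fields contain no spaces, so splitting on any whitespace works for
        # tab-separated downloads as well as for rows pasted with spaces.
        fields = line.split()
        # Skip short or empty lines and a "#bin ..." header line.
        if len(fields) < 13 or fields[0].startswith("#"):
            continue
        accession, chrom, strand = fields[1], fields[2], fields[3]
        tx_start, tx_end = int(fields[4]), int(fields[5])
        gene = fields[12]
        placements[(accession, gene)].append((chrom, tx_start, tx_end, strand))
    return {key: locs for key, locs in placements.items() if len(locs) > 1}


# The two rows from the question, without the emphasis markers.
rows = [
    "585 NR_106918 chr1 - 17368 17436 17436 17436 1 17368, 17436, 0 MIR6859-1 unk unk -1,",
    "1367 NR_106918 chr15 + 102513726 102513794 102513794 102513794 1 102513726, 102513794, 0 MIR6859-1 unk unk -1,",
]

for (accession, gene), locations in multi_location_accessions(rows).items():
    for chrom, start, end, strand in locations:
        print(accession, gene, chrom, start, end, strand)
```

Accessions reported this way are candidates for the identical-sequence situation described above. The check itself cannot say which placement is the intended one.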


Thank you for your answer! So it is better to use ncbi.hg38.gtf to avoid ambiguous locations for the same gene. What about miRNAs? I would also like to know which database I should use (I guess the NCBI one as well, if I am using ncbi.hg38.gtf).

written 20 months ago by hudiejie