Does the UCSC knownGene table include all types of RNA: coding RNA, lncRNA, pseudogenes and sncRNA?
10.9 years ago
camelbbs ▴ 710

Which annotation gives the most complete list of current human genes?

I usually use UCSC knownGene, but I see that some people use RefSeq plus ENCODE/GENCODE as their annotation. Does UCSC knownGene include them? Which one is more complete?

Thanks.

genes • 2.8k views
The same question was asked 1 day ago: "UCSC: which gene name annotation are you using?"

Yes, I want to know whether UCSC knownGene includes all types of RNA: coding RNA, lncRNA, pseudogenes and sncRNA.

I don't know about UCSC, but GENCODE does include ncRNAs (lncRNA, small RNA, rRNA).

Have you tried asking the folks at UCSC: http://genome.ucsc.edu/contacts.html ?

10.9 years ago
camelbbs ▴ 710

OK, here is an answer I found:

The knownGene annotation is based on UniProt (SWISS-PROT and TrEMBL) for proteins and on the NCBI Reference Sequence collection (RefSeq) and GenBank for mRNAs.

There are alignments to both chr17 and chr17_random, probably due to duplicated regions in the assembly. chr17_random contains an unordered collection of sequence contigs that are known to be in chr17 but whose exact position is unknown. It may be that this contig, or part of it, was falsely duplicated in chr17_random during the assembly process. It may also be that the region containing the alignment of BC091770 should not be in chr17_random, since its position in chr17 is already known. This may occur for other genes in Known Genes, but it should not be common. As the assembly improves with new releases, this problem should occur less often.

There are two alignments at each location in chr17 and chr17_random because each one has a different CDS start, but the position of the alignment and the exons are the same in both cases. Each one has a separate alignment ID; for chr17, for example, they are G193465 and U34032. The G is for GenBank and the U is for UCSC. We align the Known Gene mRNA to the SWISS-PROT protein sequence for that mRNA to determine the CDS region and compare it to the one defined in the GenBank record. In some cases they are different. G193465 has the cdsStart that corresponds to the one in the GenBank record for BC091770, and U34032 has the cdsStart found by aligning the SWISS-PROT protein sequence to the mRNA sequence.
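To check a downloaded knownGene file for the situation described above (entries with the same alignment position and exons but a different cdsStart), something like the following rough sketch might help. It assumes the usual 12-column tab-separated knownGene layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID). The function names are made up for illustration, and the two example rows use invented coordinates and a placeholder protein ID; only the accession BC091770 and the alignment IDs come from the answer above.

```python
from collections import defaultdict

# Assumed column order of a knownGene dump (genePred-style layout).
COLUMNS = ["name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
           "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID"]


def parse_known_gene(lines):
    """Turn tab-separated knownGene rows into dicts keyed by column name."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in ("txStart", "txEnd", "cdsStart", "cdsEnd"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def cds_start_conflicts(rows):
    """Group rows with identical position and exons; keep the groups whose
    members disagree on cdsStart."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["chrom"], row["strand"], row["txStart"], row["txEnd"],
               row["exonStarts"], row["exonEnds"])
        groups[key].append(row)
    return {key: members for key, members in groups.items()
            if len({m["cdsStart"] for m in members}) > 1}


# Hypothetical rows: the coordinates and the protein ID are invented.
EXAMPLE = [
    "BC091770\tchr17\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,\tPLACEHOLDER\tG193465",
    "BC091770\tchr17\t+\t1000\t5000\t1350\t4800\t2\t1000,3000,\t2000,5000,\tPLACEHOLDER\tU34032",
]

if __name__ == "__main__":
    conflicts = cds_start_conflicts(parse_known_gene(EXAMPLE))
    for key, members in conflicts.items():
        chrom, strand, tx_start, tx_end = key[:4]
        print(f"{chrom}:{tx_start}-{tx_end} ({strand})")
        for m in members:
            print(f"  alignID={m['alignID']} cdsStart={m['cdsStart']}")
```

For a real file, the lines of the downloaded table would be passed to parse_known_gene in place of EXAMPLE.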
