Question: Refseqgene Directly To Gtf?
5.7 years ago, Gabe Rudy (Golden Helix) wrote:

UCSC says they take nightly dumps of the mRNA sequences from RefSeqGene and BLAT them against their assemblies to generate the RefSeqGenes track.

NCBI does have a mappings file on their FTP site: ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/GCF_000001405.25_refseqgene_alignments.gff3

But this only provides a start/stop position along the assembly, not the CDS start/stop and exon boundaries.

Is there any way to take NCBI's FTP data and generate a proper GTF file, or is there another location on NCBI that has this data in one form or another?

Tags: gtf, ncbi, genes
5.3 years ago, Gabe Rudy (Golden Helix) wrote:

I finally got an answer from Deanna Church on this in response to a blog post we wrote about variant annotation.

Here is the GRCh37 GFF mappings of RefSeqGenes:

ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/ref_GRCh37.p13_top_level.gff3.gz

GFF3 can be a bit tricky to parse into the same fields as GTF, but all the data is there.

Note that microRNAs are in there as well.
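
For anyone wanting to go from that GFF3 to GTF-style rows, here is a rough Python sketch, not a complete converter. It keeps only exon and CDS rows and rewrites the attribute column into `gene_id` / `transcript_id` pairs. The function names are made up for illustration. It assumes that parent features appear before their children in the file, that each exon/CDS has a single `Parent`, and that the transcript accession can be read from a `transcript_id` attribute or, failing that, from the parent's `Name`; check these against the actual file. Coordinates are copied unchanged, so GTF-specific conventions (for example stop codon handling) are not addressed.

```python
import gzip
from urllib.parse import unquote


def parse_attributes(field):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_to_gtf_lines(lines):
    """Yield GTF-style lines for the exon and CDS rows found in GFF3 lines."""
    features = {}  # feature ID -> attribute dict, used to look up parents
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            features[attrs["ID"]] = attrs
        if cols[2] not in ("exon", "CDS"):
            continue
        # Assumption: the parent row (mRNA, ncRNA, ...) was seen earlier.
        parent = features.get(attrs.get("Parent", ""), {})
        transcript = (
            attrs.get("transcript_id")
            or parent.get("transcript_id")
            or parent.get("Name")
            or attrs.get("Parent", "unknown")
        )
        gene = attrs.get("gene") or parent.get("gene") or "unknown"
        gtf_attrs = f'gene_id "{gene}"; transcript_id "{transcript}";'
        yield "\t".join(cols[:8] + [gtf_attrs])


def convert_file(gff3_gz_path, gtf_path):
    """Read a gzipped GFF3 file and write the converted rows to gtf_path."""
    with gzip.open(gff3_gz_path, "rt") as src, open(gtf_path, "w") as dst:
        for row in gff3_to_gtf_lines(src):
            dst.write(row + "\n")
```

After downloading the file, something like `convert_file("ref_GRCh37.p13_top_level.gff3.gz", "refseq.gtf")` would write the result. Note that the first column in this file holds RefSeq accessions rather than UCSC-style chromosome names, so a rename step may be needed depending on the downstream tool.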
