Question: Protein coding annotation from UCSC
26 days ago by
tim.ivanov.92 wrote:

How do I download only protein-coding genes from the UCSC Table Browser? I've chosen GENCODE v19 and "Genes and Gene Predictions" in the settings menus, but the result contains many more genes than just the protein-coding ones.

I'm attaching a screenshot of my settings:

[image]

and the head of the resulting data frame:

[image]

ucsc annotation • 110 views
modified 26 days ago by Luis Nassar • written 26 days ago by tim.ivanov.92

You seem to have pasted the link for the image hosting site twice. Please go to the "Embed codes" tab, copy the full image HTML link, and paste it here.

modified 26 days ago • written 26 days ago by genomax

Yes, sorry, I fixed it in the original post. The second picture is now the right one.

written 26 days ago by tim.ivanov.92
26 days ago by
lshepard wrote:

One way is to set the output format to "selected fields from primary and related tables", check the linked table and click "allow selection from checked tables", then select "geneType/BioType of gene" (and any other fields relevant to you). Save the results, then filter the file to keep only the rows that match protein_coding (a simple grep could do this).

I am sure there is more than one solution to the above, but this would work.
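
For the final filtering step, here is a minimal Python sketch as an alternative to grep. It assumes the saved Table Browser output is tab-separated, has a header line starting with "#", and has a gene-type column whose name ends in "geneType"; check the header of your own file. The function name, the header prefix, and the sample rows (TX_A, TX_B, TX_C) are made up for illustration.

```python
import csv
import io


def filter_protein_coding(lines, type_suffix="geneType", wanted="protein_coding"):
    """Return (header, rows) keeping only rows whose gene-type column equals `wanted`.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Assumption: the header line starts with '#'; strip it before matching names.
    names = [name.lstrip("#") for name in header]
    # Locate the gene-type column by suffix, since the prefix depends on the
    # database and table that were selected.
    matches = [i for i, name in enumerate(names) if name.endswith(type_suffix)]
    if not matches:
        raise ValueError(f"no column ending in {type_suffix!r} found in header")
    idx = matches[0]
    rows = [row for row in reader if len(row) > idx and row[idx] == wanted]
    return header, rows


# Made-up example input standing in for a saved Table Browser file.
example = io.StringIO(
    "#hg19.wgEncodeGencodeBasicV19.name\thg19.wgEncodeGencodeAttrsV19.geneType\n"
    "TX_A\tpseudogene\n"
    "TX_B\tprotein_coding\n"
    "TX_C\tlincRNA\n"
)
header, rows = filter_protein_coding(example)
for row in rows:
    print("\t".join(row))
```

For a real file, pass an open handle instead of the StringIO object, for example `with open("output.tsv", newline="") as fh:` where the file name is whatever you saved the results as. Matching the gene-type column exactly is a little stricter than grep, which would also keep any row where protein_coding happens to appear in another field.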

written 26 days ago by lshepard

Thank you, that does it!

written 25 days ago by tim.ivanov.92
26 days ago by
Luis Nassar wrote:

Hello Tim,

With the same selections in the Table Browser, if you change the output format to BED, you will see additional options to refine the output. The following page will say:

Create one BED record per:

By default, this makes one entry for the Whole Gene. You can instead choose just Exons, or only Coding Exons to exclude the 5' and 3' UTRs. The output will be in BED format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1).

There are many answers to this question depending on exactly what you are looking for. If you have additional questions I would encourage you to look at our mailing list archives (https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome), or write in directly to the help desk (genome@soe.ucsc.edu).

P.S. If you are looking for a more concise gene list, you can use the UCSC Genes track for hg19 and select the knownCanonical table, which has only a single isoform for each gene.

written 26 days ago by Luis Nassar