Trying to download gencode v29 basic annotation without splice variants
4.8 years ago

I am trying to download the annotation track that the UCSC Genome Browser displays as GENCODE v29 basic, with the display option selected that hides splice variants. I have tried downloading GENCODE v29 basic and GENCODE v30 basic from both the UCSC Table Browser and the GENCODE website, but both downloads include splice variants. I have also tried the known canonical modifier in the Table Browser. It appears to give the right positions, but the file I get contains only the positions, without strands or transcript IDs. I can work around the missing transcript IDs, but I need the output to be strand-specific.

I assume this task is trivial and I am simply looking in all of the wrong places. So, if you happen to know how to do this or could lead me in the right direction, I would be very thankful!

Thank you!

genome annotation • 1.0k views
4.8 years ago
GenoMax 141k

Get the annotation from GENCODE directly from this page.

If you specifically want release 29 then that is available here.

If you only need entries for "genes" then do the following:

awk -F "\t" '$3 == "gene" {print $0}' gencode.v30.basic.annotation.gtf > v30_genes

To get genes on + strand:

awk -F "\t" '($3 == "gene" && $7 == "+") {print $0}' gencode.v30.basic.annotation.gtf > v30_plus

If you only need protein_coding genes then do:

awk -F "\t" '($3 == "gene" && $9 ~ /gene_type "protein_coding"/) {print $0}' gencode.v30.basic.annotation.gtf > v30_prot_coding
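
Since the question asks for strand-specific positions, here is a minimal sketch that turns the "gene" records of the GTF into BED6 lines (chrom, start, end, name, score, strand). The function name gtf_genes_to_bed is hypothetical, and the sketch assumes a standard tab-separated GENCODE GTF with a gene_name attribute in column 9. GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so the start is shifted by one.

```shell
# Hypothetical helper (not part of any toolkit): GTF "gene" records -> BED6.
gtf_genes_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    $3 == "gene" {
        # Pull gene_name out of the attribute column; use "." if it is absent.
        name = "."
        if (match($9, /gene_name "[^"]+"/)) {
            # Skip the 11 characters of: gene_name " and drop the closing quote.
            name = substr($9, RSTART + 11, RLENGTH - 12)
        }
        # GTF start is 1-based inclusive; BED start is 0-based.
        print $1, $4 - 1, $5, name, 0, $7
    }' "$1"
}
```

It could be called as: gtf_genes_to_bed gencode.v30.basic.annotation.gtf > v30_genes.bed. This only reformats what is in the GENCODE file; it does not reproduce whatever the browser does when it hides splice variants.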

Thank you very much for the reply. I have actually already done this, and it leads to tracks that are slightly off. For example, C12orf49 has a chromEnd of 116738070 in the downloaded GENCODE v29 basic annotation but 116738061 on the UCSC Genome Browser track. GENCODE v30 basic gives similar results that are not identical to the Genome Browser track. Another good example is NDUFA12: the track displayed by UCSC has a chromStart of 94971332, but the downloaded annotation has 94897055.

I really need it to be exactly equal to the displayed track. When I upload the downloaded annotations to the Genome Browser to check whether they are identical, they obviously differ in certain areas. Do you have any idea how the Genome Browser modifies the GENCODE v29 basic annotation to produce what it displays?
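
For reference, a rough sketch of how such differences could be listed rather than spotted by eye, assuming both the downloaded annotation and the displayed track have been exported as tab-separated BED6 files with the gene name in column 4. The function name bed_diff_by_name is hypothetical, and if a name occurs more than once in the first file only its last record is compared.

```shell
# Hypothetical helper: report names whose start, end or strand differ
# between two BED6 files (name in column 4).
bed_diff_by_name() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    # First file: remember start, end and strand for each name.
    NR == FNR { coords[$4] = $2 OFS $3 OFS $6; next }
    # Second file: print names present in both files with different coordinates.
    ($4 in coords) && coords[$4] != ($2 OFS $3 OFS $6) {
        print $4, coords[$4], $2, $3, $6
    }' "$1" "$2"
}
```

Each output line holds the name, then start, end and strand from the first file, then the same three fields from the second file.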

Thank you very much for the help!!


I suggest that you send this question as a ticket to the UCSC Genome Browser help desk (email: genome at soe.ucsc.edu). Someone from UCSC stops by here periodically, but they are not regulars.
