GTF file for the canFam2 genome version
I'm trying to find a GTF file for this version of the canine genome. I looked at both NCBI and UCSC, but I am not able to find one.
When I try to download from the page below, I don't see an option to download a GTF file:
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002285.2/
Normally UCSC has a folder called genes, as for hg19: https://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/genes/
That folder contains the GTF file, but it is not present for canFam2 at UCSC:
https://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/
Is there a way to find an already-created GTF for this same version (canFam2), from either NCBI or UCSC?
It would be helpful to know whether such a file can be downloaded.
I wrote https://jvarkit.readthedocs.io/en/latest/KgToGff/
It was just a one-shot tool and I haven't used it much, so please check the results.
$ wget -qO - "https://hgdownload.cse.ucsc.edu/goldenPath/canFam2/database/ensGene.txt.gz" | gunzip -c |\
java -jar dist/jvarkit.jar kg2gff --gtf | head -n 20
chr29 ucsc gene 31843577 31869014 . + . ID "GENE2"; Name "ENSCAFG00000008510"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; gene_type "protein_coding";
chr29 ucsc transcript 31843577 31869014 . + . ID "ENSCAFT00000013501.3"; Parent "GENE2"; Name "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; transcript_name "ENSCAFT00000013501";
chr29 ucsc exon 31843577 31843766 . + . ID "ENSCAFT00000013501%3AE0"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE0";
chr29 ucsc exon 31862157 31862334 . + . ID "ENSCAFT00000013501%3AE1"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE1";
chr29 ucsc exon 31865271 31865385 . + . ID "ENSCAFT00000013501%3AE2"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE2";
chr29 ucsc exon 31868495 31868660 . + . ID "ENSCAFT00000013501%3AE3"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE3";
chr29 ucsc exon 31868849 31869014 . + . ID "ENSCAFT00000013501%3AE4"; Parent "ENSCAFT00000013501.3"; Name "ENSCAFT00000013501"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3"; exon_id "ENSCAFT00000013501%3AE4";
chr29 ucsc CDS 31843577 31843766 . + 0 ID "CDS4"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc CDS 31862157 31862334 . + 2 ID "CDS5"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc CDS 31865271 31865385 . + 1 ID "CDS6"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc CDS 31868495 31868660 . + 0 ID "CDS7"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc CDS 31868849 31868910 . + 2 ID "CDS8"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc three_prime_utr 31868911 31869014 . + . ID "UTR9"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc start_codon 31843577 31843579 . + . ID "codon10"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr29 ucsc stop_codon 31868908 31868910 . + . ID "codon11"; Parent "ENSCAFT00000013501.3"; biotype "protein_coding"; gene_id "GENE2"; gene_name "ENSCAFG00000008510"; transcript_id "ENSCAFT00000013501.3";
chr3 ucsc gene 72230308 72416756 . + . ID "GENE13"; Name "ENSCAFG00000015634"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; gene_type "protein_coding";
chr3 ucsc transcript 72230308 72416756 . + . ID "ENSCAFT00000024802.14"; Parent "GENE13"; Name "ENSCAFT00000024802.14"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; transcript_name "ENSCAFT00000024802";
chr3 ucsc exon 72230308 72230403 . + . ID "ENSCAFT00000024802%3AE0"; Parent "ENSCAFT00000024802.14"; Name "ENSCAFT00000024802"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; exon_id "ENSCAFT00000024802%3AE0";
chr3 ucsc exon 72257459 72257619 . + . ID "ENSCAFT00000024802%3AE1"; Parent "ENSCAFT00000024802.14"; Name "ENSCAFT00000024802"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; exon_id "ENSCAFT00000024802%3AE1";
chr3 ucsc exon 72272584 72272708 . + . ID "ENSCAFT00000024802%3AE2"; Parent "ENSCAFT00000024802.14"; Name "ENSCAFT00000024802"; biotype "protein_coding"; gene_id "GENE13"; gene_name "ENSCAFG00000015634"; transcript_id "ENSCAFT00000024802.14"; exon_id "ENSCAFT00000024802%3AE2";
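One way to spot-check the exon coordinates in the output above is to recompute them directly from the table. The sketch below is not part of jvarkit. It assumes that ensGene.txt uses the UCSC genePredExt layout with a leading bin column (name in column 2, chrom in 3, strand in 4, exonCount in 9, exonStarts in 10, exonEnds in 11, name2 in 13) and that starts are 0-based while ends are exclusive. The function name `genepred_to_gtf_exons`, the `ensGene` source label, and the sample row are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print minimal GTF exon records from genePredExt rows
# read on stdin (assumed layout: leading "bin" column, as in ensGene.txt).
genepred_to_gtf_exons() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  {
    split($10, starts, ",")   # exonStarts: 0-based, list ends with a comma
    split($11, ends, ",")     # exonEnds: end-exclusive
    attrs = "gene_id \"" $13 "\"; transcript_id \"" $2 "\";"
    for (i = 1; i <= $9; i++) {
      # GTF is 1-based and end-inclusive, so only the start is shifted by one.
      print $3, "ensGene", "exon", starts[i] + 1, ends[i], ".", $4, ".", attrs
    }
  }'
}

# Made-up two-exon row so the sketch runs without network access.
printf '1\tTX1\tchr1\t+\t99\t400\t99\t400\t2\t99,299,\t200,400,\t0\tGENE1\tcmpl\tcmpl\t0,0,\n' \
  | genepred_to_gtf_exons
```

To compare against the real table, the same function can be fed from the download, for example `wget -qO - "https://hgdownload.cse.ucsc.edu/goldenPath/canFam2/database/ensGene.txt.gz" | gunzip -c | genepred_to_gtf_exons | head`. It emits exon records only (no gene, transcript, CDS, or codon features), so it is a sanity check rather than a replacement for a full converter.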
Are you specifically looking for canFam2? It is likely missing because newer versions are available now: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=9612
Yes, I'm looking for canFam2 specifically. I have to use it for some specific cases, even though I also have the newer version.