Question: How to add tss_id and p_id to an Ensembl GTF (or any GTF other than one generated by Cufflinks)
komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA) wrote 4.6 years ago:

Hi everyone,

I am working with mm10 data and using the Ensembl release 75 GTF for GRCm38. As far as I know, cuffdiff needs the tss_id and p_id attributes for its differential isoform analyses (splicing, promoter use, and coding output) when you use any GTF other than Cufflinks' merged.gtf. I am using the following command to add tss_id and p_id to my Ensembl GTF:

cuffcompare -o cuffcmp -C -G  -r Mus_musculus.GRCm38.75.protein_linc.gtf -s mm10.fa Mus_musculus.GRCm38.75.protein_linc.gtf

To check whether I was doing it correctly, I checked the entries for a particular gene in both the input and output gtfs.

The 'gene' entry for Xkr4 in the original GTF looks like this:

chr1    protein_coding    gene    3205901    3671498    .    -    .    gene_id "ENSMUSG00000051951"; gene_name "Xkr4"; gene_source "ensembl_havana"; gene_biotype "protein_coding";

And, these are the entries corresponding to the above coordinates in the output GTF cuffcmp.combined.gtf:

chr1    processed_transcript    exon    3205901 3207317 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002060"; exon_number "1"; gene_name "Xkr4"; oId "ENSMUST00000162897"; nearest_ref "ENSMUST00000162897"; class_code "="; tss_id "TSS1356";
chr1    processed_transcript    exon    3213609 3216344 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002060"; exon_number "2"; gene_name "Xkr4"; oId "ENSMUST00000162897"; nearest_ref "ENSMUST00000162897"; class_code "="; tss_id "TSS1356";
chr1    processed_transcript    exon    3206523 3207317 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002061"; exon_number "1"; gene_name "Xkr4"; oId "ENSMUST00000159265"; nearest_ref "ENSMUST00000159265"; class_code "="; tss_id "TSS1357";
chr1    processed_transcript    exon    3213439 3215632 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002061"; exon_number "2"; gene_name "Xkr4"; oId "ENSMUST00000159265"; nearest_ref "ENSMUST00000159265"; class_code "="; tss_id "TSS1357";
chr1    protein_coding  exon    3214482 3216968 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002062"; exon_number "1"; gene_name "Xkr4"; oId "ENSMUST00000070533"; nearest_ref "ENSMUST00000070533"; class_code "="; tss_id "TSS1358"; p_id "P1235";
chr1    protein_coding  exon    3421702 3421901 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002062"; exon_number "2"; gene_name "Xkr4"; oId "ENSMUST00000070533"; nearest_ref "ENSMUST00000070533"; class_code "="; tss_id "TSS1358"; p_id "P1235";
chr1    protein_coding  exon    3670552 3671498 .       -       .       gene_id "XLOC_000653"; transcript_id "TCONS_00002062"; exon_number "3"; gene_name "Xkr4"; oId "ENSMUST00000070533"; nearest_ref "ENSMUST00000070533"; class_code "="; tss_id "TSS1358"; p_id "P1235";
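A check like the one above can also be scripted. The following is a minimal sketch, assuming a tab-separated GTF whose attribute values are all double-quoted; the names `parse_attributes` and `summarize_ids` are hypothetical helpers, not part of Cufflinks. It lists which transcripts carry a tss_id and which also carry a p_id (in the excerpt above, only the protein_coding transcript has a p_id).

```python
import re

# Matches attribute pairs such as: gene_id "XLOC_000653";
# Assumption: every attribute value is double-quoted.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Return the GTF attribute column as a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def summarize_ids(gtf_lines):
    """Map each transcript_id to the (tss_id, p_id) pair found on its rows.

    A missing attribute is reported as None.
    """
    summary = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        summary[attrs.get("transcript_id")] = (
            attrs.get("tss_id"),
            attrs.get("p_id"),
        )
    return summary
```

For example, `summarize_ids(open("cuffcmp.combined.gtf"))` would return one entry per transcript in the cuffcompare output.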

In the output, the gene_id field has XLOC IDs instead of Ensembl IDs (and transcript_id has TCONS IDs). Can I fix this to have Ensembl IDs instead? Is there a better way to add tss_id and p_id to an Ensembl GTF?
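One possible way to put the Ensembl IDs back, sketched here under assumptions rather than as a tested recipe: the output above keeps the original transcript ID in the oId attribute, so it can be used to look up the Ensembl gene_id in the input GTF. The function names `transcript_to_gene` and `restore_ensembl_ids` are hypothetical, the sketch assumes tab-separated files with double-quoted attribute values, and whether cuffdiff behaves as intended with the rewritten file has not been verified here.

```python
import re

# Assumption: every attribute value is double-quoted, e.g. gene_id "X";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def transcript_to_gene(reference_lines):
    """Build a transcript_id -> gene_id lookup from the original Ensembl GTF."""
    mapping = {}
    for line in reference_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def restore_ensembl_ids(combined_lines, mapping):
    """Yield cuffcompare output rows with Ensembl gene_id/transcript_id restored.

    Rows whose oId is not in the lookup keep their XLOC/TCONS IDs.
    tss_id and p_id are left exactly as cuffcompare wrote them.
    """
    for line in combined_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            yield line.rstrip("\n")
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        original_tx = attrs.get("oId")
        if original_tx in mapping:
            # Updating existing keys keeps gene_id and transcript_id first.
            attrs["transcript_id"] = original_tx
            attrs["gene_id"] = mapping[original_tx]
        cols[8] = " ".join(f'{key} "{value}";' for key, value in attrs.items())
        yield "\t".join(cols)
```

Usage would be along the lines of building the lookup from Mus_musculus.GRCm38.75.protein_linc.gtf and writing each row yielded for cuffcmp.combined.gtf to a new file.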

 

Malcolm.Cook (Kansas, USA) wrote 3.4 years ago:

I have developed an Rscript, cuffdiff_gtf_attributes, which can provide the additional attributes p_id and tss_id that cuffdiff requires to perform all of its differential splicing/coding/expression contrasts. I have tested it with an Ensembl GTF.

 


Thanks for the script, works great!

Reply written 3.1 years ago by Sukhdeep Singh
lavinia.gordon (Australia) wrote 3.7 years ago:

Hi,

I believe you can download a GTF file that already has these attributes from here:

https://ccb.jhu.edu/software/tophat/igenomes.shtml
