Question: How can I get Transcript ID from the gene ID?
Carlos Caicedo (Colombia/Universidad de Antioquia) wrote, 2.2 years ago:

Dear all

I have a list of gene IDs in tabular format. How can I extract the transcript IDs for those gene IDs from a GFF file?

Thank you so much.

Carlos


It depends on which genome this is, but you could try the BioMart tool from Ensembl.

Reply written 2.2 years ago by genomax

My data come from a bacterial species, so I don't think BioMart will work in this case.

Reply written 2.2 years ago by Carlos Caicedo

If this is a bacterium, then you should have a single transcript for each gene, since there is no alternative splicing, right?

Reply written 2.2 years ago by genomax

Of course, you are right. Let me try to explain my question better.

The GFF file looks something like this:

chromosome	ena	gene	661	1041	.	-	.	ID=gene:SCLAV_0001;biotype=protein_coding;description=Hypothetical protein;gene_id=SCLAV_0001;logic_name=ena;version=1
chromosome	ena	transcript	661	1041	.	-	.	ID=transcript:EFG05077;Parent=gene:SCLAV_0001;biotype=protein_coding;transcript_id=EFG05077;version=1
chromosome	ena	exon	661	1041	.	-	.	Parent=transcript:EFG05077;Name=EFG05077-1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EFG05077-1;rank=1;version=1
chromosome	ena	CDS	661	1041	.	-	0	ID=CDS:EFG05077;Parent=transcript:EFG05077;protein_id=EFG05077

I have a list of gene IDs:

SCLAV_0001
SCLAV_0002

For each gene in the list, I need to get the transcript ID.

For instance:

SCLAV_0001	EFG05077
SCLAV_0002	EFG0XXY

And so on.

Reply written 2.2 years ago by Carlos Caicedo (modified by genomax)
Jeffin Rockey (Karimannoor) wrote, 2.2 years ago:

I hope the one-liner below helps, or at least points the way:

awk -F'\t' '$3=="transcript" {print $9}' yourGeneModel.gff3 | sed -e 's|ID=transcript:\([^;]*\).*Parent=gene:\([^;]*\).*|\2\t\1|'

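If the gene model has exactly one transcript per gene, this should be enough: it prints the gene ID and the transcript ID for every transcript line (note that `\t` in the sed replacement is a GNU sed feature). Otherwise, one more script would be needed to combine multiple transcripts per gene, and in either case the output still has to be filtered against your list of gene IDs.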
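
For the case with several transcripts per gene, and for restricting the output to a gene list, here is a minimal Python sketch of the same idea. It is not a tested pipeline: it assumes the attribute layout shown in the question (`ID=transcript:...` and `Parent=gene:...` on lines whose third column is `transcript`), and the function names `parse_attributes`, `gene_to_transcripts` and `lookup` are made up for illustration.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def gene_to_transcripts(gff_lines):
    """Map each gene ID to the list of its transcript IDs.

    Assumes Ensembl-style attributes as in the question:
    ID=transcript:<id> and Parent=gene:<id> on 'transcript' lines.
    """
    mapping = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        parent = attrs.get("Parent", "")
        transcript = attrs.get("ID", "")
        if parent.startswith("gene:") and transcript.startswith("transcript:"):
            gene_id = parent[len("gene:"):]
            mapping[gene_id].append(transcript[len("transcript:"):])
    return dict(mapping)


def lookup(gene_ids, mapping):
    """Return (gene_id, comma-joined transcript IDs) for each requested gene."""
    return [(g, ",".join(mapping.get(g, []))) for g in gene_ids]


# Two lines from the question, rebuilt with real tab separators.
example = [
    "\t".join(["chromosome", "ena", "gene", "661", "1041", ".", "-", ".",
               "ID=gene:SCLAV_0001;biotype=protein_coding;gene_id=SCLAV_0001"]),
    "\t".join(["chromosome", "ena", "transcript", "661", "1041", ".", "-", ".",
               "ID=transcript:EFG05077;Parent=gene:SCLAV_0001;transcript_id=EFG05077"]),
]
mapping = gene_to_transcripts(example)
for gene_id, transcripts in lookup(["SCLAV_0001", "SCLAV_0002"], mapping):
    print(gene_id, transcripts, sep="\t")
```

With a real file you would pass an open file handle instead of the `example` list, for instance `gene_to_transcripts(open("yourGeneModel.gff3"))`, and read the gene IDs from your table. Genes with no matching transcript line get an empty second column, so they are easy to spot.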
