How to get all known human transcript isoforms in fasta format?
6.1 years ago
Venados ▴ 30

Hello all!

I'd like to get one FASTA entry per known transcript in the GENCODE GTF v27 (Ensembl 90).

Is there a simple way to do this or a repository where I can find this fasta file?

Or do I need to write a script to extract and concatenate all the exon sequences for every transcript?

Thanks a lot in advance!

RNA-Seq rna-seq fasta transcripts • 1.8k views
6.1 years ago
GenoMax 141k

From Ensembl.

You can also use BioMart for that.
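If a local script is preferred over a download, the exon-concatenation approach raised in the question can be sketched roughly as below. This is only a minimal sketch under stated assumptions: the genome is already loaded into a dict of chromosome name to sequence, the GTF uses 1-based inclusive coordinates with a `transcript_id "..."` attribute on exon lines, and the function names (`reverse_complement`, `transcripts_from_gtf`) and the toy data are made up for illustration.

```python
import re
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcripts_from_gtf(gtf_lines, genome):
    """Build spliced transcript sequences from GTF exon records.

    gtf_lines: iterable of GTF lines (tab-separated, 1-based inclusive coords).
    genome:    dict mapping chromosome name -> sequence string (assumed loaded).
    Returns a dict mapping transcript_id -> concatenated exon sequence.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        exons[match.group(1)].append((chrom, start, end, strand))

    spliced = {}
    for transcript_id, parts in exons.items():
        # Join exons in genomic order, then flip the whole thing for the minus strand.
        parts.sort(key=lambda part: part[1])
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        spliced[transcript_id] = seq
    return spliced


# Toy data (hypothetical, not real annotation): one 12-base "chromosome".
demo_genome = {"chr1": "ACGTACGGTTCA"}
demo_gtf = [
    'chr1\tdemo\texon\t1\t3\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\texon\t7\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\texon\t4\t6\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr1\tdemo\texon\t10\t12\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
demo_transcripts = transcripts_from_gtf(demo_gtf, demo_genome)
for transcript_id, seq in demo_transcripts.items():
    print(f">{transcript_id}\n{seq}")
```

For a real run the genome dict would have to be filled from the GRCh38 FASTA matching the GTF, with chromosome names that agree between the two files; a ready-made transcript file is usually the simpler route.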


I just downloaded the hg38_CDS_all.fasta which should contain the data I want as it doesn't contain the intronic sequence.

However, many entries are too short to be transcripts. Any idea why this happens? Thanks in advance! Some examples:

>ENST00000434970.2 cds chromosome:GRCh38:14:22439007:22439015:1 gene:ENSG00000237235.2 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD2 description:T-cell receptor delta diversity 2 [Source:HGNC Symbol;Acc:HGNC:12255]

CCTTCCTAC

>ENST00000448914.1 cds chromosome:GRCh38:14:22449113:22449125:1 gene:ENSG00000228985.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD3 description:T-cell receptor delta diversity 3 [Source:HGNC Symbol;Acc:HGNC:12256]

ACTGGGGGATACG

>ENST00000415118.1 cds chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD1 description:T-cell receptor delta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12254]

GAAATAGT

>ENST00000631435.1 cds chromosome:GRCh38:CHR_HSCHR7_2_CTG6:142847306:142847317:1 gene:ENSG00000282253.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRBD1

GGGACAGGGGGC

>ENST00000632684.1 cds chromosome:GRCh38:7:142786213:142786224:1 gene:ENSG00000282431.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRBD1

GGGACAGGGGGC

>ENST00000454908.1 cds chromosome:GRCh38:14:105919502:105919518:-1 gene:ENSG00000236170.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD1-1 description:immunoglobulin heavy diversity 1-1 [Source:HGNC Symbol;Acc:HGNC:5482]

GGTACAACTGGAACGAC

>ENST00000390567.1 cds chromosome:GRCh38:14:105881034:105881053:-1 gene:ENSG00000211907.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD1-26 description:immunoglobulin heavy diversity 1-26 [Source:HGNC Symbol;Acc:HGNC:5485]

GGTATAGTGGGAGCTACTAC

>ENST00000603326.1 cds chromosome:GRCh38:15:20004797:20004815:-1 gene:ENSG00000271317.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD4OR15-4A description:immunoglobulin heavy diversity 4/OR15-4A (non-functional) [Source:HGNC Symbol;Acc:HGNC:5506]

TGACTATGGTGCTAACTAC

>ENST00000414852.1 cds chromosome:GRCh38:14:105913222:105913237:-1 gene:ENSG00000233655.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD4-4 description:immunoglobulin heavy diversity 4-4 [Source:HGNC Symbol;Acc:HGNC:5505]

TGACTACAGTAACTAC

>ENST00000454691.1 cds chromosome:GRCh38:14:105910410:105910427:-1 gene:ENSG00000228131.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD6-6 description:immunoglobulin heavy diversity 6-6 [Source:HGNC Symbol;Acc:HGNC:5517]

GAGTATAGCAGCTCGTCC
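Every header listed above carries `transcript_biotype:TR_D_gene` or `transcript_biotype:IG_D_gene`, i.e. T-cell receptor and immunoglobulin diversity (D) segments, which are only a handful of bases long by nature, so these short records look like genuine annotations rather than truncated ones. Note also that a CDS file holds coding sequence only, so UTRs and non-coding transcripts will be absent from it. A minimal sketch for checking which biotypes account for the short entries follows; the function names are hypothetical, it assumes Ensembl-style headers with a `transcript_biotype:` field, and the demo headers are abbreviated versions of three records from this post.

```python
import re
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def lengths_by_biotype(lines):
    """Group sequence lengths by the transcript_biotype found in each header."""
    lengths = defaultdict(list)
    for header, seq in read_fasta(lines):
        match = re.search(r"transcript_biotype:(\S+)", header)
        biotype = match.group(1) if match else "unknown"
        lengths[biotype].append(len(seq))
    return dict(lengths)


# Three records from the post, with headers shortened for the demo.
demo_fasta = [
    ">ENST00000434970.2 cds transcript_biotype:TR_D_gene gene_symbol:TRDD2",
    "CCTTCCTAC",
    ">ENST00000415118.1 cds transcript_biotype:TR_D_gene gene_symbol:TRDD1",
    "GAAATAGT",
    ">ENST00000454908.1 cds transcript_biotype:IG_D_gene gene_symbol:IGHD1-1",
    "GGTACAACTGGAACGAC",
]
demo_lengths = lengths_by_biotype(demo_fasta)
for biotype, values in demo_lengths.items():
    print(biotype, "n =", len(values), "max length =", max(values))
```

On the full file, passing an open file handle instead of `demo_fasta` should show whether the short records are confined to these D-segment biotypes; they can then be filtered out by biotype or by a length cutoff if they are not wanted.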
