Is the union of cDNA sequences the exome?
3.1 years ago
Bioaln ▴ 350

Hello all, I've recently started with DNA-related analysis and was wondering whether, if I take:

does this represent, for example, the human exome? If not, what are the differences, and how can one obtain the missing information then?

Thank you!

DNA-seq cDNA exome
3.1 years ago

The cDNA file will contain all mRNAs of the human genome, i.e. CDS + UTRs (where annotated). It thus represents the transcribed part of the genome; of that, only the CDS portion is eventually translated into protein.
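To make "CDS + UTR" concrete: a transcript's cDNA sequence is its exon sequences joined in transcript order, reverse-complemented when the gene sits on the minus strand. Ensembl's cDNA FASTA already contains these spliced sequences, so this is only an illustration. The genome string and exon coordinates below are hypothetical, and the coordinates are 1-based and inclusive as in GTF/GFF3.

```python
# Illustration only: the genome and exon coordinates are made up.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_cdna(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Join exon sequences (1-based, inclusive coordinates) into a cDNA.

    Exons are concatenated in genomic order; for '-' strand transcripts the
    result is reverse-complemented so it reads 5' -> 3' like the mRNA.
    """
    pieces = [genome[start - 1:end] for start, end in sorted(exons)]
    joined = "".join(pieces)
    return reverse_complement(joined) if strand == "-" else joined


if __name__ == "__main__":
    genome = "AAATGCCCGGGTTTAAACCCTAGGG"  # hypothetical 25 bp "chromosome"
    exons = [(3, 8), (15, 22)]            # hypothetical exons; 9-14 is the intron
    print(splice_cdna(genome, exons, "+"))  # ATGCCCAAACCCTA
    print(splice_cdna(genome, exons, "-"))  # TAGGGTTTGGGCAT
```

The intronic bases (positions 9 to 14 here) never appear in the cDNA. That is why a cDNA collection describes exons but not the rest of the genome.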


To be more precise, cDNA consists of all transcribed RNAs: mRNAs, as you say, but also ncRNAs, pseudogenes, rRNAs, etc.


I was thinking that as well, but since they also offer a separate ncRNA FASTA file, I would assume the cDNA file focuses on protein-coding transcripts. It is indeed possible, though, that it contains everything that is transcribed.


cDNA = cDNA sequences for Ensembl or ab initio predicted genes.

is what's written on their site, but that does not give much additional information.
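One way to settle what the cDNA file actually contains is to tally the transcript biotypes in its FASTA headers. This sketch assumes Ensembl-style headers carrying a `transcript_biotype:<value>` token, as recent Ensembl releases use. The file name in the usage comment is only an example. Headers without that token are counted as `unknown`.

```python
import gzip
import sys
from collections import Counter


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def biotype_counts(lines) -> Counter:
    """Count transcript biotypes from FASTA header lines.

    Assumes Ensembl-style headers with a 'transcript_biotype:<value>' token;
    sequence lines are skipped, headers lacking the token count as 'unknown'.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue
        biotype = "unknown"
        for token in line.split():
            if token.startswith("transcript_biotype:"):
                biotype = token.split(":", 1)[1]
                break
        counts[biotype] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example usage (file name is illustrative):
    #   python biotypes.py Homo_sapiens.GRCh38.cdna.all.fa.gz
    with open_text(sys.argv[1]) as handle:
        for biotype, n in biotype_counts(handle).most_common():
            print(f"{n}\t{biotype}")
```

If anything other than `protein_coding` shows up, the cDNA file is not restricted to protein-coding transcripts.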


So, technically this is the exome, i.e., the set of all (known) exons?


I would say yes, indeed.

It depends, however, on how you define 'exons': it might be that the file mainly or only contains exons that are part of an mRNA, and thus does not include the exons of non-translated transcripts (not sure whether you're interested in those as well).
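If you want the narrower definition (only exons belonging to protein-coding transcripts), one option is to work from a GFF3 annotation instead. Keep the exons whose parent transcript has at least one CDS feature. This is a sketch under stated assumptions: tab-delimited GFF3, with exon and CDS rows linked to their transcript through the standard `Parent=` attribute. "Has a CDS child" is used here as a stand-in for "coding".

```python
import sys


def parse_attributes(column9: str) -> dict:
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attributes = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key] = value
    return attributes


def coding_exons(gff_lines):
    """Return exons whose parent transcript has at least one CDS feature.

    Each result is (seqid, start, end, strand, transcript_id), using the
    1-based, inclusive GFF3 coordinates. 'Coding' means 'has a CDS child'
    (an assumption of this sketch, not an official definition).
    """
    exons = []
    coding_transcripts = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines and any embedded FASTA
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # GFF3 allows several comma-separated parents per feature.
        parents = [p for p in parse_attributes(attrs).get("Parent", "").split(",") if p]
        if ftype == "CDS":
            coding_transcripts.update(parents)
        elif ftype == "exon":
            for parent in parents:
                exons.append((seqid, int(start), int(end), strand, parent))
    return [exon for exon in exons if exon[4] in coding_transcripts]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for seqid, start, end, strand, parent in coding_exons(handle):
            print(seqid, start, end, strand, parent, sep="\t")
```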


Currently I am not, so this seems to hold! Thanks.

3.1 years ago
Benn 8.2k

It is not exactly clear to me why you want this or what you want to do with it. However, if you look at the same link, https://www.ensembl.org/info/data/ftp/index.html, you will find GTF and GFF3 annotation files in the column "gene sets". These list all exons as genomic coordinates.
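Working from exon coordinates also avoids a subtlety of the cDNA file. Alternative transcripts of the same gene share exons, so summing or concatenating cDNA sequences counts those bases more than once. A non-redundant "exome" can be obtained by merging the exon intervals. The sketch below assumes a tab-delimited GFF3 (or any file whose exon rows have GFF-style columns). It merges overlapping and adjacent exons per sequence, roughly what `bedtools merge` does on a sorted BED file. The coordinates in the test are hypothetical.

```python
import sys
from collections import defaultdict


def exon_intervals(gff_lines):
    """Yield (seqid, start, end) for every 'exon' row, 1-based inclusive."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "exon":
            yield cols[0], int(cols[3]), int(cols[4])


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals per sequence."""
    by_seq = defaultdict(list)
    for seqid, start, end in intervals:
        by_seq[seqid].append((start, end))
    merged = []
    for seqid in sorted(by_seq):
        cur_start, cur_end = None, None
        for start, end in sorted(by_seq[seqid]):
            if cur_end is not None and start <= cur_end + 1:
                cur_end = max(cur_end, end)  # extend the current block
            else:
                if cur_end is not None:
                    merged.append((seqid, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_end is not None:
            merged.append((seqid, cur_start, cur_end))
    return merged


def total_length(intervals):
    """Total bases covered by 1-based inclusive intervals."""
    return sum(end - start + 1 for _, start, end in intervals)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        merged = merge_intervals(exon_intervals(handle))
    for seqid, start, end in merged:
        print(seqid, start - 1, end, sep="\t")  # BED: 0-based, half-open
    print(f"# non-redundant exonic bases: {total_length(merged)}", file=sys.stderr)
```

Comparing `total_length` of the raw exon rows with that of the merged intervals shows how much sequence is shared between transcripts.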

Just to show the difference between exons, mRNA, and CDS, here is the information from such an annotation file for the mouse genome. Let's have a look at the transcript ENSMUST00000130201:

```
grep "ENSMUST00000130201" ensGene.gff3
chr1    ensGene mRNA    4773206 4785710 .   -   .   Name=ENSMUST00000130201;Parent=ENSMUSG00000033845;ID=ENSMUST00000130201;Alias=ENSMUSG00000033845
chr1    ensGene exon    4773206 4774516 .   -   .   Name=ENSMUST00000130201.exon4;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon4
chr1    ensGene exon    4777525 4777648 .   -   .   Name=ENSMUST00000130201.exon3;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon3
chr1    ensGene exon    4782568 4782733 .   -   .   Name=ENSMUST00000130201.exon2;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon2
chr1    ensGene exon    4783951 4784105 .   -   .   Name=ENSMUST00000130201.exon1;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon1
chr1    ensGene exon    4785573 4785710 .   -   .   Name=ENSMUST00000130201.exon0;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.exon0
chr1    ensGene three_prime_UTR 4773206 4774451 .   -   .   Name=ENSMUST00000130201.utr4;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.utr4
chr1    ensGene five_prime_UTR  4785678 4785710 .   -   .   Name=ENSMUST00000130201.utr0;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.utr0
chr1    ensGene CDS 4785573 4785677 .   -   0   Name=ENSMUST00000130201.cds0;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds0
chr1    ensGene CDS 4783951 4784105 .   -   0   Name=ENSMUST00000130201.cds1;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds1
chr1    ensGene CDS 4782568 4782733 .   -   1   Name=ENSMUST00000130201.cds2;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds2
chr1    ensGene CDS 4777525 4777648 .   -   0   Name=ENSMUST00000130201.cds3;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds3
chr1    ensGene CDS 4774452 4774516 .   -   2   Name=ENSMUST00000130201.cds4;Parent=ENSMUST00000130201;ID=ENSMUST00000130201.cds4
```


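You'll see that the exons span the complete mRNA region, including the 5' and 3' UTRs: the outermost exons start and end at 4773206 and 4785710, exactly like the mRNA line. The CDS features do not; they stop short at both ends, where the UTRs begin.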
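A quick way to check this is to sum the feature lengths per type. The coordinates below are copied from the grep output for ENSMUST00000130201. GFF3 coordinates are 1-based and inclusive, so each feature is `end - start + 1` bases long. The exon total should equal CDS plus both UTRs. The exon extent should match the mRNA line, while the CDS extent is narrower.

```python
# (feature type, start, end) copied from the grep output for ENSMUST00000130201.
FEATURES = [
    ("mRNA", 4773206, 4785710),
    ("exon", 4773206, 4774516),
    ("exon", 4777525, 4777648),
    ("exon", 4782568, 4782733),
    ("exon", 4783951, 4784105),
    ("exon", 4785573, 4785710),
    ("three_prime_UTR", 4773206, 4774451),
    ("five_prime_UTR", 4785678, 4785710),
    ("CDS", 4785573, 4785677),
    ("CDS", 4783951, 4784105),
    ("CDS", 4782568, 4782733),
    ("CDS", 4777525, 4777648),
    ("CDS", 4774452, 4774516),
]


def length_by_type(features):
    """Sum feature lengths per type (1-based inclusive: end - start + 1)."""
    totals = {}
    for ftype, start, end in features:
        totals[ftype] = totals.get(ftype, 0) + end - start + 1
    return totals


def extent(features, ftype):
    """Leftmost start and rightmost end over all features of one type."""
    spans = [(start, end) for t, start, end in features if t == ftype]
    return min(s for s, _ in spans), max(e for _, e in spans)


if __name__ == "__main__":
    totals = length_by_type(FEATURES)
    utr = totals["five_prime_UTR"] + totals["three_prime_UTR"]
    print(totals)  # mRNA 12505, exon 1894, 3' UTR 1246, 5' UTR 33, CDS 615
    print("exon == CDS + UTR:", totals["exon"] == totals["CDS"] + utr)  # True
    print("exon extent:", extent(FEATURES, "exon"))  # (4773206, 4785710)
    print("CDS extent:", extent(FEATURES, "CDS"))    # (4774452, 4785677)
```

Of the 12505 bp spanned by the mRNA feature, only 1894 bp are exonic; the rest is intronic. Of those 1894 bp, 615 (205 codons) are coding and 1279 are UTR.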


Thank you for this information. Indeed, this helps me identify the exons.