Difference in Contig Lengths from Transcriptome and Whole Genome Assembly
2.4 years ago
b8177 ▴ 10

Why are contigs from a transcriptome assembly generally shorter than contigs from a whole genome assembly? I know the difference between a transcriptome and a genome, but I don't really understand what contigs are in the context of sequencing, or why they are shorter in a transcriptome assembly than in a whole genome assembly. Anyone mind explaining? Much appreciated.

transcriptome-assembly sequencing genome-assembly • 1.6k views
2.4 years ago
ATpoint 82k

Contig

The transcriptome is the transcribed part of the genome, hence it is a subset of the genome, and as such transcriptomic contigs must be shorter than the genome. The longest possible contig in transcriptomic space is the longest existing transcript. The longest possible contig in a genome assembly context is the chromosome itself. Transcripts are transcribed from the DNA template (the chromosomes), so a transcript cannot be longer than the genomic region it comes from, and after splicing it is usually shorter. Hence, transcriptomic contigs < genome contigs.

2.4 years ago

I think the confusion comes from the term 'contig'.

You should consider 'contig' as the result of an assembly process (either DNA or RNA) which represents an ungapped stretch of DNA/RNA you assembled from your input reads.

In that sense a contig in a transcript/RNA context is, in theory, like an 'mRNA' (== the spliced form of a transcribed region of the genome), while in a DNA context it represents a piece of the genome.

Hence, if you assemble a full chromosome into a single stretch (one contig), it can be millions of nucleotides long. An RNA contig, obviously, can only be as long as the longest mRNA in your cell/species.

Bottom line: we use the same term 'contig' in both contexts, but they represent biologically completely different things. The only basis for this dual usage is that, in essence, both are the product of a similar bioinformatic analysis.
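To make the length difference concrete, here is a minimal toy sketch. The sequence and the exon coordinates are made up purely for illustration (they are not from any real gene), and `splice` is just a hypothetical helper name. It shows that a spliced transcript built from a genomic contig is shorter than the stretch of genome it comes from, because the introns are gone.

```python
# Toy data: sequence and coordinates are invented for illustration only.
# Layout: exon (6 nt) + intron (12 nt) + exon (6 nt) + intron (12 nt) + exon (3 nt)
genome_contig = "ATGGCC" + "GTAAGTTTTCAG" + "GGATTC" + "GTACGTCCCTAG" + "TGA"

# Hypothetical exon coordinates on the genomic contig (0-based, end-exclusive)
exons = [(0, 6), (18, 24), (36, 39)]


def splice(sequence, exon_coords):
    """Concatenate the exon slices, mimicking a spliced mRNA (kept in DNA letters)."""
    return "".join(sequence[start:end] for start, end in exon_coords)


transcript_contig = splice(genome_contig, exons)

print("genomic contig length:   ", len(genome_contig))
print("transcript contig length:", len(transcript_contig))
```

A real assembler never sees the exon coordinates, of course. It only sees reads, and in the RNA case those reads already lack the introns, so the best it can reconstruct is the spliced transcript.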

2.4 years ago
Michael 54k

In a transcriptome assembly, contigs represent transcripts (mRNA and ncRNA), often full-length ones. In a genome assembly, contigs represent fragments of chromosomes or replicons. Chromosomes are much longer than transcripts. The length of the original sequence limits the sizes of contigs; therefore, the maximum contig length and the N50 can be much larger for genomic contigs than for transcriptome assemblies (and the L50, the number of contigs needed to cover half of the assembly, correspondingly smaller).
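For reference, a minimal sketch of how N50 and L50 can be computed from a list of contig lengths. It assumes the commonly used definitions (N50 is a length, L50 is a count). The function name `n50_l50` and the example lengths are made up for illustration, not taken from a real assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, going from the
         longest contig to the shortest, first reaches half the assembly size.
    L50: how many contigs were needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Invented example lengths (in nucleotides), for illustration only
genome_contigs = [5_000_000, 3_000_000, 1_500_000, 500_000]
transcript_contigs = [9_000, 4_000, 2_500, 2_000, 1_500, 1_000]

print("genome        N50, L50:", n50_l50(genome_contigs))
print("transcriptome N50, L50:", n50_l50(transcript_contigs))
```

Note that under these definitions a more contiguous assembly has a larger N50 but a smaller L50. For transcriptomes, N50 is also skewed by expression level and isoform redundancy, so it is less informative than it is for genomes.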
