Obtaining contig stats from Velvet+Oases De Novo Assembly
6.5 years ago
blazer9131 ▴ 20

Hey all,

Here's a rundown of what we've done so far.

We sequenced a transcriptome. The company returned four separate paired-end read sets for that same transcriptome.

We ran FastQC and trimmed each of the four paired read sets individually. We had some contamination issues, so we also ran DeconSeq on all four read sets.

We ran each read set through the Velvet+Oases pipeline, using KmerGenie to choose the k-mer size.

We concatenated the final Oases outputs into one FASTA file, which therefore contained four redundant copies of our transcripts.

We ran that concatenated file through the Velvet+Oases + KmerGenie pipeline again, this time using the "merged" option.

We took the single output from Oases and ran it through CD-HIT to cluster the similar sequences together.

We now have a final FASTA file containing around 37,000 transcripts.

How can I obtain the contig stats for this final file? I've read countless papers that report the following core statistics:

Contig Number
Maximum Contig Length
Minimum Contig length
Average Contig Length
N50 Length
Number of Reads per contig
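
For reference, the length-based items in this list (contig number, maximum, minimum, average, and N50) can be computed from the FASTA file alone. Here is a minimal sketch in plain Python, assuming a standard FASTA layout with ">" header lines; the function names and the tiny in-memory demo records are made up for illustration and are not part of Velvet, Oases, or CD-HIT. Reads per contig is not covered, since that requires aligning the reads back to the transcripts.

```python
from statistics import mean


def read_fasta_lengths(lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first ">" header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may wrap over several lines
    if length is not None:
        yield length


def n50(lengths):
    """Return the length L such that contigs of length >= L
    contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def contig_stats(lengths):
    """Summarise a collection of contig lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "contig_number": len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": mean(lengths),
        "n50": n50(lengths),
    }


# Hypothetical three-record FASTA, used only to exercise the functions.
demo_fasta = """>t1
ACGTACGTAC
>t2
ACGT
GG
>t3
ACG
""".splitlines()

demo_stats = contig_stats(read_fasta_lengths(demo_fasta))
for name, value in demo_stats.items():
    print(f"{name}\t{value}")
```

To run this on a real assembly, pass an open file handle instead of the demo lines, e.g. `contig_stats(read_fasta_lengths(open("final.fasta")))`, where `final.fasta` stands in for whatever the final file is called.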

I looked at the "stats.txt" file from Oases, but it doesn't report anything in this format.

How would I go about generating that info?

Thank you.

Assembly Velvet Oases DeNovo • 2.5k views
6.5 years ago

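Metrics like the N50 that you list above only make sense when one assembles a genome. They are not meaningful for evaluating the quality of transcript assemblies, because transcript lengths vary naturally from gene to gene, so longer contigs do not by themselves indicate a better assembly.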


Dear all,

I ran ./oases and it generated transcripts.fa. Is it possible to get statistics for transcripts.fa like the ones Trinity reports? If so, please guide me on how to calculate them.

