Question: Obtaining contig stats from Velvet+Oases De Novo Assembly
5.9 years ago
United States
blazer913110 wrote:

Hey all,

Here's a rundown of our setup.

We sequenced a transcriptome. The sequencing company returned 4 different paired-end read sets for that same transcriptome.

We ran FastQC and trimmed each of the 4 paired-end read sets individually. We had some contamination issues, so we ran DeconSeq on all 4 read sets.

We ran the Velvet+Oases pipeline on each read set, using KmerGenie to choose the k-mer size.

We concatenated the final Oases outputs into one FASTA file. This file contained 4 redundant copies of our transcripts.

We ran the Velvet+Oases+KmerGenie pipeline again, this time using the "merged" option.

We took the single output from Oases and ran it through CD-HIT to merge similar sequences.

Now, we have a final fasta file containing around 37,000 transcripts.

How can I obtain contig statistics for this final file? I've read many papers that report the following core information:

Contig Number
Maximum Contig Length
Minimum Contig Length
Average Contig Length
N50 Length
Number of Reads per Contig
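As a minimal sketch (not something posted in the thread), the length-based statistics in the list above can be computed from the final FASTA file with plain Python and no third-party libraries. The script name `contig_stats.py` and all function names are hypothetical. N50 here follows the common definition: the length L such that contigs of length L or longer together contain at least half of all assembled bases. The number of reads per contig cannot be derived from the FASTA file alone.

```python
import sys


def fasta_lengths(lines):
    """Return the length of each sequence in FASTA text (any iterable of lines)."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def contig_stats(lengths):
    """Contig number, max, min, mean and N50 for a list of contig lengths."""
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "contig_number": len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        stats = contig_stats(fasta_lengths(handle))
    for key, value in stats.items():
        print(f"{key}\t{value}")
```

Usage, assuming the final CD-HIT output is named `final.fa` (a hypothetical name): `python contig_stats.py final.fa`.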

I looked at the "stats.txt" file from Oases, but it doesn't report anything in this format.

How would I go about generating that info?
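The number of reads per contig needs an extra step that the thread does not describe: aligning the trimmed reads back to the final transcripts with a short-read aligner that writes SAM output. Assuming such a SAM file exists, a rough sketch like the following counts primary, mapped alignments per transcript using the FLAG (column 2) and RNAME (column 3) fields of the SAM format. Function and file names are hypothetical, and each mate of a read pair is counted as a separate read.

```python
import sys
from collections import Counter

# SAM FLAG bits for records that should not be counted as a read on a contig
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_per_contig(sam_lines):
    """Count primary, mapped alignments per reference name (RNAME) in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # blank or malformed line: SAM records have 11 mandatory fields
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # skip unmapped reads and non-primary alignments
        counts[fields[2]] += 1  # each mate of a pair is counted separately
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for contig, n in reads_per_contig(handle).most_common():
            print(f"{contig}\t{n}")
```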

Thank you.

velvet oases assembly de novo
modified 5.9 years ago by Istvan Albert ♦♦ 85k • written 5.9 years ago by blazer913110
University Park, USA
Istvan Albert ♦♦ 85k wrote:

The concepts you list above, such as N50, only make sense when one assembles a genome. They are not meaningful for evaluating the quality of transcript assemblies.

modified 5.9 years ago • written 5.9 years ago by Istvan Albert ♦♦ 85k

Dear all,

I ran ./oases and generated transcripts.fa. Can I get statistics for transcripts.fa like the ones Trinity reports? If yes, please guide me on how they are calculated.

written 5.9 years ago by nehagoel24march0


