Question

Obtaining contig stats from Velvet+Oases De Novo Assembly

9.5 years ago

blazer9131

Hey all,

Here's a rundown of my situation.

We sequenced a transcriptome. The company returned 4 different sets of paired-end reads for that same transcriptome.

We ran FastQC and trimmed each of the 4 paired-end read sets individually. We had some contamination issues, so we ran DeconSeq on all 4 read sets.

We ran the Velvet+Oases pipeline, using KmerGenie to choose the k-mer size.

We concatenated the final Oases outputs into one FASTA file. This file contained 4 redundant copies of our transcripts.

We ran the Velvet+Oases+KmerGenie pipeline again, this time using the "merged" option.

We took the single output from Oases and ran it through CD-HIT to merge similar sequences.

We now have a final FASTA file containing around 37,000 transcripts.

How can I obtain contig statistics for this final file? Many papers I've read report the following core information:

Contig Number
Maximum Contig Length
Minimum Contig Length
Average Contig Length
N50 Length
Number of Reads per contig

I looked at the "stats.txt" file from Oases, but nothing is given in this format.

How would I go about generating that info?

Thank you.
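The length-based statistics in the list above (contig number, maximum, minimum and average length, N50) can be computed directly from the final FASTA file. Below is a minimal Python sketch, assuming a standard FASTA file where sequences may be wrapped over several lines; the script name `contig_stats.py` and all function names are hypothetical. "Number of reads per contig" cannot be derived from the FASTA alone: it requires mapping the reads back to the transcripts with a read aligner.

```python
"""Length statistics for a FASTA file of assembled contigs or transcripts.

Hypothetical script name: contig_stats.py
"""
import sys
from statistics import mean


def read_fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in an iterable of FASTA lines."""
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        else:
            length += len(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, length


def n50(lengths):
    """Length of the shortest contig among the longest contigs that
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def contig_stats(lengths):
    """Return the length-based summary statistics as a dict."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "contig_number": len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": mean(lengths),
        "n50": n50(lengths),
        "total_bases": sum(lengths),
    }


def main(argv):
    if len(argv) != 2:
        print("usage: python contig_stats.py transcripts.fa", file=sys.stderr)
        return 1
    with open(argv[1]) as handle:
        lengths = [length for _, length in read_fasta_lengths(handle)]
    for key, value in contig_stats(lengths).items():
        print(f"{key}\t{value}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python contig_stats.py final_transcripts.fa`; it prints one tab-separated statistic per line. Sorting all lengths is simple and easily fast enough for ~37,000 transcripts.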

Assembly Velvet Oases DeNovo

Written 9.5 years ago by blazer9131; last updated 3.1 years ago by Ram.

Ram · Answer 1 · 2014-10-21

9.5 years ago

Istvan Albert

Concepts like N50 that you list above only really make sense when assembling a genome. They are not meaningful for evaluating the quality of a transcript assembly.

Dear all,

I ran ./oases and generated transcripts.fa. Can I get statistics for transcripts.fa like the ones Trinity reports? If so, please explain how they are calculated.

Reply written 9.5 years ago by Nai; last updated 3.1 years ago by Ram.
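As far as I know, Trinity's summary (its TrinityStats.pl utility) reports statistics both over all transcripts and over only the longest isoform per "gene". A roughly analogous view for Oases is to keep only the longest transcript per locus. The sketch below assumes Oases headers begin with `Locus_<n>_Transcript_<m>` (for example `Locus_1_Transcript_2/3_Confidence_0.750_Length_1234`); that pattern, the regex, and the function names are assumptions, not a documented Oases API. The `(header, length)` records can come from any FASTA parser.

```python
import re

# Assumption: Oases transcript headers begin with "Locus_<n>_Transcript_<m>",
# e.g. "Locus_1_Transcript_2/3_Confidence_0.750_Length_1234".
LOCUS_RE = re.compile(r"^(Locus_\d+)_Transcript_\d+")


def longest_per_locus(records):
    """Map each locus ID to the length of its longest transcript.

    `records` is an iterable of (header, length) pairs. Headers that do not
    match the assumed pattern are treated as a locus of their own.
    """
    best = {}
    for header, length in records:
        match = LOCUS_RE.match(header)
        locus = match.group(1) if match else header
        best[locus] = max(length, best.get(locus, 0))
    return best


def n50(lengths):
    """Length of the shortest sequence among the longest ones that
    together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Made-up toy records standing in for parsed transcripts.fa entries.
    demo = [
        ("Locus_1_Transcript_1/2_Confidence_1.000_Length_500", 500),
        ("Locus_1_Transcript_2/2_Confidence_0.500_Length_800", 800),
        ("Locus_2_Transcript_1/1_Confidence_1.000_Length_300", 300),
    ]
    best = longest_per_locus(demo)
    print("loci:", len(best))
    print("N50 (longest transcript per locus):", n50(list(best.values())))
```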