Trinity assembly result check
8.3 years ago
iamtuttu5 ▴ 40

Hello all

I ran a Trinity assembly on two FASTQ files of 3 GB each and obtained Trinity.fasta as the result.

But the result is only 1269 KB in size.

Why is it that small?

Is there a problem with the assembly?

Any help will be appreciated

Thank you

RNA-Seq next-gen Assembly • 2.7k views
ADD COMMENT

What organism? Did you perform quality checks on your fastq? Did you examine Trinity logs?

ADD REPLY

Plant transcriptome data.

What do you mean by the Trinity log?

ADD REPLY

These are the basic statistics of the assembly I obtained:

################################
## Counts of transcripts, etc.
################################
Total trinity 'genes':  2030
Total trinity transcripts:      2723
Percent GC: 49.94

########################################
Stats based on ALL transcript contigs:
########################################

        Contig N10: 712
        Contig N20: 504
        Contig N30: 405
        Contig N40: 342
        Contig N50: 303

        Median contig length: 264
        Average contig: 318.96
        Total assembled bases: 868526


#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################

        Contig N10: 754
        Contig N20: 489
        Contig N30: 377
        Contig N40: 323
        Contig N50: 289

        Median contig length: 257
        Average contig: 310.82
        Total assembled bases: 630964
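
For readers unsure how to read these numbers, here is a minimal Python sketch of how the Nx values and the average contig length can be recomputed from a FASTA file. The function names `fasta_lengths` and `nx` are made up for this illustration, and the exact tie-breaking rule in Trinity's own stats script may differ slightly, so treat it as a cross-check rather than a replacement.

```python
import sys


def fasta_lengths(lines):
    """Return the sequence length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction):
    """Contig length at which the running sum (longest first) reaches fraction * total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Usage (file name supplied by the user): python contig_stats.py Trinity.fasta
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            lengths = fasta_lengths(handle)
        print("contigs:", len(lengths))
        print("total bases:", sum(lengths))
        print("average:", round(sum(lengths) / len(lengths), 2))
        print("N50:", nx(lengths, 0.5))
```

The reported figures are at least internally consistent: 868526 bases over 2723 transcripts gives the stated average of 318.96, and about 0.87 Mb of sequence plus headers matches a file of roughly 1269 KB. So the file is small because the assembly contains few, short contigs, not because it was truncated.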
ADD REPLY
8.3 years ago
rtliu ★ 2.2k

Assuming you have done proper QC of the FASTQ files (e.g. removing sequencing adaptors), try normalizing the reads and assembling the transcripts from the normalized reads (default 50x coverage):

Trinity --normalize_reads --seqType fq ...

Trinity Insilico Normalization
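
To give a feel for what in silico normalization does, here is a toy Python sketch of the general idea as I understand it: estimate each read's coverage from the median abundance of its k-mers, then keep the read with probability of about target / coverage. This is not Trinity's implementation; the names `kmers` and `normalize`, the tiny k, and the seed are assumptions chosen only to keep the example small.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence (empty list if the sequence is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, target=50, seed=0):
    """Toy down-sampling: keep each read with probability min(1, target / median k-mer count)."""
    # Count every k-mer across the whole read set.
    counts = Counter(km for read in reads for km in kmers(read, k))
    rng = random.Random(seed)  # fixed seed so the toy example is reproducible
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k, cannot estimate coverage
        coverage = median(counts[km] for km in read_kmers)
        if rng.random() < min(1.0, target / coverage):
            kept.append(read)
    return kept


if __name__ == "__main__":
    abundant = "ACGTACGTAC"  # stands in for a highly expressed transcript
    rare = "GGATCCTTAG"      # stands in for a lowly expressed transcript
    reads = [abundant] * 1000 + [rare] * 3
    kept = normalize(reads)
    print("abundant kept:", kept.count(abundant), "of 1000")
    print("rare kept:", kept.count(rare), "of 3")
```

The point of the design is that reads from highly expressed transcripts are thinned out while reads from rare transcripts are all retained, which reduces memory and run time without discarding low-coverage regions.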

ADD COMMENT
3.3 years ago
juanjo75es ▴ 130

I guess anyone in that situation has two options:

  • Not using that dataset. You can try further QC, or you can repeat the sequencing using better tools. I recommend mapping the reads to a reference genome and viewing the result with a tool that doesn't hide indels. You can use minimap2 for the mapping and "samtools tview" for viewing. Alternatively, you can use Contignant s-aligner for the mapping and AliView for viewing. This way you will know exactly what is wrong with your data.

  • Not using Trinity. You can use other assemblers such as SPAdes or Contignant s-aligner. It's likely that your results will improve considerably.

Disclosure: I am a developer of Contignant s-aligner.

ADD COMMENT

Bigger disclosure: the s-aligner tool is not free, and the code on GitHub seems incomplete, so it's not really open source either. People are better off sticking to known devils.

ADD REPLY

Nobody claimed it was open source... Like all the software mentioned in the answer, it requires funding. The only difference is that it didn't get funds before completion. If you don't like it, don't use it. No need to make false accusations.

ADD REPLY
