Question: De novo transcriptome assembly: low concordant/discordant reads, high overall alignment
20 months ago, willthompson131 wrote:

Hey everybody,

I'm pretty new to the RNA-seq world, but I've got a troubleshooting question I hoped someone might be able to help with.

I made a de novo assembly in Trinity and wanted to run some QA checks on it using Bowtie 2. When I map the reads used to construct the assembly back to the assembly, I get this:

96120217 reads; of these:
  96120217 (100.00%) were paired; of these:
    95910642 (99.78%) aligned concordantly 0 times
    51816 (0.05%) aligned concordantly exactly 1 time
    157759 (0.16%) aligned concordantly >1 times
    95910642 pairs aligned concordantly 0 times; of these:
      8609030 (8.98%) aligned discordantly 1 time
    87301612 pairs aligned 0 times concordantly or discordantly; of these:
      174603224 mates make up the pairs; of these:
        27478994 (15.74%) aligned 0 times
        38739984 (22.19%) aligned exactly 1 time
        108384246 (62.07%) aligned >1 times
85.71% overall alignment rate
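
To show how these numbers fit together, here is a minimal sketch of the arithmetic that appears to produce the overall alignment rate: every mate in a concordantly or discordantly aligned pair counts as aligned, plus every mate that aligned on its own, divided by the total number of mates. The counts are copied from the summary above; the function and variable names are illustrative and are not part of Bowtie 2.

```python
# Counts copied from the Bowtie 2 summary above.
total_pairs = 96_120_217
concordant_once = 51_816
concordant_multi = 157_759
discordant_once = 8_609_030
mates_once = 38_739_984     # mates from unpaired-aligning pairs, aligned exactly 1 time
mates_multi = 108_384_246   # mates from unpaired-aligning pairs, aligned >1 times


def overall_alignment_rate(total_pairs, conc_pairs, disc_pairs, single_mates):
    """Fraction of all mates that aligned in any way (assumed definition)."""
    # Each aligned pair contributes two mates; singly aligned mates count once.
    aligned_mates = 2 * (conc_pairs + disc_pairs) + single_mates
    return aligned_mates / (2 * total_pairs)


rate = overall_alignment_rate(
    total_pairs,
    concordant_once + concordant_multi,
    discordant_once,
    mates_once + mates_multi,
)
print(f"{rate:.2%}")  # prints 85.71%
```

Under this reading, the high overall rate comes almost entirely from mates that aligned individually rather than as pairs.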

Obviously, having 99.78% of pairs fail to align concordantly is not what I expected. Does anyone have some likely explanations?

Thank you so much in advance for the help.


What's the organism, and what is the quality of the assembled transcriptome (length distribution, etc.)?

written 20 months ago by Asaf

...and what is the read length of the sequencing experiment?

written 20 months ago by ATpoint

The organism is a salamander, Ambystoma opacum.

Read length is 2 x 75 bp

Counts of transcripts, etc.

    Total trinity 'genes': 150490
    Total trinity transcripts: 244195
    Percent GC: 46.48

Stats based on ALL transcript contigs:

    Contig N10: 7579
    Contig N20: 5446
    Contig N30: 4175
    Contig N40: 3095
    Contig N50: 2141

    Median contig length: 321
    Average contig: 785.91
    Total assembled bases: 191916257

Stats based on ONLY LONGEST ISOFORM per 'GENE':

    Contig N10: 6809
    Contig N20: 4819
    Contig N30: 3513
    Contig N40: 2453
    Contig N50: 1478

    Median contig length: 302
    Average contig: 678.32
    Total assembled bases: 102080495
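
For readers comparing the N50 and median figures above, here is a minimal sketch of how a contig N50 is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name and the toy lengths are hypothetical; they are not the actual contig lengths from this assembly, and this is not Trinity's own script.

```python
from statistics import median


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example only: a few long contigs and many short ones.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))     # prints 8
print(median(toy_lengths))  # prints 6
```

Because N50 is weighted by bases rather than by contig count, it can sit well above the median when an assembly contains many short contigs, which is the pattern in the stats above (N50 of 2141 against a median of 321).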

written 20 months ago by willthompson131