Question: De novo transcriptome assembly: Low concordant/discordant reads, High overall alignment
Asked 11 weeks ago by willthompson131:

Hey everybody,

I'm pretty new to the RNA-seq world, but I've got a troubleshooting question I hope someone can help with.

I made a de novo assembly in Trinity and wanted to run some QA checks on it using Bowtie 2. When I map the reads used to construct the assembly back to it, I get this:

    96120217 reads; of these:
      96120217 (100.00%) were paired; of these:
        95910642 (99.78%) aligned concordantly 0 times
        51816 (0.05%) aligned concordantly exactly 1 time
        157759 (0.16%) aligned concordantly >1 times
        ----
        95910642 pairs aligned concordantly 0 times; of these:
          8609030 (8.98%) aligned discordantly 1 time
        ----
        87301612 pairs aligned 0 times concordantly or discordantly; of these:
          174603224 mates make up the pairs; of these:
            27478994 (15.74%) aligned 0 times
            38739984 (22.19%) aligned exactly 1 time
            108384246 (62.07%) aligned >1 times
    85.71% overall alignment rate
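The summary is internally consistent, and recomputing it shows how a high overall rate can coexist with almost no concordant pairs: Bowtie 2's overall alignment rate is counted per mate, not per pair, so mates that align only on their own still count as aligned. A minimal Python sketch, with the counts copied by hand from the summary above (the variable names are my own, not Bowtie 2 output fields):

```python
# Counts copied by hand from the Bowtie 2 summary in the question.
pairs = 96_120_217
conc_0 = 95_910_642        # pairs aligned concordantly 0 times
conc_1 = 51_816            # pairs aligned concordantly exactly 1 time
conc_multi = 157_759       # pairs aligned concordantly >1 times
discordant_1 = 8_609_030   # pairs aligned discordantly 1 time
mates_0 = 27_478_994       # mates of the remaining pairs aligned 0 times
mates_1 = 38_739_984       # ... aligned exactly 1 time
mates_multi = 108_384_246  # ... aligned >1 times


def pct(part, whole):
    """Percentage of part in whole."""
    return 100.0 * part / whole


# Each level of the summary partitions the level above it.
assert conc_0 + conc_1 + conc_multi == pairs
leftover_pairs = conc_0 - discordant_1  # neither concordant nor discordant
assert 2 * leftover_pairs == mates_0 + mates_1 + mates_multi

total_mates = 2 * pairs
aligned_mates = (
    2 * (conc_1 + conc_multi)    # both mates of every concordant pair
    + 2 * discordant_1           # both mates of every discordant pair
    + mates_1 + mates_multi      # mates that aligned without their partner
)
overall = pct(aligned_mates, total_mates)
lone_share = pct(mates_1 + mates_multi, aligned_mates)

print(f"concordant pairs:       {pct(conc_1 + conc_multi, pairs):.2f}%")
print(f"overall alignment rate: {overall:.2f}%")
print(f"aligned mates that aligned alone: {lone_share:.1f}%")
```

Roughly 89% of the aligned mates come from pairs that aligned neither concordantly nor discordantly, so the 85.71% figure says little about whether both mates of a pair land on the same contig.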

Obviously, 99.78% of pairs aligning concordantly 0 times is not what I expected. Does anyone have a likely explanation?

Thank you so much in advance for the help.

Tags: rna-seq

What's the organism and the assembled transcriptome quality (length distribution etc.)?

Reply written 11 weeks ago by Asaf

...and what is the read length of the sequencing experiment?

Reply written 11 weeks ago by ATpoint

The organism is a salamander, Ambystoma opacum.

Read length is 2 x 75 bp

Counts of transcripts, etc.

    Total trinity 'genes': 150490
    Total trinity transcripts: 244195
    Percent GC: 46.48

Stats based on ALL transcript contigs:

    Contig N10: 7579
    Contig N20: 5446
    Contig N30: 4175
    Contig N40: 3095
    Contig N50: 2141

    Median contig length: 321
    Average contig: 785.91
    Total assembled bases: 191916257
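For anyone unfamiliar with these metrics: Nx is the contig length L such that contigs of length L or longer together contain at least x% of the total assembled bases. Because it is weighted by bases rather than by contig count, an assembly can have an N50 of 2141 while its median contig is only 321 bp. A minimal Python sketch of that standard definition, run on made-up toy lengths rather than this assembly (it is an assumption that TrinityStats.pl handles ties and rounding in exactly the same way):

```python
from statistics import mean, median


def nx(lengths, x):
    """Return Nx: the length L such that contigs of length >= L
    together contain at least x% of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues at the threshold.
        if running * 100 >= x * total:
            return length
    raise ValueError("no contig lengths given")


def assembly_stats(lengths):
    """Summarise contig lengths in the style of the Trinity stats above."""
    stats = {f"N{x}": nx(lengths, x) for x in (10, 20, 30, 40, 50)}
    stats["median"] = median(lengths)
    stats["average"] = round(mean(lengths), 2)
    stats["total"] = sum(lengths)
    return stats


# Toy contig lengths for illustration only (not the salamander assembly).
toy = [5000, 3000, 2000, 1000, 500, 300, 200]
print(assembly_stats(toy))
```

In the toy data the two longest contigs hold two thirds of the bases, so N50 (3000) sits far above the median (1000), the same pattern as in the real assembly.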

Stats based on ONLY LONGEST ISOFORM per 'GENE':

    Contig N10: 6809
    Contig N20: 4819
    Contig N30: 3513
    Contig N40: 2453
    Contig N50: 1478

    Median contig length: 302
    Average contig: 678.32
    Total assembled bases: 102080495
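The second block repeats the calculation using only the longest isoform of each Trinity 'gene'. A minimal sketch of that selection step, assuming Trinity-style transcript IDs such as TRINITY_DN1000_c0_g1_i1, in which the gene ID is the transcript ID minus its trailing _i<N> isoform suffix; the IDs and lengths below are hypothetical:

```python
import re

# Assumed Trinity naming: <gene ID>_i<isoform number>
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def longest_isoform_lengths(transcript_lengths):
    """Map each Trinity 'gene' to the length of its longest isoform."""
    best = {}
    for transcript_id, length in transcript_lengths.items():
        gene_id = ISOFORM_SUFFIX.sub("", transcript_id)
        best[gene_id] = max(best.get(gene_id, 0), length)
    return best


# Hypothetical transcript IDs and lengths.
lengths = {
    "TRINITY_DN1_c0_g1_i1": 2100,
    "TRINITY_DN1_c0_g1_i2": 1800,
    "TRINITY_DN1_c0_g2_i1": 400,
    "TRINITY_DN2_c0_g1_i1": 250,
}
genes = longest_isoform_lengths(lengths)
print(len(lengths), "transcripts,", len(genes), "genes")
print(sorted(genes.values(), reverse=True))
```

Collapsing isoforms this way is why the per-gene block reports fewer total assembled bases and a lower N50 than the all-transcripts block.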
Reply written 11 weeks ago by willthompson131