I have an RNA-seq dataset from a non-model organism and need to do a de novo assembly. After running Trinity, I checked the quality of the assembled transcriptome by aligning the reads back to it with both bowtie and bowtie2. The problem is that the "aligned concordantly exactly 1 time" percentage is very low. For instance, from bowtie2:
40191191 reads; of these:
  40191191 (100.00%) were paired; of these:
    1107263 (2.75%) aligned concordantly 0 times
    811254 (2.02%) aligned concordantly exactly 1 time
    38272674 (95.23%) aligned concordantly >1 times
    ----
    1107263 pairs aligned concordantly 0 times; of these:
      3136 (0.28%) aligned discordantly 1 time
    ----
    1104127 pairs aligned 0 times concordantly or discordantly; of these:
      2208254 mates make up the pairs; of these:
        585954 (26.53%) aligned 0 times
        74874 (3.39%) aligned exactly 1 time
        1547426 (70.07%) aligned >1 times
99.27% overall alignment rate
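To show how the numbers in this summary fit together, here is a minimal Python sketch that recomputes the overall alignment rate and the share of concordant pairs that map more than once. The counts are copied from the bowtie2 output above; the variable names and the `pct` helper are my own and purely illustrative.

```python
# Counts copied from the bowtie2 summary above (variable names are illustrative).
total_pairs = 40_191_191
conc_0 = 1_107_263        # pairs aligned concordantly 0 times
conc_1 = 811_254          # pairs aligned concordantly exactly 1 time
conc_multi = 38_272_674   # pairs aligned concordantly >1 times
disc_1 = 3_136            # pairs aligned discordantly 1 time
mates_0 = 585_954         # leftover mates aligned 0 times
mates_1 = 74_874          # leftover mates aligned exactly 1 time
mates_multi = 1_547_426   # leftover mates aligned >1 times


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Consistency checks: the categories should add up to the reported totals.
assert conc_0 + conc_1 + conc_multi == total_pairs
assert mates_0 + mates_1 + mates_multi == 2 * (conc_0 - disc_1)

# Overall alignment rate: every mate that aligned in any way, over all mates.
aligned_mates = 2 * (conc_1 + conc_multi + disc_1) + mates_1 + mates_multi
overall_rate = pct(aligned_mates, 2 * total_pairs)

# Among concordantly aligned pairs, the share with more than one placement.
multi_share = pct(conc_multi, conc_1 + conc_multi)

print(f"overall alignment rate: {overall_rate}%")
print(f"multi-mapped share of concordant pairs: {multi_share}%")
```

So almost all reads do map back to the assembly; the low unique rate comes from pairs having more than one valid placement. My guess (an assumption, not something this output proves) is that this reflects many similar contigs or isoforms in the Trinity output rather than reads failing to align.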
The same trend was observed when using BUSCO to check the quality of the assembly:
I used BBDuk and Trimmomatic to remove adapters and low-quality reads. For the cleaned reads, FastQC confirmed that the adapters were removed and that per-base quality passed, but the “Sequence Duplication Levels” check failed. I assume this is very common with RNA-seq data (?).
Now I would like to know whether anyone can suggest what could be improved in this case. Can I continue with this assembly? My ultimate plan is to run Corset to remove redundancy, use RSEM to quantify expression levels, and then perform DE analysis. Thanks in advance for the help!