Question: de novo transcriptome assembly with >400K genes. How to proceed?
User 4014 (Sweden) wrote, 4 months ago:

Hi folks,

I have a de novo transcriptome assembly of a polyploid tree species, assembled with K=31 and a minimum contig length of 200 bp. The assembly contains almost 400K genes, and after a reduction with CD-HIT-EST (cut-off=0.97) I have around 350K genes left. Mapping ca. 1/4 of the total reads back to the reduced assembly shows that the majority of read pairs (82%) align concordantly more than once (bowtie2 output below). Do you think this would pose a problem if I aim to work at the gene level? I could try cd-hit-est with cut-off=0.95. Or would it be better to use Lace to stitch the different isoforms together and take it from there?

Thank you very much in advance for your suggestions and comments!

$ bowtie2 --local --no-unal -x cdhit_e97_Trinity_Famer_K31 -p 24 -q -1 cat_70x_R1.fq.gz -2 cat_70x_R2.fq.gz | samtools view -b | samtools sort -o 70x_bowtie2.bam
78850917 reads; of these:
  78850917 (100.00%) were paired; of these:
    2584872 (3.28%) aligned concordantly 0 times
    11430984 (14.50%) aligned concordantly exactly 1 time
    64835061 (82.22%) aligned concordantly >1 times
    ----
    2584872 pairs aligned concordantly 0 times; of these:
      201798 (7.81%) aligned discordantly 1 time
    ----
    2383074 pairs aligned 0 times concordantly or discordantly; of these:
      4766148 mates make up the pairs; of these:
        627598 (13.17%) aligned 0 times
        410463 (8.61%) aligned exactly 1 time
        3728087 (78.22%) aligned >1 times
99.60% overall alignment rate
[bam_sort_core] merging from 80 files and 1 in-memory blocks...
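
For reference, the percentages in the report above can be recomputed from the raw counts, which makes it easier to compare runs (for example after re-clustering at a different cut-off). This is only a minimal sketch: it assumes the bowtie2 summary, which bowtie2 writes to stderr, was saved to a hypothetical file bowtie2.log. The function name summarise_bowtie2 and the output labels are made up for illustration and are not part of bowtie2.

```shell
# Recompute key rates from a bowtie2 paired-end summary read on stdin.
# summarise_bowtie2 and its output labels are illustrative names only.
summarise_bowtie2() {
    awk '
        # Total number of read pairs
        /were paired; of these:/               { pairs = $1 }
        # Pairs with exactly one concordant alignment
        /aligned concordantly exactly 1 time/  { uniq = $1 }
        # Pairs with more than one concordant alignment
        /aligned concordantly >1 times/        { multi = $1 }
        # Individual mates that did not align at all
        /\) aligned 0 times[[:space:]]*$/      { unal_mates = $1 }
        END {
            if (pairs == 0 || uniq == 0) {
                print "could not parse bowtie2 summary"
                exit 1
            }
            printf "concordant_unique_pct=%.2f\n", 100 * uniq / pairs
            printf "concordant_multi_pct=%.2f\n", 100 * multi / pairs
            printf "multi_to_unique_ratio=%.2f\n", multi / uniq
            # Overall rate = aligned mates / all mates (2 mates per pair)
            printf "overall_alignment_pct=%.2f\n", 100 * (2 * pairs - unal_mates) / (2 * pairs)
        }
    '
}

# Usage (bowtie2.log is a hypothetical file holding the stderr summary):
#   summarise_bowtie2 < bowtie2.log
```

Fed the report above, this should give back the same 14.50%, 82.22% and 99.60% figures, plus the ratio of multi-mapping to uniquely mapping pairs. The overall rate is computed per mate: 627598 unaligned mates out of 2 x 78850917.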