Hello everyone,
I am working on a project that aims to reconstruct the transcriptome from RNA-seq data (59 bp, paired-end) in order to discover new transcripts in mouse. I have six tissues, and for each tissue I have two WT replicates and two mutant replicates. My transcript assembly workflow is outlined below.
1- Transcript reconstruction with StringTie
stringtie -p 8 -c 2 -j 2 "$BAM_FILE" -G "$GTF_FILE" -o "$OUTPUT_GTF"
2- Merging assembly files with StringTie, with and without using a reference
stringtie --merge -G "$GTF_FILE" -o "$OUT_DIR/assembly_merged_ref.gtf" "$INPUT_DIR"/assembly_Stringtie*.gtf
stringtie --merge -o "$OUT_DIR/assembly_merged_noref.gtf" "$INPUT_DIR"/assembly_Stringtie*.gtf
3- Transcript quantification
stringtie -p 12 -e -B "$BAM_FILE" -G "$GTF_merged" -o "$OUTPUT_GTF"
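Put together, the three steps could look roughly like the script below. This is only a sketch: the sample names, the directory layout (bam/, stringtie_out/), the output file names and the DRY_RUN switch are hypothetical and not part of my actual setup. By default it only prints the commands so they can be checked before anything is run. It shows only the reference-guided merge.

```shell
#!/bin/sh
# Sketch of steps 1-3 as one script. All paths and sample names are
# placeholders (assumptions); adjust them to the real data.
set -eu

GTF_FILE="${GTF_FILE:-reference.gtf}"   # reference annotation (assumed path)
BAM_DIR="${BAM_DIR:-bam}"               # one sorted BAM per sample (assumed)
OUT_DIR="${OUT_DIR:-stringtie_out}"     # where all results go (assumed)
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only

# Print a command and execute it unless DRY_RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Step 1: reference-guided assembly of one sample.
assemble() {
    run stringtie -p 8 -c 2 -j 2 "$BAM_DIR/$1.bam" -G "$GTF_FILE" \
        -o "$OUT_DIR/assembly_Stringtie_$1.gtf"
}

# Step 2: merge the per-sample assemblies, guided by the reference.
merge() {
    # Replace each sample name in "$@" with the path of its assembly.
    for s in "$@"; do
        set -- "$@" "$OUT_DIR/assembly_Stringtie_$s.gtf"
        shift
    done
    run stringtie --merge -G "$GTF_FILE" \
        -o "$OUT_DIR/assembly_merged_ref.gtf" "$@"
}

# Step 3: quantify one sample against the merged annotation.
# Each sample gets its own directory because -B writes its table
# files next to the -o output.
quantify() {
    run stringtie -p 12 -e -B "$BAM_DIR/$1.bam" \
        -G "$OUT_DIR/assembly_merged_ref.gtf" \
        -o "$OUT_DIR/quant/$1/$1.gtf"
}

main() {
    for s in "$@"; do assemble "$s"; done
    merge "$@"
    for s in "$@"; do quantify "$s"; done
}

# Hypothetical sample names for one tissue.
main liver_WT_1 liver_WT_2 liver_Mut_1 liver_Mut_2
```

Running it with DRY_RUN=0 would execute the commands instead of only printing them.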
Questions:
- What do you think of this approach? (Is the method suitable for discovering new transcripts? Are there any steps or parameters that could be optimized to improve the results?)
- Should I merge the replicates before assembly (Step 1)?
- How should I choose between merging with or without a reference (Step 2)?
59 bp reads? Ouch :-) That's ancient data. Is that even possible?
On a more constructive note: I'm afraid it's quite hard to get something meaningful out of reads that short, especially in an assembly context.
Yes, I have RNA-seq data with 59 bp read length.
So, do you think it’s not possible with this read length?
It is possible, technically speaking, but I fear the results might be a bit disappointing.