Question regarding stringtie algorithm and assembly strategy
3.2 years ago

Good morning Biostars. I realize this might be a very specific question. I have also posted it on the corresponding GitHub repository, but I expect a reply there may take a while.

For my project, I have been tasked with assembling a transcriptome for my species of interest, using the genome as a guide (not chromosome-level; 2,090 contigs) and 23 Illumina samples.

After my initial assembly (hisat2 + stringtie2 + stringtie2 --merge pipeline), I noticed that quite a few transcripts in my assembly spanned two or more reference genes. On manual inspection, many of these cases turned out to have only a few reads supporting the splice sites. This is a known problem with StringTie. To address it, I increased the stringency (-c 1.5 and -j 15), which led to some improvement. My supervisor instead suggested that I concatenate the alignments from all 23 samples, feed that single file to StringTie, and increase the -j parameter. I have since read the original StringTie paper and got the impression that transcript expression levels matter for the assembly, but I'm not too sure about this.
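
Instead of inspecting every case by hand, the transcripts that span two or more reference genes can be listed with a simple overlap check between the assembled transcript records and the reference gene records. The Python below is a minimal sketch, assuming both annotations are GTF files, that the reference has "gene" lines carrying a gene_id attribute, and that the StringTie output has "transcript" lines carrying a transcript_id attribute. The function names read_features and multi_gene_transcripts are hypothetical. It only compares transcript and gene spans, not individual exons, so it will over-report (for example, a transcript whose intron contains a nested gene).

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def read_features(gtf_lines, feature, id_attr):
    """Collect (seqname, start, end, strand, id) for one GTF feature type."""
    records = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        records.append((cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get(id_attr)))
    return records


def multi_gene_transcripts(transcripts, genes, same_strand=True):
    """Return {transcript_id: sorted gene_ids} for transcripts overlapping >= 2 genes."""
    genes_by_seq = defaultdict(list)  # group genes per contig to limit comparisons
    for seq, start, end, strand, gene_id in genes:
        genes_by_seq[seq].append((start, end, strand, gene_id))

    hits = {}
    for seq, start, end, strand, tx_id in transcripts:
        overlapped = sorted({
            gene_id
            for g_start, g_end, g_strand, gene_id in genes_by_seq.get(seq, [])
            # GTF coordinates are 1-based and closed, so this is the overlap test.
            if g_start <= end and start <= g_end
            and (not same_strand or g_strand == strand or strand == ".")
        })
        if len(overlapped) >= 2:
            hits[tx_id] = overlapped
    return hits
```

Usage: pass the lines of each GTF file (for example an open file handle) to read_features(lines, "gene", "gene_id") and read_features(lines, "transcript", "transcript_id"), then call multi_gene_transcripts(transcripts, genes). Grouping genes per contig keeps this manageable for a 2,090-contig genome, but the inner loop is still linear in the number of genes on each contig.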

Is this approach correct, or should I assemble each sample individually?

Assembly RNA-Seq

3.2 years ago

One would normally assemble each sample separately and then use stringtie --merge to combine them, keeping only the transcripts with enough support, but I'm not sure anyone has ever done a proper systematic comparison of the two approaches.
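
For reference, the per-sample route can be sketched as a small Python helper that only builds the StringTie command lines; it does not run anything. This is a sketch under assumptions: each sample already has a coordinate-sorted BAM named <sample>.sorted.bam (hypothetical naming), the -c 1.5 / -j 15 values are the ones from the question, and the support filtering in the merge step is done through the --merge option -T (minimum input transcript TPM). The helper names are hypothetical. Each returned list can be passed to subprocess.run(cmd, check=True).

```python
from pathlib import Path


def assemble_cmd(bam, out_gtf, min_cov=1.5, min_junc=15, threads=4, guide_gtf=None):
    """StringTie command for one sample's sorted BAM, using -c/-j as in the question."""
    cmd = ["stringtie", str(bam), "-o", str(out_gtf), "-p", str(threads),
           "-c", str(min_cov), "-j", str(min_junc)]
    if guide_gtf is not None:  # optional reference annotation
        cmd += ["-G", str(guide_gtf)]
    return cmd


def merge_cmd(gtf_list_file, out_gtf, min_tpm=1.0, guide_gtf=None):
    """stringtie --merge command; -T drops input transcripts below min_tpm."""
    cmd = ["stringtie", "--merge", "-o", str(out_gtf), "-T", str(min_tpm)]
    if guide_gtf is not None:
        cmd += ["-G", str(guide_gtf)]
    cmd.append(str(gtf_list_file))  # text file listing one per-sample GTF per line
    return cmd


def merge_list_text(gtf_paths):
    """Contents of the list file given to stringtie --merge: one GTF path per line."""
    return "\n".join(str(p) for p in gtf_paths) + "\n"


def per_sample_commands(samples, workdir, **assemble_kwargs):
    """Return (assembly commands, per-sample GTF paths) for all samples."""
    workdir = Path(workdir)
    gtfs = [workdir / f"{name}.gtf" for name in samples]
    cmds = [assemble_cmd(workdir / f"{name}.sorted.bam", gtf, **assemble_kwargs)
            for name, gtf in zip(samples, gtfs)]
    return cmds, gtfs
```

Keeping command construction separate from execution makes it easy to print and check the 23 assembly commands and the final merge command before launching anything on a cluster.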

I searched the literature and did not find a single example of this approach. My reasoning is that, by using all the reads as evidence, I have better control over the -j parameter for suppressing trans-splicing sites (splice junctions connecting two different reference genes).
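
To make the trade-off in this reasoning concrete: -j is a per-junction read-count cutoff, so on a pooled BAM (for example one produced with samtools merge) the same value is far more permissive than on a single sample. The Python below is a deliberately simplified model, not StringTie's algorithm. It treats -j as a plain count cutoff, ignores StringTie's other junction filters (such as the anchor length set by -a), and assumes that in the per-sample route a junction survives if it passes in at least one sample. The junction names and counts are made up for illustration.

```python
def kept_per_sample(counts, min_junc):
    """Junctions with at least min_junc reads in at least one individual sample."""
    return {j for j, per_sample in counts.items() if max(per_sample) >= min_junc}


def kept_pooled(counts, min_junc):
    """Junctions whose read count summed over all samples is at least min_junc."""
    return {j for j, per_sample in counts.items() if sum(per_sample) >= min_junc}


# Made-up spliced-read counts per junction across 23 samples.
N_SAMPLES = 23
counts = {
    "low_in_every_sample": [2] * N_SAMPLES,                     # 46 reads pooled
    "sporadic_single_reads": [1] * 5 + [0] * (N_SAMPLES - 5),  # 5 reads pooled
    "one_sample_only": [20] + [0] * (N_SAMPLES - 1),           # 20 reads pooled
    "high_in_every_sample": [30] * N_SAMPLES,                  # 690 reads pooled
}

for label, kept in [
    ("per-sample, -j 15", kept_per_sample(counts, 15)),
    ("pooled,     -j 15", kept_pooled(counts, 15)),
    ("pooled,     -j 50", kept_pooled(counts, 50)),
]:
    print(label, sorted(kept))
```

With the same cutoff of 15, pooling keeps the junction seen in 2 reads in every sample (46 reads in total) as well as the one seen only in a single sample, while raising the pooled cutoff to 50 drops both. Under this simple model, a pooled -j has to scale roughly with the number of samples to be as strict as the per-sample value, and the pooled count no longer records whether the support is spread across samples or concentrated in one library.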
