Transcriptome analysis of two closely related plant species
2.7 years ago
aces • 0

Hi,

I am sorry for a basic question; I am a wet lab biologist trying to perform RNA-Seq with only limited experience. I hope this kind community can help me with some guidance.

I have two plant species whose response to an insect I want to compare. There is a reference genome (with an annotation) for one of the species. I mapped reads of the second species to the genome of the first one using HISAT2. I found that ca. 65% map uniquely to the genome, with only 2% mapping to more than one location (76.76% overall alignment rate).

I would like to generate a master set of transcripts for differential expression analysis of both species. Given the decent mapping rate, I am unsure whether it is OK to perform de novo assembly of the unmapped reads and add the resulting contigs as transcripts specific to the second species. Alternatively, I could try a genome-guided assembly of the second species using its raw reads and later find ortholog groups with the first species. If I go this route, I am afraid that some of the annotated genes from the first species may be clustered into the same homolog groups and interfere with the differential expression analysis.

rna-seq • 929 views

Thanks a lot for the advice! I will try looking at TIN as you suggested. :)

Just in case TIN doesn't look good, would you recommend de novo assembly and OrthoFinder?


If the TIN numbers don't look good, you should perform a gene-level analysis. If the transcripts are not covered evenly, the data won't be able to reliably distinguish between different transcripts of the same gene.

2.7 years ago

I would look into the concept called transcript integrity number (TIN) to evaluate how well the reads conform to the transcripts across both species. If the TIN numbers look good and most transcripts are covered, you could get meaningful results without assembling anything.

You can compute TIN with the tin.py tool from the RSeQC package:

http://rseqc.sourceforge.net/

More on transcript integrity here:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0922-z
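To judge whether the TIN numbers "look good", it can help to reduce the per-transcript table to a few summary values per sample. The following is only a minimal sketch: it assumes the per-transcript output of tin.py is a tab-separated file with a header containing a `TIN` column (geneID, chrom, tx_start, tx_end, TIN) and that transcripts without enough coverage are reported with a TIN of 0. The function name `summarize_tin` and the cutoff of 50 are illustrative choices of mine, not part of RSeQC.

```python
import csv
import statistics


def summarize_tin(handle, good_tin=50.0):
    """Summarize a per-transcript TIN table read from an open text handle.

    Assumes a tab-separated table with a header line that has a 'TIN' column.
    `good_tin` is an arbitrary illustrative cutoff, not an RSeQC default.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    tins = [float(row["TIN"]) for row in reader]
    # Transcripts with TIN == 0 are treated here as "not covered".
    covered = [t for t in tins if t > 0]
    total = len(tins)
    return {
        "transcripts": total,
        "fraction_covered": len(covered) / total if total else 0.0,
        "median_tin_covered": statistics.median(covered) if covered else 0.0,
        "fraction_good": sum(t >= good_tin for t in tins) / total if total else 0.0,
    }
```

You would call it with something like `with open("sample.tin.xls") as fh: print(summarize_tin(fh))` (the file name is hypothetical) and compare the values between samples of the two species.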

Assembling the unmapped reads does not sound like a good idea. If you wish to assemble transcripts, I would recommend using all the data and then figuring out which transcripts are novel, rather than taking the unmapped reads alone.


How to speed up tin.py? It is horrendously slow. Should I split my BAM file into smaller chunks based on chromosomes and perform multiprocessing?


Yes, you can split the BAM file by chromosome and spawn a separate process for each chromosome with GNU parallel.
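As a rough illustration of the same idea, here is a sketch that uses Python's `concurrent.futures` in place of GNU parallel (a substitution on my part, not what was suggested above). It assumes `samtools` and `tin.py` are on the PATH, that the input BAM is coordinate-sorted and indexed, that the annotation is a BED12 file, and that tin.py writes its result files into the current working directory. All function names are hypothetical helpers.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_chromosomes(idxstats_text):
    """Return reference names with mapped reads from `samtools idxstats` output."""
    chroms = []
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # idxstats columns: name, length, mapped reads, unmapped reads
        if len(fields) >= 4 and fields[0] != "*" and int(fields[2]) > 0:
            chroms.append(fields[0])
    return chroms


def split_commands(bam, chrom, out_dir):
    """Commands that extract one chromosome into its own indexed BAM."""
    out_bam = str(Path(out_dir) / f"{chrom}.bam")
    return [
        ["samtools", "view", "-b", "-o", out_bam, bam, chrom],
        ["samtools", "index", out_bam],
    ]


def filter_bed(bed_text, chrom):
    """Keep only the BED records located on `chrom`."""
    kept = [ln for ln in bed_text.splitlines() if ln.split("\t")[0] == chrom]
    return "\n".join(kept) + ("\n" if kept else "")


def tin_for_chromosome(bam, bed, chrom, out_dir):
    """Split out one chromosome and run tin.py on it."""
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    chrom_bed = out_dir / f"{chrom}.bed"
    chrom_bed.write_text(filter_bed(Path(bed).read_text(), chrom))
    for cmd in split_commands(bam, chrom, out_dir):
        subprocess.run(cmd, check=True)
    # Assumption: tin.py writes its output into the working directory.
    subprocess.run(
        ["tin.py", "-i", str(out_dir / f"{chrom}.bam"), "-r", str(chrom_bed)],
        check=True,
        cwd=out_dir,
    )


def run_all(bam, bed, out_dir, workers=4):
    """Run tin.py per chromosome, `workers` chromosomes at a time."""
    idx = subprocess.run(
        ["samtools", "idxstats", bam], check=True, capture_output=True, text=True
    ).stdout
    # Threads suffice here because the heavy work runs in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda c: tin_for_chromosome(bam, bed, c, out_dir),
                      list_chromosomes(idx)))
```

Since each transcript lies on a single chromosome, the per-chromosome result tables can simply be concatenated afterwards. Restricting the BED file to the same chromosome keeps transcripts from other chromosomes out of each partial result.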