TransDecoder ORF score to filter "noise" from de novo transcriptome assembly
26 days ago
tdamiani • 0

Hello everyone,

I’m working with short-read RNA-Seq data from plants. Sequencing was performed on the BGI DNBSEQ-T7 platform (paired-end, insert size 150, 6G/sample). We sequenced mRNA from 13 species, with 3 tissues per species and 3 replicates per tissue, so a lot of data :) The goal of the study is to find the genes responsible for the biosynthesis of secondary metabolites.

Transcriptomes were assembled de novo (using rnaSPAdes) for each species separately. ORFs were predicted using TransDecoder:

TransDecoder.LongOrfs -t input --output_dir transdecoder
TransDecoder.Predict -t input --output_dir transdecoder --retain_pfam_hits {input.pfam} --retain_blastp_hits {input.blastp}

The final dataset contains over 1'600'000 potential ORFs (see summary below):

[image: cds overview]

I’m wondering whether I can use the ORF score from TransDecoder to filter out some noise from the data (e.g., assembly artifacts). I know filtering is always risky, but even just flagging low-score ORFs as "unreliable" would be a starting point. I plotted the distribution of ORF scores across my entire dataset:

[image: ORF score distribution]

The max count corresponds to score = 12.2, but I also have over 100'000 ORFs with a score below 1. Would you consider those artifacts and/or unreliable ORFs? Or is it impossible to tell from the score alone?
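To make the flagging idea concrete, here is a minimal Python sketch of how low-score ORFs could be marked rather than removed. It assumes that the FASTA headers written by TransDecoder.Predict (for example in the .pep output) contain a field of the form score=<number>. That header layout, the example headers, the helper names iter_orf_scores and flag_low_scores, and the cutoff of 1.0 are all assumptions for illustration, not a recommended threshold.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: each ORF header contains "score=<number>", e.g.
# >tx1.p1 gene1~~tx1.p1 ORF type:complete len:175 (+),score=12.20 tx1:2-526(+)
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def iter_orf_scores(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (orf_id, score) for every FASTA header that carries a score."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = SCORE_RE.search(line)
        if match is None:
            continue  # header without a score field is skipped
        orf_id = line[1:].split()[0]
        yield orf_id, float(match.group(1))


def flag_low_scores(
    lines: Iterable[str], threshold: float = 1.0
) -> List[Tuple[str, float, bool]]:
    """Return (orf_id, score, is_low) triples; is_low means score < threshold."""
    return [
        (orf_id, score, score < threshold)
        for orf_id, score in iter_orf_scores(lines)
    ]


# Made-up headers for illustration only (not real data).
EXAMPLE_LINES = [
    ">tx1.p1 gene1~~tx1.p1 ORF type:complete len:175 (+),score=12.20 tx1:2-526(+)",
    "MSTAVLK",
    ">tx2.p1 gene2~~tx2.p1 ORF type:internal len:101 (-),score=0.45 tx2:1-303(-)",
    "GHTRRLA",
]

if __name__ == "__main__":
    # For a real run, replace EXAMPLE_LINES with an open file handle, e.g.
    # open("input.transdecoder.pep") (hypothetical file name).
    for orf_id, score, is_low in flag_low_scores(EXAMPLE_LINES):
        label = "low-score" if is_low else "ok"
        print(f"{orf_id}\t{score}\t{label}")
```

Keeping the flag as a separate column instead of dropping records leaves every ORF in the dataset, so a low score can later be weighed against other evidence such as the retained Pfam or blastp hits.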

Thanks in advance for any answer!

transdecoder assembly transcriptome • 132 views