Question

Tool: GUESSmyLT, guess the library type of your RNA-seq data (orientation, strandedness)

Posted 5.1 years ago by Juke34

It's common for the metadata of RNA-seq data to be missing or unclear. Deducing this information often takes substantial work and several different tools.

Currently several approaches exist:

The best known is infer_experiment.py (from RSeQC), but it needs an annotation. (I mostly work on species with no annotation available, so I always have to produce a draft annotation just to get a few usable genes.)
Run TopHat or HISAT twice (first with fr-firststrand, then with fr-secondstrand) and compare the results, as explained here. The outcome is not always clear-cut.
Map your reads (or a subsample) and inspect how Read1 and Read2 align in a genome browser, as explained here.
Use Salmon (but the result is only relative, because it does not use an annotation)
...
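
For paired-end reads, the annotation-based check boils down to counting how each read's mapped strand relates to the strand of the gene it overlaps. The following is a minimal illustration of that idea, not RSeQC's or GUESSmyLT's code. It assumes (hypothetically) that every aligned read has already been reduced to a tuple of read number, mapped strand and gene strand, and it ignores reads overlapping genes on both strands. The fr_first/fr_second labels follow the example output in this post.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical input: (read number 1 or 2, strand the read mapped to, strand of the overlapping gene)
Observation = Tuple[int, str, str]


def strand_fractions(observations: Iterable[Observation]) -> Dict[str, float]:
    """Fraction of reads supporting each strand-specific paired-end layout."""
    counts: Counter = Counter()
    for read_number, read_strand, gene_strand in observations:
        same_strand = read_strand == gene_strand
        # Read 1 on the gene strand, or read 2 on the opposite strand, means read 1 is sense.
        read1_sense = same_strand if read_number == 1 else not same_strand
        counts["fr_second" if read1_sense else "fr_first"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"fr_first": 0.0, "fr_second": 0.0}
    return {layout: counts[layout] / total for layout in ("fr_first", "fr_second")}


reads = [(1, "+", "+"), (2, "-", "+"), (1, "-", "+"), (2, "+", "-")]
print(strand_fractions(reads))  # {'fr_first': 0.25, 'fr_second': 0.75}
```

Fractions near 0.5/0.5 point to an unstranded library; a value close to 1 for one layout points to a stranded one.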

Tired of receiving RNA-seq data without any information about the library type, I matured the idea of a single tool that automates this task and provides all the information needed, whatever input data is available:

With or without a FASTA file (if no FASTA file is provided, it performs a transcriptome assembly to map the reads against)
With or without an annotation (if no GFF/GTF is provided, it annotates with BUSCO)
...
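
The fallback logic described in this list can be outlined as a small decision function. This is a hypothetical sketch, not GUESSmyLT's actual code: the step names are illustrative labels only, and the real tool may order or combine the steps differently.

```python
from typing import List


def plan_steps(has_reference: bool, has_annotation: bool) -> List[str]:
    """Hypothetical outline of which steps run depending on the inputs provided."""
    steps: List[str] = []
    if not has_reference:
        # No FASTA: build a transcriptome from the reads to serve as the mapping target.
        steps.append("assemble transcriptome from reads")
    if not has_annotation:
        # No GFF/GTF: derive a few usable genes with BUSCO.
        steps.append("annotate reference with BUSCO")
    steps.append("map reads to reference")
    steps.append("infer library type from read orientation")
    return steps


print(plan_steps(has_reference=True, has_annotation=False))
```

The point of the design is that every input combination ends in the same two final steps; missing inputs only add preparatory work.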

Here is the result: GUESSmyLT

I hope it helps many of us solve this recurring problem.
You are welcome to try it and provide feedback to improve it.

An example result:

Results of paired library inferring of reads 4_r1.sub.100000 on ref 4: 

Library type    Reads     Percent     Visualization according to firststrand

   undecided        1        0.0%     3' -------??------- 5'
                                      5' -------??------- 3'


   ff_second        2        0.0%     3' ----------==2==> 5'
                                      5' ==1==>---------- 3'


    fr_first     4019       47.2%     3' ----------<==1== 5'
                                      5' ==2==>---------- 3'


    ff_first        5        0.1%     3' ----------<==1== 5'
                                      5' <==2==---------- 3'


   rf_second       19        0.2%     3' ----------==2==> 5'
                                      5' <==1==---------- 3'


    rf_first       21        0.2%     3' ----------==1==> 5'
                                      5' <==2==---------- 3'


   fr_second     4454       52.3%     3' ----------<==2== 5'
                                      5' ==1==>---------- 3'

A roughly 50/50 split between the two strands of the same library orientation (here fr_first 47.2% vs fr_second 52.3%) should be interpreted as an unstranded library.
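
Each percentage in the table is the read count divided by the total number of reads (8,521 in this example; for instance 4,019 / 8,521 ≈ 47.2%). The sketch below reproduces that column and applies the 50/50 rule. The decision rule and its thresholds (`tolerance`, `stranded_min`) are hypothetical choices for illustration, not GUESSmyLT's own logic.

```python
from typing import Dict


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of reads per library type, in percent, rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        return {lib: 0.0 for lib in counts}
    return {lib: round(100 * n / total, 1) for lib, n in counts.items()}


def call_library(counts: Dict[str, int], tolerance: float = 0.1, stranded_min: float = 0.9) -> str:
    """Hypothetical rule: pick the dominant orientation, then compare its two strands."""
    orientations = ("fr", "rf", "ff")
    totals = {o: counts.get(f"{o}_first", 0) + counts.get(f"{o}_second", 0) for o in orientations}
    best = max(orientations, key=lambda o: totals[o])
    if totals[best] == 0:
        return "undecided"
    first_share = counts.get(f"{best}_first", 0) / totals[best]
    if abs(first_share - 0.5) <= tolerance:
        # Roughly 50/50 between the two strands: unstranded library.
        return f"{best} (unstranded)"
    if first_share >= stranded_min:
        return f"{best}_first"
    if 1 - first_share >= stranded_min:
        return f"{best}_second"
    return f"{best} (ambiguous strandedness)"


# Read counts copied from the example output above
example = {
    "undecided": 1, "ff_second": 2, "fr_first": 4019, "ff_first": 5,
    "rf_second": 19, "rf_first": 21, "fr_second": 4454,
}
print(percentages(example)["fr_first"], percentages(example)["fr_second"])  # 47.2 52.3
print(call_library(example))  # fr (unstranded)
```

Grouping the `_first` and `_second` counts by their shared orientation prefix is what lets the rule compare "the strands of the same library orientation" directly.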

Tags: rna-seq, sequencing

Last updated 10 months ago by Ram.


Interesting tool; the descriptions of the possible library types are also quite handy.

Some feedback on usage:

The example invocations are needlessly lengthy. You should not need to list the files as home/.../read1.fastq; just call them read1.fq and read2.fq. Why bother with absolute paths?

The use cases should be labeled by the information available to the end user:

if you have a reference genome and annotations
if you have a reference genome but no annotations
if you have transcript sequences but no genome
if you have no other information, just the reads

Requiring Snakemake to run your tool seems to add unneeded complexity.

In general, there seem to be too many dependencies. It feels like the task at hand (determining the library type) ought to be much simpler than first having to assemble transcripts. I'm not sure what the right answer is here, but this might be an interesting research problem in its own right: how can the library type be detected without assembling transcripts?

What I am basically saying is that transcript assembly is a different, much bigger and more complicated task than library type detection.

Reply by Istvan Albert, 5.1 years ago


Thank you for your feedback. It's true that Snakemake seems unnecessary. I will think about it and decide whether to keep it.

Reply by Juke34, 5.1 years ago