Another infer_experiment.py interpretation question
11 months ago

Hello there,

I was given a bacterial RNA-Seq experiment to analyze, and after generating alignments I wanted to know whether the reads were stranded. I ran infer_experiment.py on each BAM file, which returned the following results:

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.1704
Fraction of reads explained by "1+-,1-+,2++,2--": 0.8296

This is PairEnd Data
Fraction of reads failed to determine: 0.0030
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3413
Fraction of reads explained by "1+-,1-+,2++,2--": 0.6556
Unknown Data type

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.2744
Fraction of reads explained by "1+-,1-+,2++,2--": 0.7256

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.2090
Fraction of reads explained by "1+-,1-+,2++,2--": 0.7910
Unknown Data type

This is PairEnd Data
Fraction of reads failed to determine: 0.0015
Fraction of reads explained by "1++,1--,2+-,2-+": 0.2698
Fraction of reads explained by "1+-,1-+,2++,2--": 0.7288
Unknown Data type

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.2101
Fraction of reads explained by "1+-,1-+,2++,2--": 0.7899
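
For reference, reports like the six above can be tabulated with a short script. This is only a sketch: `parse_reports` and `classify` are hypothetical helper names, and the 0.8 cutoff and 0.1 "unstranded" band are illustrative assumptions, not thresholds defined by RSeQC.

```python
import re

# Orientation strings exactly as infer_experiment.py prints them for paired-end data.
FORWARD = "1++,1--,2+-,2-+"
REVERSE = "1+-,1-+,2++,2--"

LINE_RE = re.compile(
    r'Fraction of reads (failed to determine|explained by "([^"]+)"): ([0-9.]+)'
)


def parse_reports(text):
    """Split concatenated infer_experiment.py output into one dict per report."""
    reports = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("This is"):
            # A new report starts with e.g. "This is PairEnd Data".
            reports.append({})
            continue
        match = LINE_RE.match(line)
        if match and reports:
            key = match.group(2) or "failed"
            reports[-1][key] = float(match.group(3))
    return reports


def classify(report, stranded_cutoff=0.8, unstranded_band=0.1):
    """Label one report; both thresholds are assumptions chosen for illustration."""
    fwd = report.get(FORWARD, 0.0)
    rev = report.get(REVERSE, 0.0)
    if max(fwd, rev) >= stranded_cutoff:
        return "forward" if fwd > rev else "reverse"
    if abs(fwd - rev) <= unstranded_band:
        return "unstranded"
    return "ambiguous"


if __name__ == "__main__":
    example = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.1704
Fraction of reads explained by "1+-,1-+,2++,2--": 0.8296

This is PairEnd Data
Fraction of reads failed to determine: 0.0030
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3413
Fraction of reads explained by "1+-,1-+,2++,2--": 0.6556
Unknown Data type
'''
    for report in parse_reports(example):
        print(report, classify(report))
```

With these assumed cutoffs, only the first sample passes as clearly reverse-stranded; the rest fall in between, which is the ambiguity described here.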


Clearly, this does not answer my question. I've read the many posts on here and elsewhere about interpreting infer_experiment results, but this appears to be a unique case. Do you think aberrant antisense transcription played a part?

My main questions are, (1) what might have caused this, and (2) is the data usable for downstream analysis (differential expression)?

Many thanks for your help, it is much appreciated.

Lyn

UPDATE

It's supposed to be STRANDED. So apparently that's not completely the case. Any advice before moving forward?

RNA-Seq sequencing • 650 views

Are you not able to find out which kit was used to generate these libraries? Have you visually examined the alignments in a genome browser? That can give you an idea as well.

Thanks GenoMax. Unfortunately I don't have any information about the kit that was used (but I'm working on it). I've just looked at the alignments in IGV and it appears to be an unstranded library:

https://imgur.com/2lLcNqN

But does the infer_experiment output indicate a bias towards fr-firststrand, and could this be a problem when counting features? Or should I just treat it as unstranded and move on?

Thanks!
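
On the counting question above, it may help to see how the two orientation strings correspond to the strandedness settings of common counting tools. The lookup below is a minimal sketch; `STRAND_SETTINGS` and `settings_for` are hypothetical names, while the option values are the standard htseq-count and featureCounts strandedness settings.

```python
# Map each infer_experiment.py orientation string (or "unstranded") to the
# matching TopHat-style library type and counting-tool strandedness option.
STRAND_SETTINGS = {
    "1++,1--,2+-,2-+": {
        "library_type": "fr-secondstrand",
        "htseq_count": "-s yes",
        "featureCounts": "-s 1",
    },
    "1+-,1-+,2++,2--": {
        "library_type": "fr-firststrand",
        "htseq_count": "-s reverse",
        "featureCounts": "-s 2",
    },
    "unstranded": {
        "library_type": "fr-unstranded",
        "htseq_count": "-s no",
        "featureCounts": "-s 0",
    },
}


def settings_for(dominant):
    """Return the settings for the dominant orientation string, or 'unstranded'."""
    return STRAND_SETTINGS[dominant]


if __name__ == "__main__":
    print(settings_for("1+-,1-+,2++,2--"))
```

Counting with a stranded setting discards reads on the opposite strand, so if roughly 20 to 35% of reads are antisense, as in the reports above, that fraction would not be counted; counting as unstranded keeps them but cannot separate overlapping genes on opposite strands.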


Are you sure you are using the correct reference? Double-check that. Perhaps the strain you have has genomic rearrangements (and you may be using the standard reference from NCBI?). A fellow moderator's opinion was that the data does look stranded.


Thanks again. I've tried now with an NCBI, Ensembl, and custom reference and infer_experiment gives the same results. I've actually got PacBio data from the same samples used for RNA-Seq and de novo assembly of each gives a single contig that is identical to the reference. So at this point do you think that it could be a poor annotation for this organism (GTF from both NCBI and Ensembl) or a failed library prep? And is this going to pose a problem? Thaaanks! :-)


Poor annotation might be the reason. Do you know where the annotation comes from?

8 months ago
dare_devil ★ 1.5k
Ironically, the output of the tool that is supposed to help you decide only adds to the confusion.


Here is a better way to do it, one that does not require running tools but instead asks you to understand your own data.
Go to the documentation for GUESSmyLT. You don't have to run the tool; instead, study and understand the possible orientations as described in the docs:
https://github.com/NBISweden/GUESSmyLT
Now open your BAM and transcript GTF files in IGV and select the "view as pairs" option (right-click the panel). All you need to do is look at your data relative to a gene and visually evaluate which case you have. The patterns are usually evident at first glance.

courtesy
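
The orientation codes that infer_experiment.py reports (for example "1+-") can also be worked out by hand for any read seen in IGV: the digit is the read number in the pair, the first sign is the strand the read mapped to, and the second sign is the strand of the gene it overlaps. The sketch below derives the code from a SAM FLAG value and a gene strand; `orientation_code` and `orientation_set` are hypothetical helper names, and only the FLAG bits themselves come from the SAM specification.

```python
# SAM FLAG bits used here (from the SAM specification).
READ_REVERSE = 0x10   # read is mapped to the reverse strand
FIRST_IN_PAIR = 0x40  # read 1 of the pair
SECOND_IN_PAIR = 0x80  # read 2 of the pair

FORWARD_SET = {"1++", "1--", "2+-", "2-+"}
REVERSE_SET = {"1+-", "1-+", "2++", "2--"}


def orientation_code(flag, gene_strand):
    """Build a code such as '1+-' from a SAM FLAG and the gene strand ('+' or '-')."""
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if flag & FIRST_IN_PAIR:
        read_number = "1"
    elif flag & SECOND_IN_PAIR:
        read_number = "2"
    else:
        raise ValueError("flag does not mark the read as first or second in pair")
    read_strand = "-" if flag & READ_REVERSE else "+"
    return read_number + read_strand + gene_strand


def orientation_set(code):
    """Say which of the two infer_experiment.py groups a code belongs to."""
    if code in FORWARD_SET:
        return "1++,1--,2+-,2-+"
    if code in REVERSE_SET:
        return "1+-,1-+,2++,2--"
    raise ValueError("unrecognised orientation code: " + code)


if __name__ == "__main__":
    # FLAG 99: paired, properly paired, mate reverse, first in pair.
    code = orientation_code(99, "-")
    print(code, orientation_set(code))
```

Checking a handful of pairs over a well-annotated gene this way should show whether most of them fall into one group, which is what the visual inspection is looking for.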

When you quote someone verbatim, you need to make that clear by using quotation marks or a blockquote like so:

this text is quoted

(see the quote icon above when you are in edit mode)

In addition you need to provide a reference to the original source. What you wrote is a copy-paste from:

Answer: Infering strand specificity of bam files using rseqc?