Question

Infer experiment - Failed to determine strandin paired end stranded RNA-seq

0

Entering edit mode

5.4 years ago

GiusiG ▴ 10

Hi,

I'm having problems with a stranded, paired end RNA-seq analysis. I tried to map the reads with HISAT2 and RNA STAR but when I visualize the alignment it seems unstranded. If it helps, here the parameter I used with HISAT2:

  Tool Parameters
Input Parameter Value   Note for rerun
Source for the reference genome indexed 
Select a reference genome   hg38    
Is this a single or paired library  paired  
FASTA/Q file #1 80: Rep1_1.fq   
FASTA/Q file #2 81: Rep1_2.fq   
Specify strand information  Reverse (RF)    
Paired-end options  defaults    
sum 
Output alignment summary in a more machine-friendly style.  False   
Print alignment summary to a file.  False   
adv 
Input options   defaults    
Alignment options   defaults    
Scoring options defaults    
Spliced alignment options   defaults    
Reporting options   defaults    
Output options  defaults    
Other options   defaults    
Job Resource Parameters no

In order to understand more what is going on I tried to run "infer experiment' on the bam files and it gives this kind of result...

  This is PairEnd Data
Fraction of reads failed to determine: 0.9590
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0045
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0365

Does anyone have an idea why this is happening?

Thank you very much

RNA-Seq HISAT2 STAR paired end stranded • 3.0k views

ADD COMMENT • link updated 5.0 years ago by samuel ▴ 240 • written 5.4 years ago by GiusiG ▴ 10

1

Entering edit mode

Hi, I recommend to run a proper QC analysis including the output of FastQC, STAR or Hisat, and samtools flagstat, mapping rates, all parameters etc. Also, I would recommend to simply get counts from FeatureCounts with all settings for -s [0|1|2] and report. I don't know about about "infer experiment', but the output looks suspicious. In principle, even if the protocol was unstranded, wouldn't one expect that around 50% of pairs could be explained by either model just by chance? So I think something else went wrong:

you used the wrong genome annotation version with respect to hg38 count (likely)
reads are not properly paired
most of reads didn't align
the BAM format is incompatible

ADD REPLY • link 5.4 years ago by Michael 54k

1

Entering edit mode

Salmon also can automatically detect strandedness, you can try it.

ADD REPLY • link 5.3 years ago by grant.hovhannisyan ★ 2.6k

score 1 · Answer 1 · 2018-12-17

Hi GusiG,

The RSeQC too tells you that 96% of your reads cannot be used for inferring the strandedness. This could either be due to the case that 96% of your reads are derived from overlapping gene regions, or due to discrepancies in your annotation.

To check the first idea, just have a look at highly expressed genes in IGV.

The second cause can be checked by comparing the samtools idxstats result with the contig/chromosomes in the bed file you used (something like cut -f 1 my_bed_for_rseqc.bed | sort -u ).

I'd guess that the naming of your alignment and the bed-file are not 100% identical, just for the mitochondria its matching. For in instance, some aligners keep the full fasta header as name for the contig.

Cheers,

Michael