Question: Infer experiment - Failed to determine strandin paired end stranded RNA-seq
0
gravatar for GiusiG
12 weeks ago by
GiusiG0
GiusiG0 wrote:

Hi,

I'm having problems with a stranded, paired end RNA-seq analysis. I tried to map the reads with HISAT2 and RNA STAR but when I visualize the alignment it seems unstranded. If it helps, here the parameter I used with HISAT2:

  Tool Parameters
Input Parameter Value   Note for rerun
Source for the reference genome indexed 
Select a reference genome   hg38    
Is this a single or paired library  paired  
FASTA/Q file #1 80: Rep1_1.fq   
FASTA/Q file #2 81: Rep1_2.fq   
Specify strand information  Reverse (RF)    
Paired-end options  defaults    
sum 
Output alignment summary in a more machine-friendly style.  False   
Print alignment summary to a file.  False   
adv 
Input options   defaults    
Alignment options   defaults    
Scoring options defaults    
Spliced alignment options   defaults    
Reporting options   defaults    
Output options  defaults    
Other options   defaults    
Job Resource Parameters no

In order to understand more what is going on I tried to run "infer experiment' on the bam files and it gives this kind of result...

  This is PairEnd Data
Fraction of reads failed to determine: 0.9590
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0045
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0365

Does anyone have an idea why this is happening?

Thank you very much

ADD COMMENTlink modified 9 weeks ago by michael.ante3.0k • written 12 weeks ago by GiusiG0

Hi, I recommend to run a proper QC analysis including the output of FastQC, STAR or Hisat, and samtools flagstat, mapping rates, all parameters etc. Also, I would recommend to simply get counts from FeatureCounts with all settings for -s [0|1|2] and report. I don't know about about "infer experiment', but the output looks suspicious. In principle, even if the protocol was unstranded, wouldn't one expect that around 50% of pairs could be explained by either model just by chance? So I think something else went wrong:

  • you used the wrong genome annotation version with respect to hg38 count (likely)
  • reads are not properly paired
  • most of reads didn't align
  • the BAM format is incompatible
ADD REPLYlink modified 12 weeks ago • written 12 weeks ago by Michael Dondrup45k

Salmon also can automatically detect strandedness, you can try it.

ADD REPLYlink written 9 weeks ago by grant.hovhannisyan1.4k
1
gravatar for michael.ante
9 weeks ago by
michael.ante3.0k
Austria/Vienna
michael.ante3.0k wrote:

Hi GusiG,

The RSeQC too tells you that 96% of your reads cannot be used for inferring the strandedness. This could either be due to the case that 96% of your reads are derived from overlapping gene regions, or due to discrepancies in your annotation.

To check the first idea, just have a look at highly expressed genes in IGV.

The second cause can be checked by comparing the samtools idxstats result with the contig/chromosomes in the bed file you used (something like cut -f 1 my_bed_for_rseqc.bed | sort -u ).

I'd guess that the naming of your alignment and the bed-file are not 100% identical, just for the mitochondria its matching. For in instance, some aligners keep the full fasta header as name for the contig.

Cheers,

Michael

ADD COMMENTlink written 9 weeks ago by michael.ante3.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1334 users visited in the last hour