Hi,
I use HISAT2 + Cuffdiff to process my 150 bp paired-end mouse RNA-seq data.
Recently I noticed huge differences between compatible_count and total_count in my results. Many genes are underestimated because their compatible count is zero.
I've checked the XS:A:(+/-) strand tag, and it is present in my SAM/BAM files (example below). I also inspected the alignments visually in IGV and nothing looks unusual.
I have also tried different Cuffdiff parameters, such as --total-hits-norm and --poisson-dispersion, to see whether anything improves, but they did not help. The only progress is that Cuffdiff now recognizes the correct number of total counts (messages below).
My questions are:
- What features does Cuffdiff use to decide whether a read pair is compatible?
- Are there any parameters that increase compatible_count?
Thank you very much for your help.
SAM example:
A00123:18:H3MHFD:1:2162:4182:19413 419 1 3054721 1 137M = 3054721 -137 CTTAGGGGCTTGAGAAAGTTCTCGCCCTCTCACCTGGGGCCTAAGATTGTATCAAGATAACTATGACAATGGCCTGACCTTTAAGGTTCCGCTTCTAACAATCATAAAGCATCCATAGGACTTCCAGGTACCCGCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFF AS:i:-5 ZS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5A131 YS:i:-5 YT:Z:CP XS:A:- NH:i:3
A00123:18:H3MHFD:1:2162:4182:19413 339 1 3054721 1 137M = 3054721 -137 CTTAGGGGCTTGAGAAAGTTCTCGCCCTCTCACCTGGGGCCTAAGATTGTATCAAGATAACTATGACAATGGCCTGACCTTTAAGGTTCCGCTTCTAACAATCATAAAGCATCCATAGGACTTCCAGGTACCCGCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:-5 ZS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5A131 YS:i:-5 YT:Z:CP XS:A:- NH:i:3
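In case it helps, here is how I read the FLAG values of the two records above (419 and 339). This is only a small sketch: the bit values come from the SAM specification, and `decode_flag` is a helper name I made up for this post. Both records decode as secondary alignments in a proper pair, which matches NH:i:3 and the MAPQ of 1. I do not know whether that matters for compatibility.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# The two FLAG values from the SAM example in this post.
for flag in (419, 339):
    print(flag, decode_flag(flag))
```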
Default Cuffdiff message:
[12:34:11] Modeling fragment count overdispersion.
> Map Properties:
> Normalized Map Mass: 749021.00
> Raw Map Mass: 752500.47
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 272.39
> Estimated Std Dev: 138.57
> Map Properties:
> Normalized Map Mass: 749021.00
> Raw Map Mass: 746990.97
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 274.52
> Estimated Std Dev: 131.43
[12:35:12] Calculating preliminary abundance estimates
[12:35:12] Testing for differential expression and regulation in locus.
Cuffdiff message with --total-hits-norm:
[15:24:31] Modeling fragment count overdispersion.
> Map Properties:
> Normalized Map Mass: 52162345.85
> Raw Map Mass: 55210687.49
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 273.32
> Estimated Std Dev: 140.81
> Map Properties:
> Normalized Map Mass: 52162345.85
> Raw Map Mass: 49297576.52
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 273.60
> Estimated Std Dev: 131.36
[15:25:33] Calculating preliminary abundance estimates
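To put a number on the gap, I divided the Raw Map Mass from the default run by the Raw Map Mass from the --total-hits-norm run. This is a rough sketch under two assumptions of mine: that the default run counts only compatible fragments in its map mass, and that the two conditions appear in the same order in both logs. `compatible_fraction` is just a name I chose. Under those assumptions only about 1 to 2 percent of the fragment mass in each condition is treated as compatible.

```python
def compatible_fraction(compatible_mass, total_mass):
    """Share of the total map mass that is counted as compatible."""
    return compatible_mass / total_mass


# Raw Map Mass values copied from the two Cuffdiff logs in this post.
default_raw = [752500.47, 746990.97]          # default run
total_hits_raw = [55210687.49, 49297576.52]   # --total-hits-norm run

# Assumes condition i in the default log is condition i in the other log.
for i, (comp, total) in enumerate(zip(default_raw, total_hits_raw), start=1):
    print(f"condition {i}: {compatible_fraction(comp, total):.2%} compatible")
```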