I am mapping my RNA-seq reads using HISAT2. My library is single-end, strand-specific (dUTP), Illumina TruSeq Stranded.
I am confused about which option is correct. The HISAT2 manual says:
"For single-end reads, use F or R. 'F' means a read corresponds to a transcript. 'R' means a read corresponds to the reverse complemented counterpart of a transcript"
Is it correct if I choose the "F" option? If I visualize my alignments in IGV, for example, must all my reads be red?
What does it mean if I have both blue and red reads?
Yes, the reads are arrow-shaped, and I have them in both directions. If my library is dUTP, will the arrows point to the left or to the right?
I hope you can see the image.
Please use ADD REPLY to respond to previous comments, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see, the result is not optimal. Adding an answer should only be used to provide a solution to the question asked.
Well, you selected a rather poor example to look at, because this looks like overlapping genes on the (very gene-dense) chrMT. Select a "normal" locus and look again, for example, the GAPDH gene.
That depends on the direction of your gene. If your gene's start is on the left and its end is on the right, your reads will point from right to left. That is reverse stranded.
What do you think about that?
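To make the orientation rule concrete, here is a minimal Python sketch of where a single-end read should align relative to its gene. The function name `expected_read_strand` is hypothetical, and it assumes that a "reverse" library means a dUTP-type protocol such as TruSeq Stranded, where the read is the reverse complement of the transcript; a "+" gene runs from left to right in the genome browser.

```python
# Minimal sketch: expected alignment strand of a single-end read, given the
# strand of the gene it came from and the library protocol.
# Assumption: "reverse" stands for dUTP-type protocols (e.g. TruSeq Stranded),
# where the sequenced read is the reverse complement of the transcript.


def expected_read_strand(gene_strand: str, library: str) -> str:
    """Return '+' or '-' for the genomic strand the read should align to."""
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if library == "forward":
        return gene_strand  # read carries the transcript's own sequence
    if library == "reverse":
        return "-" if gene_strand == "+" else "+"  # read is the reverse complement
    raise ValueError("library must be 'forward' or 'reverse'")


# A gene whose start is on the left (+ strand), sequenced with a dUTP library:
print(expected_read_strand("+", "reverse"))  # -
```

A "-" result means the read arrow points from right to left, which is what the reply above describes for a left-to-right gene.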
Oh, I wasn't aware that you are not working on human samples, so Gapdh1 turns out not to be as good an example as I thought. Next time you ask a question, don't forget to mention the organism you are working on.
Nevertheless, look at the left-most reads in the screenshot.
Same for Gapdh1:
So it is indeed reverse stranded.
Yes, I forgot to say that it is D. melanogaster.
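The same check can be done with counts instead of by eye at single loci. The sketch below assumes you have already counted, over genes that do not overlap other genes (to avoid the chrMT problem mentioned above), how many reads align to the same strand as their gene (sense) and how many to the opposite strand (antisense). The function name and the 0.8 threshold are hypothetical choices, not taken from any tool. The `--rna-strandness` values follow the HISAT2 manual text quoted in the question; the `--library-type` values are the Cufflinks/cuffdiff names, where `fr-firststrand` covers dUTP libraries.

```python
# Minimal sketch: decide the library type from sense/antisense read counts.
# Assumptions: counts come from genes without overlapping neighbours, and the
# 0.8 threshold is an arbitrary, hypothetical cut-off.

# Library type -> (hisat2 --rna-strandness for single-end reads, cuffdiff --library-type)
OPTIONS = {
    "forward": ("F", "fr-secondstrand"),
    "reverse": ("R", "fr-firststrand"),     # dUTP, e.g. TruSeq Stranded
    "unstranded": (None, "fr-unstranded"),  # None: omit --rna-strandness
}


def infer_strandedness(sense: int, antisense: int, threshold: float = 0.8) -> str:
    """Classify a library as 'forward', 'reverse' or 'unstranded'."""
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads overlap the selected genes")
    frac_sense = sense / total
    if frac_sense >= threshold:
        return "forward"
    if frac_sense <= 1 - threshold:
        return "reverse"
    return "unstranded"


# Hypothetical counts: most reads align antisense to their gene.
library = infer_strandedness(sense=40, antisense=960)
print(library, OPTIONS[library])  # reverse ('R', 'fr-firststrand')
```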
I agree with you: when the gene goes from right to left, the reads go from left to right, and not only for Gapdh1. However, as in the post, it seems that the F or R option has no effect on the result.
I mapped my reads with F and with R, and the IGV visualizations are the same.
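This observation fits how the two settings are documented. The strand a read aligns to is fixed by its sequence and recorded in SAM FLAG bit 0x10; IGV's color-by-read-strand mode (assumed here to be what produces the red/blue reads) is based on that alignment strand, so it would not change between F and R. According to the HISAT2 manual, `--rna-strandness` instead makes every alignment carry an `XS` tag giving the strand of the transcript the read belongs to. The sketch below models that documented meaning for single-end reads; it is not HISAT2's actual code, and the function names are hypothetical.

```python
# Minimal sketch of the documented meaning of --rna-strandness for single-end
# reads. Assumption: this models the manual's description, not HISAT2 internals.

REVERSE_BIT = 0x10  # SAM FLAG bit: read sequence is reverse complemented


def read_strand(flag: int) -> str:
    """Genomic strand the read aligned to (what strand coloring reflects)."""
    return "-" if flag & REVERSE_BIT else "+"


def transcript_strand(flag: int, strandness: str) -> str:
    """Strand of the source transcript implied by the read and the F/R setting."""
    strand = read_strand(flag)
    if strandness == "F":  # read has the same sequence as the transcript
        return strand
    if strandness == "R":  # read is the reverse complement of the transcript
        return "-" if strand == "+" else "+"
    raise ValueError("strandness must be 'F' or 'R'")


for flag in (0, 16):
    print(flag, read_strand(flag), transcript_strand(flag, "F"), transcript_strand(flag, "R"))
# 0 + + -
# 16 - - +
```

In this model the read strand is identical under F and R, while the implied transcript strand flips, so the `XS:A:+` and `XS:A:-` counts would be expected to differ between the two runs even though the IGV view looks the same.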
What did you do in the end?
I am still confused about which option I should use in hisat2 for my RNA-seq data: F or R?
On the other hand, if I count the alignments carrying the XS attribute tag in the SAM outputs from the F and R runs, the values are different, for example:
```shell
hisat2 -q -p 30 --rna-strandness F -x ../../indice/drosophila -U 40trimo.fastq -S 40_F_strand.sam
grep -o XS:A:+ 40_F_strand.sam | wc -l
hisat2 -q -p 30 --rna-strandness R -x ../../indice/drosophila -U 40trimo.fastq -S 40_R_strand.sam
grep -o XS:A:+ 40_R_strand.sam | wc -l
```
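The grep/wc approach counts every line containing `XS:A:+`, including secondary alignments of multi-mapped reads. A minimal Python sketch that tallies `XS:A:+`, `XS:A:-` and missing XS tags over primary mapped alignments only could look like this. The script name `count_xs.py` is hypothetical, and restricting the count to primary alignments is a design choice, so its numbers need not match the grep counts.

```python
import sys
from collections import Counter

# SAM FLAG bits to skip: 0x4 unmapped, 0x100 secondary, 0x800 supplementary
SKIP_BITS = 0x4 | 0x100 | 0x800


def count_xs_tags(lines) -> Counter:
    """Count XS strand values ('+', '-', or 'none') over primary mapped alignments."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank or header line
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & SKIP_BITS:
            continue
        for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
            if tag.startswith("XS:A:"):
                counts[tag[5:]] += 1
                break
        else:
            counts["none"] += 1  # no XS tag on this alignment
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_xs.py 40_F_strand.sam
    with open(sys.argv[1]) as sam:
        for strand, n in sorted(count_xs_tags(sam).items()):
            print(f"{strand}\t{n}")
```

Running it on both SAM files shows the full `+`/`-`/`none` split for each setting rather than the `+` count alone.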
I am worried because the next step is to run cuffdiff on the hisat2 outputs, and the results are completely different depending on whether I use "F" for the --rna-strandness parameter or the default (no strandness). I have not tried with the outputs of --rna-strandness R.
So apparently it does make a difference.
Since your reads are reverse stranded, you should use
Thank you for your comments; I am very grateful.
By the way, see also my reply here: A: Hisat R strandness aligning same strand as F strandness, how to do it correctly?
Perhaps things have changed, but this parameter used to do nothing in my tests.