Question: Which strand-specific option needs to be used with HISAT2?
Vasu300 wrote:

I'm working with RNA-seq data. For the analysis I will be using HISAT2 to align sequencing reads to the human genome GRCh38. The samples were subjected to strand-specific RNA sequencing using a poly-A selection protocol and sequenced on the Illumina HiSeq 2000 system. The reads are paired-end.

I would like to know how the HISAT2 command should be given for the strand-specific option.

Should --rna-strandness be RF or FR?

Thank you

strandspecific rna-seq hisat2 • 1.8k views
modified 12 months ago by JJ430 • written 12 months ago by Vasu300
apeltzer130 wrote:

For HISAT2 there are two things to take care of: whether the data are single-end or paired-end, and whether the library is forward or reverse stranded.

If it's single-end data and forward stranded, you need to set: --rna-strandness F
If it's paired-end data and forward stranded, you need to set: --rna-strandness FR

Similarly, if it's reverse stranded, set --rna-strandness R for single-end data or --rna-strandness RF for paired-end data.

You may have to check in more detail what the protocol actually does.

written 12 months ago by apeltzer130
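
As a rough illustration of the four cases above, here is a minimal Python sketch that picks the --rna-strandness value and assembles a hisat2 command line. The options -x, -1, -2, -U, -S and --rna-strandness are HISAT2 options; the function name, the "forward"/"reverse"/"unstranded" labels, the index path grch38/genome and the output name aligned.sam are assumptions made up for the example.

```python
import shlex

# (library type, paired-end?) -> value for HISAT2's --rna-strandness
STRANDNESS = {
    ("forward", False): "F",
    ("forward", True): "FR",
    ("reverse", False): "R",
    ("reverse", True): "RF",
}


def hisat2_command(index, reads, library, out_sam="aligned.sam"):
    """Build a hisat2 argument list.

    index   -- basename of the HISAT2 index (hypothetical path)
    reads   -- list with one FASTQ (single-end) or two (paired-end)
    library -- "forward", "reverse" or "unstranded" (labels are assumptions)
    """
    if len(reads) not in (1, 2):
        raise ValueError("expected one or two FASTQ files")
    paired = len(reads) == 2
    cmd = ["hisat2", "-x", index]
    if library != "unstranded":
        # Unstranded is HISAT2's default, so the option is simply omitted.
        cmd += ["--rna-strandness", STRANDNESS[(library, paired)]]
    if paired:
        cmd += ["-1", reads[0], "-2", reads[1]]
    else:
        cmd += ["-U", reads[0]]
    cmd += ["-S", out_sam]
    return cmd


cmd = hisat2_command(
    "grch38/genome", ["sample1_1.fastq", "sample1_2.fastq"], "reverse"
)
print(shlex.join(cmd))
```

For a paired-end, reverse-stranded library this selects RF; the printed string can be pasted into a shell once the paths are replaced with real ones.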

Is there a way to tell whether the library is forward stranded or reverse stranded from the FASTQ files I have (sample1_1.fastq, sample1_2.fastq)?

modified 12 months ago • written 12 months ago by Vasu300

If you don't know it for sure, you can try using RSeQC to infer the strandedness:

http://rseqc.sourceforge.net

written 12 months ago by apeltzer130

Hi,

Which BAM file should I use? Regarding the BED file, can I use the one provided on the RSeQC website (https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/)?

written 12 months ago by Vasu300
JJ430 wrote:

Most libraries are fr-firststrand (dUTP-based methods). But yes, it's good to check with infer_experiment.py from RSeQC. For fr-firststrand you should use RF.

written 12 months ago by JJ430

OK. Which BAM file should I use? Regarding the BED file, can I use the one provided on the RSeQC website (https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/)?

written 12 months ago by Vasu300

You feed in the mapping, i.e. the BAM file: align with HISAT2 first without setting the strandedness (or do ... it doesn't really matter for this application) and then feed the result into infer_experiment.py from RSeQC. As for the BED file, that depends on the reference you have used; you need to pick the matching one. Alternatively, convert the GTF or GFF file corresponding to your reference (e.g., from Ensembl or GENCODE) to BED, e.g. using Galaxy.

written 12 months ago by JJ430
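
To make the "feed the BAM into infer_experiment.py" step concrete, here is a small Python sketch that only assembles the command line. The options -i (input alignment), -r (reference gene model in BED format) and -s (number of reads to sample) are RSeQC options; the file names sample1.bam and gencode.bed and the helper name are placeholders invented for the example.

```python
import shlex


def infer_experiment_command(bam, bed, sample_size=200000):
    """Argument list for RSeQC's infer_experiment.py.

    bam -- alignment produced by HISAT2 (hypothetical file name)
    bed -- BED gene model matching the reference used for the alignment
    """
    return [
        "infer_experiment.py",
        "-i", bam,               # input alignment (SAM or BAM)
        "-r", bed,               # reference gene model in BED format
        "-s", str(sample_size),  # number of reads to sample
    ]


cmd = infer_experiment_command("sample1.bam", "gencode.bed")
print(shlex.join(cmd))
# To actually run it (requires RSeQC to be installed and on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch usable on a machine without RSeQC installed.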

Thanks for the reply. Regarding the BED file: I didn't use any reference GTF for the alignment with HISAT2, I just gave the GRCh38 reference genome index. But I will be using the latest GENCODE GTF in the StringTie step. So can I convert that GENCODE GTF to BED and use it for infer_experiment.py?

written 12 months ago by Vasu300

If you use the GENCODE reference, you can also download the same annotation as GFF3, which you can then convert to BED using Galaxy. Alternatively, use this tool for GTF to BED conversion (I haven't tried it myself, I just came across it).

written 12 months ago by JJ430

Yes, but gtf2bed doesn't produce a 12-column BED file, and RSeQC needs a 12-column BED file, right? Or can I also do it the way described in "how to get Homo_sapiens.GRCh38.84.bed file"?

written 12 months ago by Vasu300

As I said, I have not tried that tool. I downloaded the GFF3 from the GENCODE website and converted it to BED with Galaxy (you find it under the conversion tools). This worked fine for me, and RSeQC can work with the result. However, use Devon's awk command if you don't like Galaxy :)

written 12 months ago by JJ430
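
For readers who want to see what a 12-column BED record looks like, below is a minimal Python sketch of a GTF-to-BED12 conversion. It is not the Galaxy converter or the awk command mentioned above, just an illustration under simple assumptions: only exon records are read, transcripts are identified by the transcript_id attribute, and since no CDS information is used, thickStart/thickEnd are set to the transcript span. The function name and the two-exon example transcript are made up.

```python
import re
from collections import defaultdict


def gtf_to_bed12(gtf_lines):
    """Convert GTF exon records to BED12 lines, one line per transcript."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    info = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if not match:
            continue
        tid = match.group(1)
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[tid].append((int(fields[3]) - 1, int(fields[4])))
        info[tid] = (fields[0], fields[6])

    bed = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tid]
        start = blocks[0][0]
        end = max(e for _, e in blocks)
        sizes = ",".join(str(e - s) for s, e in blocks) + ","
        starts = ",".join(str(s - start) for s, _ in blocks) + ","
        # No CDS is parsed, so thickStart/thickEnd span the whole transcript.
        row = [chrom, start, end, tid, 0, strand, start, end, 0,
               len(blocks), sizes, starts]
        bed.append("\t".join(map(str, row)))
    return bed


example = [
    'chr1\ttest\texon\t101\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttest\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
for record in gtf_to_bed12(example):
    print(record)
```

The example transcript becomes a single record spanning 100 to 400 with two blocks of 100 bases each. For a real annotation, an established converter is probably the safer choice.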

On another note, you should build the index with a GTF file, not only from the reference sequence. See the protocol for how to do this.

written 12 months ago by JJ430

I'm using the standard index from the HISAT2 website. So do I need to build the index with the GENCODE GTF file first, use it for the alignment, then convert the GTF to BED and run infer_experiment.py? Am I right?

written 12 months ago by Vasu300

You can of course use the standard index, which already contains the transcripts (provided you pick the proper one); as far as I know it's based on Ensembl. I build my own so that I have the newest version and the corresponding GTF/GFF3 files from GENCODE. So yes, if you want to use the GENCODE annotation, I would use it 'all the way through'. Having said that, I suppose it wouldn't make much difference just for RSeQC, but my guess is that you want to do something further downstream with the alignment where you need an annotation, so I would stick with the same annotation for all steps.

modified 12 months ago • written 12 months ago by JJ430
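
For anyone who does want to build an annotation-aware index, the usual route with the scripts shipped with HISAT2 looks roughly like the sketch below. It only prints the three shell commands. hisat2_extract_splice_sites.py, hisat2_extract_exons.py and the hisat2-build options --ss and --exon come with HISAT2; the file names gencode.gtf, GRCh38.fa and the prefix grch38_tran are placeholders. Building such an index for a human genome can need a large amount of memory, so treat this as a sketch rather than a recipe.

```python
import shlex


def index_build_commands(gtf, genome_fa, prefix):
    """Shell command strings for building a HISAT2 index with annotation."""
    q = shlex.quote
    ss = prefix + ".ss"      # splice sites extracted from the GTF
    exon = prefix + ".exon"  # exons extracted from the GTF
    return [
        f"hisat2_extract_splice_sites.py {q(gtf)} > {q(ss)}",
        f"hisat2_extract_exons.py {q(gtf)} > {q(exon)}",
        f"hisat2-build --ss {q(ss)} --exon {q(exon)} {q(genome_fa)} {q(prefix)}",
    ]


commands = index_build_commands("gencode.gtf", "GRCh38.fa", "grch38_tran")
for command in commands:
    print(command)
```

The resulting prefix is what would then be passed to hisat2 with -x.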

Yes, thank you. I downloaded the GFF3 file from GENCODE, but in Galaxy (https://toolshed.g2.bx.psu.edu/) I don't see any GFF3-to-BED converter. Could you please help me with this?

written 12 months ago by Vasu300
modified 12 months ago • written 12 months ago by JJ430

OK, I did the conversion and ran infer_experiment.py. This is what I see:

Total 9230 usable reads were sampled

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0047
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9953

Does this mean fr-firststrand, which is RF for --rna-strandness in HISAT2?

written 12 months ago by Vasu300

OK, I got the answer from your old post [RSeQC Output from infer_experiment.py - what does it mean?]. Thanks a lot for the help.

written 12 months ago by Vasu300
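
For reference, the interpretation of the infer_experiment.py report quoted in this thread can be sketched in Python as follows. The pattern "1+-,1-+,2++,2--" corresponds to fr-firststrand (RF in HISAT2) and "1++,1--,2+-,2-+" to the opposite orientation (FR); for single-end data RSeQC reports "++,--" and "+-,-+" instead. The 0.8 cutoff and the function name are assumptions chosen for the example, not something RSeQC or HISAT2 defines.

```python
import re

REPORT = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0047
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9953
'''


def rna_strandness(report, threshold=0.8):
    """Suggest a HISAT2 --rna-strandness value from an RSeQC report.

    Returns None when neither orientation reaches the (assumed) threshold,
    which usually points to an unstranded library.
    """
    pairs = re.findall(r'explained by "([^"]+)":\s*([0-9.]+)', report)
    fractions = {pattern: float(value) for pattern, value in pairs}
    # Read 1 (or the single read) on the same strand as the gene.
    forward = fractions.get("1++,1--,2+-,2-+", fractions.get("++,--", 0.0))
    # Read 1 (or the single read) on the opposite strand of the gene.
    reverse = fractions.get("1+-,1-+,2++,2--", fractions.get("+-,-+", 0.0))
    paired = "PairEnd" in report
    if forward >= threshold:
        return "FR" if paired else "F"
    if reverse >= threshold:
        return "RF" if paired else "R"
    return None


print(rna_strandness(REPORT))  # prints RF
```

With 99.5% of the sampled reads explained by "1+-,1-+,2++,2--", the report above points to RF.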