Question: hisat2 and RSEM pipeline
gravatar for Shahzad
5 months ago by
Shahzad10 wrote:

Hello All I am using hisat2 for alignment and RSEM for estimated counts as suggest by a previous post on this site. I need some guidance regarding 1> Does RSEM accepts hisat2 generated transcriptome indexes?

2> rsem-calculate-expression can be used alone without rsem-prepare-reference step and BAM from hisat can be accepted in RSEM count matrix.

I tried for my Q1 here and had error that RSEM was unable to find indexes

reference indexing showed following error. (rsem)> rsem-prepare-reference RNASeq_transcriptome.fa ref rsem-synthesis-reference-transcripts 0 0RNASeq_transcriptome.fa (ASCII code 13), at line 2, position 61!scriptome.fa contains an unknown character, "rsem-synthesis-reference-transcripts ref 0 0 RNASeq_transcriptome.fa" failed! Plase check if you provide correct parameters/options for the pipeline! have used same transcriptome to build hisat2 index successfully

My RNA-seq experiment is 2 Genotypes1control, genotype1treated, Genotypes2control, genotype2treated, >3 replicates-PE each i.e 12 samples. I have reference transcriptome but no GTF/gff files available.

will be very thankful for any suggestion regarding another pipeline or current problem. thank you

ADD COMMENTlink modified 5 months ago by EagleEye6.2k • written 5 months ago by Shahzad10
gravatar for EagleEye
5 months ago by
EagleEye6.2k wrote:

This is the way I could integrate HISAT2 with RSEM,

RSEM pre-preparation:

cat Homo_sapiens.GRCh38.90.chr.gtf | grep -v "^#" | grep -w "transcript" | cut -f9 | sed 's/;/\t/g' | cut -f1,3 | sed 's/gene_id \"//' | sed 's/transcript_id \"//' | sed 's/\"//g' | awk '!x[$0]++' > transcript_gene_map.txt

RSEM indexing:

rsem-prepare-reference --star --star-path STAR_PATH/ --num-threads 8 --gtf Homo_sapiens.GRCh38.90.chr.gtf --transcript-to-gene-map transcript_gene_map.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa rsem_index/hg38

HISAT2 pre-preparation SS: Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_spliceSites.txt

HISAT2 pre-preparation Exons: Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_Exons.txt

HISAT2 indexing:

hisat2-build -p 32 --ss Homo_sapiens.GRCh38.90_spliceSites.txt --exon Homo_sapiens.GRCh38.90_Exons.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa HISAT2_index/hg38

HISAT2 alignment:

hisat2 --dta --time --summary-file $outpath/${outname}_alignment_summary.txt --avoid-pseudogene --known-splicesite-infile Homo_sapiens.GRCh38.90_spliceSites.txt --phred33 -p 8 --qc-filter -x $ht2_index -U $inpath/${outname}.fastq.gz | samtools view -bS -o $outpath/${outname}.bam -

RSEM quantification:

rsem-calculate-expression --time --num-threads 2 --phred33-quals --alignments --no-bam-output $inpath/${outname}_sorted.bam $rsemIndex $outpath/${outname}
ADD COMMENTlink written 5 months ago by EagleEye6.2k

Oops sorry that didn't work. I just checked output from this pipeline, I never succeeded in integrating HISAT2 + RSEM. Looks like RSEM cannot handle gapped alignments.

ADD REPLYlink modified 5 months ago • written 5 months ago by EagleEye6.2k

thank you for confirming. is there any suggestion about which pipeline will be appropriate. i have transcriptome.fa but no gtf file. RNAseq data from plant is paired end 4 conditions with 3 bioreps.

ADD REPLYlink written 5 months ago by Shahzad10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1068 users visited in the last hour