Question: hisat2 and RSEM pipeline
gravatar for Shahzad
11 months ago by
Shahzad10 wrote:

Hello All I am using hisat2 for alignment and RSEM for estimated counts as suggest by a previous post on this site. I need some guidance regarding 1> Does RSEM accepts hisat2 generated transcriptome indexes?

2> rsem-calculate-expression can be used alone without rsem-prepare-reference step and BAM from hisat can be accepted in RSEM count matrix.

I tried for my Q1 here and had error that RSEM was unable to find indexes

reference indexing showed following error. (rsem)> rsem-prepare-reference RNASeq_transcriptome.fa ref rsem-synthesis-reference-transcripts 0 0RNASeq_transcriptome.fa (ASCII code 13), at line 2, position 61!scriptome.fa contains an unknown character, "rsem-synthesis-reference-transcripts ref 0 0 RNASeq_transcriptome.fa" failed! Plase check if you provide correct parameters/options for the pipeline! have used same transcriptome to build hisat2 index successfully

My RNA-seq experiment is 2 Genotypes1control, genotype1treated, Genotypes2control, genotype2treated, >3 replicates-PE each i.e 12 samples. I have reference transcriptome but no GTF/gff files available.

will be very thankful for any suggestion regarding another pipeline or current problem. thank you

ADD COMMENTlink modified 11 months ago by EagleEye6.5k • written 11 months ago by Shahzad10
gravatar for EagleEye
11 months ago by
EagleEye6.5k wrote:

This is the way I could integrate HISAT2 with RSEM,

RSEM pre-preparation:

cat Homo_sapiens.GRCh38.90.chr.gtf | grep -v "^#" | grep -w "transcript" | cut -f9 | sed 's/;/\t/g' | cut -f1,3 | sed 's/gene_id \"//' | sed 's/transcript_id \"//' | sed 's/\"//g' | awk '!x[$0]++' > transcript_gene_map.txt

RSEM indexing:

rsem-prepare-reference --star --star-path STAR_PATH/ --num-threads 8 --gtf Homo_sapiens.GRCh38.90.chr.gtf --transcript-to-gene-map transcript_gene_map.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa rsem_index/hg38

HISAT2 pre-preparation SS: Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_spliceSites.txt

HISAT2 pre-preparation Exons: Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_Exons.txt

HISAT2 indexing:

hisat2-build -p 32 --ss Homo_sapiens.GRCh38.90_spliceSites.txt --exon Homo_sapiens.GRCh38.90_Exons.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa HISAT2_index/hg38

HISAT2 alignment:

hisat2 --dta --time --summary-file $outpath/${outname}_alignment_summary.txt --avoid-pseudogene --known-splicesite-infile Homo_sapiens.GRCh38.90_spliceSites.txt --phred33 -p 8 --qc-filter -x $ht2_index -U $inpath/${outname}.fastq.gz | samtools view -bS -o $outpath/${outname}.bam -

RSEM quantification:

rsem-calculate-expression --time --num-threads 2 --phred33-quals --alignments --no-bam-output $inpath/${outname}_sorted.bam $rsemIndex $outpath/${outname}
ADD COMMENTlink written 11 months ago by EagleEye6.5k

Oops sorry that didn't work. I just checked output from this pipeline, I never succeeded in integrating HISAT2 + RSEM. Looks like RSEM cannot handle gapped alignments.

ADD REPLYlink modified 11 months ago • written 11 months ago by EagleEye6.5k

thank you for confirming. is there any suggestion about which pipeline will be appropriate. i have transcriptome.fa but no gtf file. RNAseq data from plant is paired end 4 conditions with 3 bioreps.

ADD REPLYlink written 11 months ago by Shahzad10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 787 users visited in the last hour