Question

hisat2 and RSEM pipeline

0

Entering edit mode

5.5 years ago

Shahzad ▴ 30

Hello All I am using hisat2 for alignment and RSEM for estimated counts as suggest by a previous post on this site. I need some guidance regarding 1> Does RSEM accepts hisat2 generated transcriptome indexes?

2> rsem-calculate-expression can be used alone without rsem-prepare-reference step and BAM from hisat can be accepted in RSEM count matrix.

I tried for my Q1 here and had error that RSEM was unable to find indexes

reference indexing showed following error. (rsem)> rsem-prepare-reference RNASeq_transcriptome.fa ref rsem-synthesis-reference-transcripts 0 0RNASeq_transcriptome.fa (ASCII code 13), at line 2, position 61!scriptome.fa contains an unknown character, "rsem-synthesis-reference-transcripts ref 0 0 RNASeq_transcriptome.fa" failed! Plase check if you provide correct parameters/options for the pipeline! have used same transcriptome to build hisat2 index successfully

My RNA-seq experiment is 2 Genotypes1control, genotype1treated, Genotypes2control, genotype2treated, >3 replicates-PE each i.e 12 samples. I have reference transcriptome but no GTF/gff files available.

will be very thankful for any suggestion regarding another pipeline or current problem. thank you

rna-seq software error hisat2 gtf RSEM • 6.0k views

ADD COMMENT • link updated 5.5 years ago by EagleEye 7.5k • written 5.5 years ago by Shahzad ▴ 30

score 1 · Answer 1 · 2018-11-06

This is the way I could integrate HISAT2 with RSEM,

RSEM pre-preparation:

cat Homo_sapiens.GRCh38.90.chr.gtf | grep -v "^#" | grep -w "transcript" | cut -f9 | sed 's/;/\t/g' | cut -f1,3 | sed 's/gene_id \"//' | sed 's/transcript_id \"//' | sed 's/\"//g' | awk '!x[$0]++' > transcript_gene_map.txt

RSEM indexing:

rsem-prepare-reference --star --star-path STAR_PATH/ --num-threads 8 --gtf Homo_sapiens.GRCh38.90.chr.gtf --transcript-to-gene-map transcript_gene_map.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa rsem_index/hg38

HISAT2 pre-preparation SS:

hisat2_extract_splice_sites.py Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_spliceSites.txt

HISAT2 pre-preparation Exons:

hisat2_extract_exons.py Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_Exons.txt

HISAT2 indexing:

hisat2-build -p 32 --ss Homo_sapiens.GRCh38.90_spliceSites.txt --exon Homo_sapiens.GRCh38.90_Exons.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa HISAT2_index/hg38

HISAT2 alignment:

hisat2 --dta --time --summary-file $outpath/${outname}_alignment_summary.txt --avoid-pseudogene --known-splicesite-infile Homo_sapiens.GRCh38.90_spliceSites.txt --phred33 -p 8 --qc-filter -x $ht2_index -U $inpath/${outname}.fastq.gz | samtools view -bS -o $outpath/${outname}.bam -

RSEM quantification:

rsem-calculate-expression --time --num-threads 2 --phred33-quals --alignments --no-bam-output $inpath/${outname}_sorted.bam $rsemIndex $outpath/${outname}