Question

How to map RNA-seq with hisat2, stringtie, an assembly fasta, a gtf file, and a transcript fasta?

5.3 years ago

O.rka ▴ 710

I have the following files: assembly.fa, transcripts.fa, and annotation.gtf.

My organism is eukaryotic with introns, so I want to use the hisat2 -> stringtie pipeline.

The example in the link below appears to map reads with HISAT2 to the chromosomes/assembly rather than to the transcripts. If an intron separates two exons, wouldn't a read spanning that junction map only partially, so that mapping to the transcripts would be the better approach? https://davetang.org/muse/2017/10/25/getting-started-hisat-stringtie-ballgown/

Does anyone have a way to pipe hisat2 directly into stringtie? I know I'm supposed to use the --dta flag in HISAT2, but I haven't figured out whether I should be mapping to the transcripts or to the assembly.

hisat2 mapping transcriptome stringtie • 3.2k views

score 2 · Answer 1 · 2019-01-11

5.3 years ago

swbarnes2 14k

HISAT is a splice-aware aligner. You align to the genome, and it is smart enough to know that many reads will align with large gaps where introns have been spliced out. If you look at the index-building step, the index is made with the guidance of a GTF whose features are annotated in genomic coordinates.
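The index-building step mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the file names come from the question, `organism_A` is just an example index basename, and the `index_commands` helper is hypothetical. It prints the commands rather than running them, so they can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch: build a splice-aware HISAT2 index from a genome FASTA plus a GTF.
# File names are taken from the question; INDEX is an illustrative basename.
set -euo pipefail

GENOME=${GENOME:-assembly.fa}
GTF=${GTF:-annotation.gtf}
INDEX=${INDEX:-organism_A}

# Hypothetical helper: print the three commands in the order they should run.
index_commands() {
    # Splice sites and exons are extracted from the GTF (genomic coordinates).
    echo "hisat2_extract_splice_sites.py ${GTF} > splicesites.tsv"
    echo "hisat2_extract_exons.py ${GTF} > exons.tsv"
    # The index is built from the genome, guided by the two extracted tables.
    echo "hisat2-build --ss splicesites.tsv --exon exons.tsv ${GENOME} ${INDEX}"
}

index_commands
```

Piping the printed lines to `bash` would run them, assuming the HISAT2 scripts are on the PATH. Building with `--ss` and `--exon` can need considerably more memory than a plain index, so a plain `hisat2-build` on the genome is a reasonable fallback.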

Thanks, this is really helpful. So I would do:

Set up the exons/splice sites

hisat2_extract_splice_sites.py annotation.gtf > splicesites.tsv
hisat2_extract_exons.py annotation.gtf > exons.tsv

Build index

hisat2-build --ss ./splicesites.tsv --exon ./exons.tsv assembly.fa organism_A

hisat2 -> samtools -> stringtie

hisat2 --dta -x ./organism_A -1 ./reads/R1.fastq -2 ./reads/R2.fastq | samtools view -Su | samtools sort - | stringtie -G annotation.gtf -A sample_counts.tsv

Does the above command look correct?
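For comparison, here is a minimal sketch of the same alignment and assembly steps with the sorted BAM written to disk, so that stringtie is given an explicit input file. It assumes the index basename `organism_A` from the build step. The sample prefix, the output file names, and the `pipeline_commands` helper are all hypothetical. It prints the commands rather than executing them.

```shell
#!/usr/bin/env bash
# Sketch: align paired reads to the genome index, sort, then run StringTie.
# All output names below are illustrative, not part of any tool's defaults.
set -euo pipefail

INDEX=${INDEX:-organism_A}      # basename given to hisat2-build
R1=${R1:-reads/R1.fastq}
R2=${R2:-reads/R2.fastq}
GTF=${GTF:-annotation.gtf}
SAMPLE=${SAMPLE:-sample}        # hypothetical prefix for output files

# Hypothetical helper: print the two commands in the order they should run.
pipeline_commands() {
    # --dta makes HISAT2 report alignments tailored for transcript assemblers;
    # samtools sort reads SAM from stdin ("-") and writes a sorted BAM.
    echo "hisat2 --dta -x ${INDEX} -1 ${R1} -2 ${R2} | samtools sort -o ${SAMPLE}.sorted.bam -"
    # StringTie takes the sorted BAM; -G is the reference annotation,
    # -o the assembled transcripts, -A the gene abundance table.
    echo "stringtie ${SAMPLE}.sorted.bam -G ${GTF} -o ${SAMPLE}.gtf -A ${SAMPLE}_gene_abund.tsv"
}

pipeline_commands
```

Keeping the intermediate BAM costs some disk space, but it lets the alignment be inspected or reused without rerunning HISAT2.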

ADD REPLY • link 5.3 years ago by O.rka ▴ 710