hisat2 compatibility for long read
Entering edit mode
2.7 years ago
kishorssf91 ▴ 10


I am trying to align PacBio transcriptome reads against the genome to count the gene number. For pair end read i used the following workflow:

# convert gff to gtf
/home/software/cufflinks-2.2.1/gffread xxx.gff -T -o xxx.gtf

# build index
/home/software/hisat2-2.2.1/hisat2_extract_exons.py xxx.gtf > xxx.exon

/home/software/hisat2-2.2.1/hisat2_extract_splice_sites.py xxx.gtf > xxx.ss

/home/software/hisat2-2.2.1/hisat2-build -p 20 xxx.fa --ss xxx.ss --exon xxx.exon xxx_tran

# align reads

/home/software/hisat2-2.2.1/hisat2 -p 80 -x ./xxx_tran -1 R1.fq -2 R2.fq -S gonad.sam

# convert to bam and sort by read name
samtools view -bS gonad.sam -o gonad.bam 

samtools sort -@ 80 gonad.bam -o gonad.sorted.bam

# count gene
htseq-count -f bam --strand=no gonad.sorted.bam xxx.all.gtf > gonad-counts.txt

But in case of Long read i am not sure what i need to during the alignment stage. I have tried with minimap like as following:

# align reads
/home/software/minimap2-2.17_x64-linux/minimap2 -ax splice:hq -uf yyy.fa gill.fq > gill.sam

in following, I got stuck in counting stage:

# count gene
htseq-count -f bam --strand=no gill.sorted.bam yyy.gtf > gill-counts.txt

My question is that during counting it requires sorted bam and genome gtf file that was generated succesfully during previous staps, why it is getting failed even though the same pipeline is absolutely fine for pair end reads? Hisat2 can align long read? Is there any other way do that?

As a novice i will be always grateful to you. Thanks.

Long-read hisat2 • 836 views

Login before adding your answer.

Traffic: 2039 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6