Question

Hisat2 doesn't map to all chromosomes

0

18 months ago

GCar • 0

I'm trying to analyse some RNA-seq data which was handed to me with very little info, only fastq files.

I first trimmed Illumina adapters and then ran Hisat2 against the most recent GRCh38 primary assembly with:

hisat2-build Homo_sapiens.GRCh38.dna.primary_assembly.fa hg38
hisat2 -x hg38/hg38 -1 trimmed_${name1}_R1.fastq.gz -2 trimmed_${name1}_R2.fastq.gz -S ${name1}.sam

The alignment rate is not particularly good (only 50% or so). When I look at the BAM files in IGV, there are only alignments on chr1, 10 and 11.

I've checked with idxstats:

samtools idxstats ${name1}.nodup.srt.bam | head -n 10
1    248956422    10142469    148
10   133797422    4686247     103
11   26070428     2051017     13
*    0            0           15262866

and it confirms that only chr1, 10 and 11 are there. Why is Hisat2 only aligning to chromosomes with 1 in the name? Did I do something incorrect with the hisat2-build?

Thanks

chromosomes featureCounts missing alignment Hisat2 • 910 views


score 2 · Answer 1 · 2022-10-21

2

18 months ago

Marco Pannone ▴ 790

Where did you obtain "Homo_sapiens.GRCh38.dna.primary_assembly.fa"?

If you want to build your own indexed genome, I would suggest you retrieve .fasta sequences for all chromosomes of interest from here, then merge them with something like cat *.fa > hg38.fa and run hisat2-build using hg38.fa as input.
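The merge step suggested above can be sketched as a small shell function. This is a minimal sketch, assuming uncompressed per-chromosome FASTA files that each contain exactly one sequence; the function name `merge_fastas` and the file names are hypothetical. It concatenates the inputs and checks that the number of `>` header lines in the result matches the number of input files, so a missing or partial chromosome is noticed before indexing.

```shell
# Concatenate per-chromosome FASTA files and sanity-check the result.
# Assumption: every input file holds exactly one sequence (one '>' header).
merge_fastas() {
    local out="$1"
    shift
    local n_in="$#"
    local n_out

    cat "$@" > "$out"

    # Count header lines in the merged file.
    n_out=$(grep -c '^>' "$out")

    if [ "$n_out" -ne "$n_in" ]; then
        echo "warning: $n_in input files but $n_out sequences in $out" >&2
        return 1
    fi
    echo "$out: $n_out sequences"
}
```

It could be called as `merge_fastas hg38.fa chr1.fa chr2.fa ...` (hypothetical file names), followed by `hisat2-build hg38.fa hg38` as in the question.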


2

What is the output of grep '^>' Homo_sapiens.GRCh38.dna.primary_assembly.fa and samtools view -H your.bam? The odd name "nodup.srt" suggests further manipulation after hisat2, so please show the respective code. This is 99.9% either an incomplete genome file or a code error somewhere after hisat2.
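The FASTA check above can be extended to report which chromosomes are absent. This is a minimal sketch, assuming Ensembl-style sequence names (1 to 22, X, Y, MT) as the expected set; the function names `fasta_names` and `missing_chromosomes` are hypothetical helpers, not part of any tool.

```shell
# Print the sequence names in a FASTA file:
# the text after '>' up to the first space on each header line.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | cut -d' ' -f1
}

# Print every expected primary chromosome that is absent from the FASTA.
# Assumption: Ensembl-style names (1-22, X, Y, MT) without a "chr" prefix.
missing_chromosomes() {
    local fasta="$1" chrom
    for chrom in $(seq 1 22) X Y MT; do
        # -x matches the whole line, so "1" does not match "10" or "11".
        fasta_names "$fasta" | grep -qx "$chrom" || echo "$chrom"
    done
}
```

Running `missing_chromosomes Homo_sapiens.GRCh38.dna.primary_assembly.fa` prints one missing name per line, and prints nothing if all expected chromosomes are present. The same names should appear in the @SQ lines of `samtools view -H your.bam`.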

Reply by ATpoint 82k, 18 months ago

0

You're both on the money. grep '^>' Homo_sapiens.GRCh38.dna.primary_assembly.fa lists only 1, 10 and 11, and the file size confirms it: the .fa is only 400MB. This is solved, thanks both.
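A small file with only three sequences is consistent with an interrupted download or decompression, although the thread does not say how the file ended up incomplete. The idxstats output in the question also lists chromosome 11 with a length of 26070428, which looks too short for a full chromosome 11. As a minimal sketch under that assumption, the compressed download can be tested for integrity before building the index; `check_gz` is a hypothetical helper name.

```shell
# Test a gzip file for integrity without writing the decompressed data.
# 'gzip -t' exits non-zero when the archive is corrupt or ends prematurely.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt or truncated"
        return 1
    fi
}
```

For example, `check_gz Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz` reports whether the downloaded archive is complete. If it fails, downloading the file again before running hisat2-build is the simplest fix.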

Reply by GCar • 0, 18 months ago

0

I got it from here: https://ftp.ensembl.org/pub/release-108/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

Reply by GCar • 0, 18 months ago