Question

Hisat2 on multiple paired-end input

8.0 years ago

dovah ▴ 40

Hi there :)

I'm working with D. melanogaster and trying to align Illumina paired-end reads to the reference genome using Hisat2. My ultimate goal is to quantify the detected isoforms. However, I have a problem with the output: the resulting *.sam file contains no alignments (only header lines starting with @HD and @SQ).

The reference genome I use is: Drosophila_melanogaster.BDGP6.31.dna.genome.fa (from Ensembl). The annotation I use is: Drosophila_melanogaster.BDGP6.84.gtf (also from Ensembl). For the hisat2 manual, I'm using: https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2 .

The sequencing center provided me with multiple files (70 x 2) in .fastq format. I've renamed them as: 001_R1.fastq, 001_R2.fastq, 002_R1.fastq, 002_R2.fastq ... etc. 001_R1.fastq and 001_R2.fastq are thus paired. Reads were trimmed with cutadapt.

I first indexed the reference: hisat2-build Drosophila_melanogaster.BDGP6.31.dna.genome.fa. This worked fine; I have 8 *.ht2 files in my directory. Then I extracted the splice sites from the annotation: extract_splice_sites.py Drosophila_melanogaster.BDGP6.84.gtf. This also worked fine; I have a *.splices.txt file in my directory.

Then, and here comes the tricky part, I'd like to run hisat2 on all my *.fastq files, defining each one as part of pair 1 (-1 parameter) or pair 2 (-2 parameter). As hisat2 takes input files as comma-delimited lists (from manual > Command-Line > Usage), I tried to run the job like this:

hisat2 -x bt2_index.idx -1 `ls *_R1* | tr '\n' ','` -2 `ls *_R2* | tr '\n' ','` | samtools view -bS > Dmel_hisat.bam

Anyway, this does not seem to be correctly interpreted by hisat2. I don't get an error message, but my *.sam contains no alignments.

So, how do you proceed when you have multiple paired *.fastq files as input?

Many thanks for your help.

RNA-Seq sequencing genome isoform read • 5.9k views

updated 8.0 years ago by Devon Ryan 104k • written 8.0 years ago by dovah ▴ 40

score 6 · Accepted Answer · 2016-05-08

If you have multiple files from the same sample and plan to process them at once anyway then normally you just concatenate them into a single file. If you have multiple samples then never align them together. Almost no aligner supports that.
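For the single-sample case, the concatenation could look roughly like the sketch below. It assumes the `NNN_R1.fastq` / `NNN_R2.fastq` naming from the question and that all the files really do come from one sample; the scratch directory, the tiny FASTQ records and the `merged_1.fq` / `merged_2.fq` names are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with tiny made-up FASTQ records (hypothetical data),
# so the sketch runs without real reads.
workdir=$(mktemp -d)
cd "$workdir"
for chunk in 001 002; do
    printf '@read_%s/1\nACGT\n+\nIIII\n' "$chunk" > "${chunk}_R1.fastq"
    printf '@read_%s/2\nTGCA\n+\nIIII\n' "$chunk" > "${chunk}_R2.fastq"
done

# Globs expand in the same sorted order for both mates, so read N in
# merged_1.fq still pairs with read N in merged_2.fq.
cat ./*_R1.fastq > merged_1.fq
cat ./*_R2.fastq > merged_2.fq

# Sanity check: both merged files should hold the same number of lines.
n1=$(wc -l < merged_1.fq)
n2=$(wc -l < merged_2.fq)
echo "mate 1 lines: $n1, mate 2 lines: $n2"
```

The two merged files would then be passed to hisat2 as a single `-1` file and a single `-2` file. The output names are deliberately chosen so that they do not match the `*_R1.fastq` / `*_R2.fastq` globs, which keeps a rerun from concatenating the merged file into itself.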

Whenever you have issues with commands like this, use echo in a shell script to print the exact final command that would be run, and check whether it looks reasonable. In this case, each of your lists of files ends with a comma, so perhaps that's causing the problem. You might want to use something like snakemake, where you can more easily create lists and merge things together into single strings (at least if you know python).
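As a rough sketch of that echo check, the script below builds the two comma-separated lists without a trailing comma and prints the resulting command instead of running it. The placeholder files, the `join_by_comma` helper and the `dmel_index` index basename are assumptions for illustration, not names from the question.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with empty placeholder files (hypothetical), so the
# sketch can be run without any real data.
workdir=$(mktemp -d)
cd "$workdir"
touch 001_R1.fastq 001_R2.fastq 002_R1.fastq 002_R2.fastq

# Globs expand in sorted order, so the mate 1 and mate 2 lists stay in step.
r1_files=( *_R1.fastq )
r2_files=( *_R2.fastq )

# Join the arguments with commas. Unlike `ls | tr '\n' ','`, this leaves
# no trailing comma at the end of the list.
join_by_comma() {
    local IFS=,
    echo "$*"
}

r1_list=$(join_by_comma "${r1_files[@]}")
r2_list=$(join_by_comma "${r2_files[@]}")

# Print the command rather than running it; "dmel_index" is a placeholder
# for the real index basename.
cmd="hisat2 -x dmel_index -1 $r1_list -2 $r2_list"
echo "$cmd"
```

Once the printed command looks right (no stray commas, the same number of files on each side, matching order), the echo can be dropped and the command run for real.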