Question: Hisat2 on multiple paired-end input
2.1 years ago by
dovah30
dovah30 wrote:

Hi there :)

I'm working with D. melanogaster and trying to align Illumina paired-end reads to the reference genome using Hisat2. My ultimate goal is to quantify the detected isoforms. However, I have a problem with the output: the resulting *.sam file contains no alignments (only header lines starting with @HD and @SQ).

The reference genome I use is Drosophila_melanogaster.BDGP6.31.dna.genome.fa (from Ensembl). The annotation I use is Drosophila_melanogaster.BDGP6.84.gtf (also from Ensembl). For the hisat2 manual, I'm using https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2 .

The sequencing center provided me with multiple files (70 x 2) in .fastq format. I've renamed them as 001_R1.fastq, 001_R2.fastq, 002_R1.fastq, 002_R2.fastq, etc., so 001_R1.fastq and 001_R2.fastq are a pair. Reads were trimmed with cutadapt.

I first indexed the reference: hisat2-build Drosophila_melanogaster.BDGP6.31.dna.genome.fa. This worked fine; I have eight *.ht2 files in my directory. Then I extracted the splice sites from the annotation: extract_splice_sites.py Drosophila_melanogaster.BDGP6.84.gtf. This also worked fine; I have a *.splices.txt file in my directory.

Then, and here comes the tricky part, I'd like to run hisat2 on all my *.fastq files, defining each as mate 1 (-1 parameter) or mate 2 (-2 parameter). As hisat2 takes multiple input files as a comma-delimited list (from manual > Command-Line > Usage), I tried to run the job like this:

hisat2 -x bt2_index.idx -1 `ls *_R1* | tr '\n' ','` -2 `ls *_R2* | tr '\n' ','` | samtools view -bS > Dmel_hisat.bam

Anyway, this does not seem to be correctly interpreted by hisat2. I get no error message, but my *.sam contains no alignments.

So, how do you proceed when you have multiple paired *.fastq files as input?

Many thanks for your help.

Modified 2.1 years ago by Devon Ryan80k • written 2.1 years ago by dovah30
2.1 years ago by
Devon Ryan80k
Freiburg, Germany
Devon Ryan80k wrote:

If you have multiple files from the same sample and plan to process them at once anyway then normally you just concatenate them into a single file. If you have multiple samples then never align them together. Almost no aligner supports that.
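As a rough illustration of the concatenation approach for a single sample, here is a minimal sketch. The file names, the merged/ output directory, and the tiny dummy reads are all assumptions made so the snippet runs on its own; with real data you would skip the dummy-file loop and point the globs at your own files.

```shell
#!/bin/sh
# Sketch only: merge the per-file FASTQs of ONE sample before alignment.
# All names below are hypothetical; dummy reads stand in for real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 001 002 003; do
    printf '@read_%s/1\nACGT\n+\nIIII\n' "$n" > "${n}_R1.fastq"
    printf '@read_%s/2\nTGCA\n+\nIIII\n' "$n" > "${n}_R2.fastq"
done

# Write merged files to a separate directory so the output never
# matches the input glob if the script is run a second time.
mkdir -p merged

# Globs expand in sorted order, so R1 and R2 are concatenated in the
# same file order and the mates stay in sync.
cat ./*_R1.fastq > merged/sample_1.fastq
cat ./*_R2.fastq > merged/sample_2.fastq

# Sanity check: a FASTQ record is 4 lines, and both merged files
# should hold the same number of reads.
r1_reads=$(( $(wc -l < merged/sample_1.fastq) / 4 ))
r2_reads=$(( $(wc -l < merged/sample_2.fastq) / 4 ))
echo "R1 reads: $r1_reads, R2 reads: $r2_reads"
```

The two merged files can then be passed as a single -1/-2 pair. If the read counts differ, something went wrong upstream (for example, a missing file or trimming that dropped mates independently).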

Whenever you have issues with commands like this, use echo in a shell script to print the exact final command that would be run, and check whether it looks reasonable. In this case, each of your file lists ends with a comma, so perhaps that's causing the problem. You might want to use something like snakemake, where you can more easily create lists and merge things together into single strings (at least if you know Python).
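To make the echo suggestion concrete, here is a small sketch that builds the two file lists and prints the command instead of running it. The index basename genome_index, the output name sample.sam, and the empty placeholder files are assumptions for illustration only. It also shows one way to avoid the trailing comma: paste -s joins lines with a delimiter without appending one at the end, whereas tr converts every newline, including the last.

```shell
#!/bin/sh
# Sketch only: inspect the final command before running it.
# Placeholder files and names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 001_R1.fastq 001_R2.fastq 002_R1.fastq 002_R2.fastq

# tr turns every newline into a comma, so the list ends with ','
with_tr=$(ls *_R1.fastq | tr '\n' ',')

# paste -s joins the lines with ',' and adds no trailing delimiter
r1_list=$(ls *_R1.fastq | paste -sd, -)
r2_list=$(ls *_R2.fastq | paste -sd, -)

echo "tr version:    $with_tr"
echo "paste version: $r1_list"

# Build the command as a string and print it rather than executing it;
# genome_index is an assumed index basename, not one from the question.
cmd="hisat2 -x genome_index -1 $r1_list -2 $r2_list -S sample.sam"
echo "$cmd"
```

Once the printed command looks right, it can be run directly. Whether the trailing comma is the actual cause of the empty output is not confirmed here; printing the command is simply the quickest way to rule it in or out.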
