extract unmapped reads from the paired end samples using hisat2
6 weeks ago
winmoon6 • 0

Hi all, I am trying to get the unmapped reads for paired-end samples using hisat2 in Snakemake. The command is:

hisat2  \
-S {output.sam} \
-x {params.index} \
-1 {input.fq1} \
-2 {input.fq2} \
--un-conc {output.fq}\


I am getting an error of

Waiting at most 60 seconds for missing files. Removing output files of failed job remove_host_reads since they might be corrupted: p21005_ht2.sam


But when I ran the same sample without the --un-conc option, it worked (and gave me both mapped and unmapped reads, which are crazy big files). If I omitted ".sam" from the output, since I am only interested in the unmapped reads, I got a "missing files" error. Any help would be appreciated. Thanks

pairedends hisat2 unmappedreads snakemake • 457 views

Just to add a small point of clarification, the following error message:

Waiting at most 60 seconds for missing files. Removing output files of failed job remove_host_reads since they might be corrupted: p21005_ht2.sam


is actually a Snakemake error message. My guess is that two separate output files are being created (since you have PE reads), and neither is named according to what Snakemake is expecting, so it's throwing that error.
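
To make that guess concrete, here is a minimal Python sketch of how the per-mate file names are derived from the --un-conc path, as I read the HISAT2 manual: a % in the path is replaced by 1 or 2; otherwise .1 or .2 is inserted before the final dot, or appended when there is no dot. The function name `unconc_mate_paths` and the example path are hypothetical, and the behaviour for paths with several % symbols or dots in directory names is an assumption, so compare it against the files hisat2 actually writes.

```python
def unconc_mate_paths(path):
    """Predict the two files hisat2 writes for `--un-conc <path>`.

    Assumption (from the manual's description, not from hisat2's source):
    a single '%' is replaced by the mate number; otherwise '.1'/'.2' is
    placed before the final dot, or at the end if the path has no dot.
    """
    mates = ("1", "2")
    if "%" in path:
        return tuple(path.replace("%", m) for m in mates)
    stem, dot, ext = path.rpartition(".")
    if not dot:
        # No dot anywhere in the path: the mate number is simply appended.
        return tuple(f"{path}.{m}" for m in mates)
    return tuple(f"{stem}.{m}.{ext}" for m in mates)


# Hypothetical output name that a rule might declare for the unmapped reads.
print(unconc_mate_paths("p21005_unmapped.fq"))
```

If the rule declares `p21005_unmapped.fq` as its output, neither predicted file matches that name, which would explain Snakemake waiting for a "missing" file.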


Thank you @Dave Carlson, I will look into it. I am wondering whether it raises the error because the .sam and .fq files have different naming conventions. Surprisingly, it takes longer to generate the result if I eliminate the .sam output. Is there any workaround for it?


Without seeing the full Snakemake rule that is being used, it's hard to say for certain. You might have better luck if the argument supplied to --un-conc comes from Snakemake's params instead of output.
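
One way to do that, sketched in Python since a Snakefile accepts plain Python: declare the two per-mate files as outputs and derive the --un-conc argument from them in params. The helper name `unconc_pattern`, the output names `fq1`/`fq2`, and the example paths are hypothetical, and the sketch relies on hisat2 replacing a % in the --un-conc path with the mate number, so treat it as an untested suggestion rather than a confirmed fix.

```python
def unconc_pattern(mate1_path, mate2_path):
    """Build a '%' pattern for --un-conc from the two declared mate outputs.

    Assumes the two paths are identical except for one character:
    '1' in the mate-1 path and '2' in the mate-2 path.
    """
    if "%" in mate1_path or "%" in mate2_path:
        raise ValueError("paths must not already contain '%'")
    if len(mate1_path) != len(mate2_path):
        raise ValueError("mate paths must have the same length")
    # Positions at which the two paths disagree.
    diffs = [i for i, (a, b) in enumerate(zip(mate1_path, mate2_path)) if a != b]
    if len(diffs) != 1 or (mate1_path[diffs[0]], mate2_path[diffs[0]]) != ("1", "2"):
        raise ValueError("paths must differ only by the mate number 1 vs 2")
    i = diffs[0]
    return mate1_path[:i] + "%" + mate1_path[i + 1:]


# Hypothetical per-mate outputs for one sample.
print(unconc_pattern("output/p21005_ht2.1.fq", "output/p21005_ht2.2.fq"))
```

In the rule this could be wired up as `params: unconc=lambda wildcards, output: unconc_pattern(output.fq1, output.fq2)` with `--un-conc {params.unconc}` in the shell command, so that Snakemake tracks the real file names while hisat2 receives the pattern.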


Thanks @Dave Carlson. Here is the Snakemake rule for your reference:

rule all:
    """
    Main workflow.
    """
    input:
        expand("data/proc/{sample}_ht2.{ext}.fq",ext=ext, sample = SAMPLES)

rule remove_host_reads:
    """
    Hisat2 to align the sequence
    """
    input:
        fq_1='data/{sample}_R1_001.fastq.gz',
        fq_2='data/r{sample}_R2_001.fastq.gz'
    output:
        fq="output/{sample}_ht2.{ext}.fq",
        sam="output/{sample}_ht2.{ext}.sam"
    params:
        index = "refs/hu_depletion"
    shell:
        """
        hisat2  \
        -x {params.index} \
        -1 {input.fq1} \
        -2 {input.fq2} \
        -S {output.sam} \
        --un-conc {output.fq}\
        """


Adding {ext} to the .sam name is not the correct way to do it; I only did so to avoid the error "output files don't have similar parameters". Here ext = ["1","2"].

6 weeks ago
GenoMax 101k

The following works for me and produces unmapp.1 and unmapp.2 files.

hisat2 -x genome -1 test1000.R1.fq.gz -2 test1000.R2.fq.gz -S test.sam --un-conc unmapp


hisat2 is one of those programs that is probably sensitive to the order of options.


Thanks @GenoMax, I tried this command:

hisat2 \
-x {params.index} \
-1 {input.fq_1} \
-2 {input.fq_2} \
-S {output.sam} \
--un-conc {output.fq} \
--threads {threads}

and I am still getting the same errors. Am I still missing something?


As Dave Carlson said above, if you are running this via Snakemake then please clarify that. The issue is likely on that side.