extract unmapped reads from the paired end samples using hisat2
3.1 years ago
winmoon6 • 0

Hi all, I am trying to get the unmapped reads for paired-end samples using hisat2 in Snakemake, and the command is:

hisat2  \
-S {output.sam} \
-x {params.index} \
-1 {input.fq1} \
-2 {input.fq2} \
--un-conc {output.fq} \
--threads {threads}

I am getting the following error:

Waiting at most 60 seconds for missing files. Removing output files of failed job remove_host_reads since they might be corrupted: p21005_ht2.sam

But when I ran the same sample without the --un-conc option, it worked (and gave me both mapped and unmapped reads, which are crazy big files). If I omit the ".sam" output, since I am only interested in the unmapped reads, I get a "missing files" error. Any help would be appreciated. Thanks.

pairedends hisat2 unmappedreads snakemake • 3.7k views

Just to add a small point of clarification, the following error message:

Waiting at most 60 seconds for missing files. Removing output files of failed job remove_host_reads since they might be corrupted: p21005_ht2.sam

is actually a Snakemake error message. My guess is that two separate output files are being created (since you have PE reads), and neither is named the way Snakemake expects, so it's throwing that error.
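
To make that guess concrete, here is a minimal Python sketch of how --un-conc appears to derive its two file names. It follows the description in the HISAT2 manual (a % in the path is replaced by the mate number, otherwise .1 / .2 go before the final dot), not the hisat2 source, so treat it as an approximation. The function name un_conc_mate_paths and the path p21005_ht2.fq are made up for illustration.

```python
from typing import Tuple


def un_conc_mate_paths(path: str) -> Tuple[str, str]:
    """Approximate the per-mate file names derived from `--un-conc <path>`.

    Assumed rules, taken from the manual's wording (not verified against
    the hisat2 source code):
      * a '%' in the path is replaced by the mate number;
      * otherwise '.1' / '.2' are inserted before the final dot;
      * if the path has no dot, '.1' / '.2' are appended.
    """
    def for_mate(mate: int) -> str:
        if "%" in path:
            return path.replace("%", str(mate))
        dot = path.rfind(".")
        if dot == -1:
            return f"{path}.{mate}"
        return f"{path[:dot]}.{mate}{path[dot:]}"

    return for_mate(1), for_mate(2)


# A single declared output (hypothetical path) never matches either file.
print(un_conc_mate_paths("p21005_ht2.fq"))
# ('p21005_ht2.1.fq', 'p21005_ht2.2.fq')

# A path without a dot gets the mate number appended.
print(un_conc_mate_paths("unmapp"))
# ('unmapp.1', 'unmapp.2')
```

If this reading is right, a rule that declares p21005_ht2.fq as its output will always end with Snakemake waiting for a file that hisat2 never writes.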


Thank you @Dave Carlson, I will look into it. I am wondering whether it raises the error because the .sam and .fq files have different naming conventions. Surprisingly, it takes longer to generate the result if I eliminate the .sam output. Is there any workaround for this?


Without seeing the full Snakemake rule that is being used, it's hard to say for certain. You might have better luck if the argument supplied to --un-conc comes from Snakemake's params instead of output.


Thanks @Dave Carlson, here is the Snakemake rule for your reference:

rule all:
    """
     Main workflow.
    """
    input:
        expand("data/proc/{sample}_ht2.{ext}.fq",ext=ext, sample = SAMPLES)


rule remove_host_reads:
    """
    Hisat2 to align the sequence
    """
    input:
        fq_1='data/{sample}_R1_001.fastq.gz',
        fq_2='data/r{sample}_R2_001.fastq.gz'
    output:
        fq="output/{sample}_ht2.{ext}.fq",
        sam="output/{sample}_ht2.{ext}.sam"
    params:
        index = "refs/hu_depletion"
    threads: 20
    shell:
        """
        hisat2  \
        -x {params.index} \
        -1 {input.fq_1} \
        -2 {input.fq_2} \
        -S {output.sam} \
        --un-conc {output.fq} \
        --threads {threads}
        """

Adding {ext} to the .sam output is not the correct approach; I only did it to avoid the error "output files don't have similar parameters". Here ext = ["1","2"].
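
For comparison, here is a hedged sketch of how the rule above could be restructured so that the declared outputs are the two files hisat2 actually writes. It is untested against real data and rests on several assumptions: that --un-conc accepts a % placeholder for the mate number (as the HISAT2 manual describes), that the alignments themselves are not needed (so the SAM goes to /dev/null), that the R2 path mirrors the R1 path, and that SAMPLES holds a hypothetical sample list. The prefix is passed through params, so neither output needs an {ext} wildcard and both outputs share the same wildcards.

```python
SAMPLES = ["p21005"]  # hypothetical sample list, replace with your own


rule all:
    input:
        # Request exactly the paths that remove_host_reads declares.
        expand("output/{sample}_ht2.{mate}.fq", sample=SAMPLES, mate=["1", "2"])


rule remove_host_reads:
    """
    Align against the host index and keep only the read pairs
    that fail to align concordantly.
    """
    input:
        fq_1="data/{sample}_R1_001.fastq.gz",
        fq_2="data/{sample}_R2_001.fastq.gz"  # assumed to mirror the R1 path
    output:
        # The two files hisat2 is expected to create from the % pattern.
        fq_1="output/{sample}_ht2.1.fq",
        fq_2="output/{sample}_ht2.2.fq"
    params:
        index="refs/hu_depletion",
        # Assumption: hisat2 replaces % with 1 and 2.
        unconc="output/{sample}_ht2.%.fq"
    threads: 20
    shell:
        """
        hisat2 \
            -x {params.index} \
            -1 {input.fq_1} \
            -2 {input.fq_2} \
            --un-conc {params.unconc} \
            --threads {threads} \
            -S /dev/null
        """
```

If the SAM file is needed after all, it can be declared as a third output named with only the {sample} wildcard, which keeps the wildcards consistent across all outputs.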

3.1 years ago
GenoMax 141k

The following works for me and produces unmapp.1 and unmapp.2 files.

hisat2 -x genome -1 test1000.R1.fq.gz -2 test1000.R2.fq.gz -S test.sam --un-conc unmapp

hisat2 is one of those programs that is probably sensitive to the order of options.


Thanks @GenoMax, I tried this command:

hisat2 \
-x {params.index} \
-1 {input.fq_1} \
-2 {input.fq_2} \
-S {output.sam} \
--un-conc {output.fq} \
--threads {threads}

and I am still getting the same errors. Am I still missing something?


As Dave Carlson said above, if you are running this via Snakemake, please clarify that. The issue is likely on that side.
