Question

How can I decompress specific files in a loop, use them in another program, and compress them again?

0

4.1 years ago

caro-ca ▴ 20

Hi! I am trying to run a transposable element (TE) detection software called McClintock on paired-end Illumina reads. For this step, I wrote a Python script that loops over the files in my directory and selects the matching pair of reads. However, I am dealing with a large amount of data, so I compressed all of it. I want a script that, once the right pair is selected, unzips the two files and runs the McClintock bash code on them. Once that finishes, I want to compress the files again and repeat the process for the remaining paired-end reads. This is my Python script so far (not complete), but I have to admit that I don't know how to capture the unzipped file in a variable. I don't want to use gunzip -c (--stdout, --to-stdout) because, as I said, I don't have enough storage to keep the original files unchanged and create new ones.

#!/usr/bin/env python

import os
import subprocess


if __name__=='__main__':
    path = "/hosts/linuxhome/chaperone/silviav/reads/Gallone/Trimmed_files"
    dir_files = os.listdir(path)
    pair_reads = {}

    for file in sorted(dir_files):
        if file.endswith("_paired_R1.fastq.gz"):
            file1 = file
        if file.endswith("_paired_R2.fastq.gz"):
            file2 = file
            pair_reads[file1] = file2 

    for key, value in pair_reads.items():
        cmd_key = "gunzip {}".format(key)
        unzipped_key = subprocess.check_output(cmd_key, shell=True)
        cmd_value = "gunzip {}".format(value)
        unzipped_value = subprocess.check_output(cmd_value, shell=True)
        code = "bash ~/mcclintock/mcclintock.sh -r ~/mcclintock/test/sacCer2.fasta -c ~/mcclintock/test/sac_cer_TE_seqs.fasta -g ~/mcclintock/test/reference_TE_locations.gff -t ~/mcclintock/test/sac_cer_te_families.tsv -1 {} -2 {} -p 36".format(unzipped_key, unzipped_value)
        cmd = subprocess.check_output(code, shell=True)
        print("EXIT STATUS AND TYPE", cmd)

Thank you in advance.

python • 895 views

ADD COMMENT • link 4.1 years ago by caro-ca ▴ 20

1

Are you sure looking for TEs in your reads is the best option? Can the software not take assemblies?

I also second the other comments, that there is not really any reason to use bash AND python here, one or the other should be able to handle all the steps (if you count shelling out in python).
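For what it's worth, here is a minimal sketch of doing every step from Python alone. It is an illustration under assumptions, not a tested McClintock pipeline: the function names (find_pairs, decompress, compress, process_pair, mcclintock_command, run_all) are made up for this example, the file suffixes and the McClintock arguments are copied from the script in the question, and Python's gzip module is used in place of calling gunzip. Only one pair sits uncompressed on disk at a time, and the decompressed path is built from the archive name rather than read from the command's output.

```python
import gzip
import os
import shutil
import subprocess

R1_SUFFIX = "_paired_R1.fastq.gz"
R2_SUFFIX = "_paired_R2.fastq.gz"


def find_pairs(directory):
    """Return (R1, R2) full paths for every sample that has both mates."""
    names = set(os.listdir(directory))
    pairs = []
    for name in sorted(names):
        if name.endswith(R1_SUFFIX):
            mate = name[: -len(R1_SUFFIX)] + R2_SUFFIX
            if mate in names:
                pairs.append((os.path.join(directory, name),
                              os.path.join(directory, mate)))
    return pairs


def decompress(gz_path):
    """Write the decompressed file next to gz_path, then delete the archive."""
    out_path = gz_path[:-3]  # drop the ".gz" suffix
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)
    return out_path


def compress(path):
    """Write path + '.gz', then delete the uncompressed file."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


def process_pair(gz1, gz2, build_command):
    """Decompress one pair, run the tool on it, then compress the pair again."""
    fq1 = decompress(gz1)
    fq2 = decompress(gz2)
    try:
        subprocess.run(build_command(fq1, fq2), check=True)
    finally:
        # Recompress even if the tool fails, so disk usage stays bounded.
        compress(fq1)
        compress(fq2)


def mcclintock_command(fq1, fq2):
    """Argument list mirroring the command in the question (paths assumed)."""
    base = os.path.join(os.path.expanduser("~"), "mcclintock")
    test = os.path.join(base, "test")
    return ["bash", os.path.join(base, "mcclintock.sh"),
            "-r", os.path.join(test, "sacCer2.fasta"),
            "-c", os.path.join(test, "sac_cer_TE_seqs.fasta"),
            "-g", os.path.join(test, "reference_TE_locations.gff"),
            "-t", os.path.join(test, "sac_cer_te_families.tsv"),
            "-1", fq1, "-2", fq2, "-p", "36"]


def run_all(directory, build_command=mcclintock_command):
    """Process every complete pair found in directory, one pair at a time."""
    for gz1, gz2 in find_pairs(directory):
        process_pair(gz1, gz2, build_command)
```

Calling run_all() with the Trimmed_files directory from the question would then work through the pairs one by one. Passing the command as a list avoids shell=True, which is why the ~ in the paths is expanded explicitly.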

If you really wanted to do this, you could create a python script which accepts STDIN as the data stream and then decompress the data somewhat on the fly in bash...
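To make the STDIN idea concrete, here is a small sketch of a Python script that consumes FASTQ records from a stream. The names read_fastq, count_reads and main are invented for this example, and it assumes plain four-line FASTQ records. Note that this only helps for steps that can read a stream; the McClintock command in the question takes file paths through -1 and -2, so it would not apply to that call as written.

```python
import sys


def read_fastq(stream):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = stream.readline().rstrip("\n")
        if not header:  # end of stream
            return
        sequence = stream.readline().rstrip("\n")
        stream.readline()  # skip the '+' separator line
        quality = stream.readline().rstrip("\n")
        yield header, sequence, quality


def count_reads(stream):
    """Count the records in a FASTQ stream without storing them."""
    return sum(1 for _ in read_fastq(stream))


def main():
    """Entry point: count the reads arriving on standard input."""
    print(count_reads(sys.stdin))
```

Saved as a hypothetical count_reads.py with a call to main() as its last line, it could be fed with something like `gunzip -c sample_paired_R1.fastq.gz | python count_reads.py`, so the decompressed data never lands on disk.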

ADD REPLY • link 4.1 years ago by Joe 21k

0

Perfect, I will have a look at STDIN (sys.stdin), as I am not familiar with it. Also, the software detects TEs in paired-end FASTQ sequencing reads, not in assemblies. Thank you for your help.

ADD REPLY • link 4.1 years ago by caro-ca ▴ 20

0

You have asked many questions on this forum without validating any answer (e.g. C: Use of export command; C: Software testing process failure: How can I match the already installed programs; etc.). Please validate the correct answers (green mark on the left) to close the questions.

ADD REPLY • link 4.1 years ago by Pierre Lindenbaum 161k

0

I cannot find the green mark on the left. What does it look like?

ADD REPLY • link 4.1 years ago by caro-ca ▴ 20

0

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

ADD REPLY • link 4.1 years ago by GenoMax 141k

0

Perfect! Thank you for clarifying.

ADD REPLY • link 4.1 years ago by caro-ca ▴ 20

0

Instead of wrapping bash in a Python script, how about using bash only?

ADD REPLY • link 4.1 years ago by Pierre Lindenbaum 161k