Question: How to assign keys and values in a dictionary using Python
14 months ago, caro-ca wrote:

Hi! I want to map Illumina paired-end reads against a reference genome. I have a directory in which I only need the files that end with paired_R1.fastq.gz and paired_R2.fastq.gz. I am writing a script in which the paired_R1 files are the keys and the paired_R2 files are the values; however, I am having difficulty assigning the keys and values in a for loop. I understand that file1 and file2 are not defined, but I don't know how to assign the files matched by "endswith" to a key and a value, respectively.

if __name__=='__main__':
    path = os.getcwd()
    dir_files = os.listdir(path)
    pair_reads = {}
    for file in dir_files:
        if file.endswith("_paired_R1.fastq.gz"):
            file = file1
            if file.endswith("_paired_R2.fastq.gz"):
               file = file2
               pair_reads[file1] = file2 
    print(pair_reads)

Thank you in advance!


What is the expected output? I am sure this can be done with a one-liner via the command line.

written 14 months ago by ATpoint

I will use TEPID, which maps the paired reads against the reference genome. The TEPID command is tepid-map -1 SRR4209894_paired_R1.fastq.gz -2 SRR4209894_paired_R2.fastq.gz -n SRR4209894 -x /../S288C/S288C -y /../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S -p 36 -s 350. For this reason, I need to pair up the read files in my directory.

written 14 months ago by caro-ca

Use a simple bash script.

# loop over the paired R1 files only; quoting protects unusual file names
for r1file in *_paired_R1.fastq.gz
do
    tepid-map -1 "${r1file}" -2 "${r1file/_R1/_R2}" -n "${r1file%%_*}" -x /../S288C/S288C -y /../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S -p 36 -s 350
done

See here to understand how the ${} parameter expansions work.

written 14 months ago by Ram
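
For readers who would rather stay in Python, the two parameter expansions in the bash loop have rough string-method equivalents. This is only a sketch: the helper names r2_name and sample_name are made up for illustration, and it assumes file names shaped like SRR4209894_paired_R1.fastq.gz, as in the question.

```python
def r2_name(r1file):
    # Rough equivalent of ${r1file/_R1/_R2}: replace the first "_R1" with "_R2".
    return r1file.replace("_R1", "_R2", 1)


def sample_name(r1file):
    # Rough equivalent of ${r1file%%_*}: drop everything from the first "_" onward.
    return r1file.split("_", 1)[0]


r1 = "SRR4209894_paired_R1.fastq.gz"
print(r2_name(r1))      # the matching R2 file name
print(sample_name(r1))  # the sample name passed to -n
```

Note that both versions rely on the sample name itself containing no underscore.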
my_key = "hey there"
my_value = "ho there"
my_dict = {}
my_dict[my_key] = my_value
written 14 months ago by curious500

The problem is that, instead of assigning fixed variables, I want the keys and values to come from the files in a directory.

written 14 months ago by caro-ca
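
One way to get the mapping described in the question (R1 file as key, R2 file as value) is to derive each R2 name from its R1 name, rather than testing the same file name against both suffixes, which can never match. The sketch below is a minimal version under that assumption; the function name pair_reads and the two suffix constants are illustrative, not part of the original script.

```python
import os

R1_SUFFIX = "_paired_R1.fastq.gz"
R2_SUFFIX = "_paired_R2.fastq.gz"


def pair_reads(filenames):
    """Map each paired R1 file name to its R2 mate, skipping R1 files with no mate."""
    present = set(filenames)
    pairs = {}
    for name in sorted(present):
        if name.endswith(R1_SUFFIX):
            # Swap the R1 suffix for the R2 suffix to get the expected mate.
            mate = name[:-len(R1_SUFFIX)] + R2_SUFFIX
            if mate in present:
                pairs[name] = mate
    return pairs


if __name__ == "__main__":
    # Pair up whatever is in the current working directory.
    print(pair_reads(os.listdir(os.getcwd())))
```

Checking that the mate actually exists means an R1 file whose R2 is missing is skipped quietly; whether that or an error is preferable depends on the pipeline.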
14 months ago, Brice Sarver wrote:

There are good suggestions in the comments, but (reading between the lines) I think you're having problems because you're building a dictionary where your key:value pairs are the R1 and R2 reads.

What about storing each pair as a tuple and unpacking it? You know which suffixes need to be appended to form the read pairs (_paired_R1.fastq.gz and _paired_R2.fastq.gz). Grab the stem, then assign the reads based on that.

import os
import re

results = {}
dir_files = os.listdir(".")
# modify here as needed - you want to grab the file's stem;
# lots of ways to do this.
# I've inferred here from your code above, but a simple x.split()
# will work depending on your stem.
file_stems = [
    re.sub(r"_paired_R1\.fastq\.gz$", "", x) for x in dir_files
    if x.endswith("_paired_R1.fastq.gz")
]
# build a tuple with the R1 and R2 names
for stem in file_stems:
    R1 = stem + "_paired_R1.fastq.gz"
    R2 = stem + "_paired_R2.fastq.gz"
    results[stem] = (R1, R2)

The rest is pretty straightforward. You simply iterate across your dictionary, and you'll be able to unpack with R1, R2 = results['key']. This can easily be passed to subprocess.call() or similar.
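
As a rough sketch of that last step, the loop below unpacks each tuple and builds one tepid-map argument list per sample. The flags are copied from the command quoted earlier in the thread; the function names build_tepid_commands and run_all, and the index_x and index_y parameters, are placeholders for illustration. subprocess.run is used here as the "or similar" alternative to subprocess.call().

```python
import subprocess


def build_tepid_commands(results, index_x, index_y, threads=36, insert_size=350):
    """Build one tepid-map argument list per stem from {stem: (R1, R2)}."""
    commands = []
    for stem, (r1, r2) in sorted(results.items()):
        commands.append([
            "tepid-map", "-1", r1, "-2", r2, "-n", stem,
            "-x", index_x, "-y", index_y,
            "-p", str(threads), "-s", str(insert_size),
        ])
    return commands


def run_all(commands):
    # check=True stops at the first sample whose mapping fails.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to print and inspect them before anything is launched.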

EDIT: wrapping list comprehension to avoid cutoff.
