Basic Python loop for paired FASTQ files
23 months ago
rescueson • 0

I am trying to learn how to run SPAdes assemblies from Python on multiple FASTQ files. If I build a list containing several paired-end read files, how would I pick out the matching "_1.fastq" and "_2.fastq" files in each iteration of a loop so that I can run a separate assembly for each sample? Any helpful references would be great.

Example of list:

Files = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]

how would I call "_1.fastq" and "_2.fastq" in each loop to perform separate assemblies

What are you talking about? Why would you use these files separately?


SPAdes takes paired-end reads either as pairs of files (forward and reverse) or as a single interleaved file. I am trying to pass the files in pairs.


If you are unfamiliar, here is an example:

spades.py --pe1-1 lib1_1.fastq --pe1-2 lib1_2.fastq -k 21,33,55 -o output


You're using the files together in that command, not separately in two commands. Loop through all the R1 files (here, the "_1.fastq" files) and replace "_1" with "_2" where required to get the matching R2 file.
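A minimal Python sketch of that suggestion, assuming the question's "_1.fastq"/"_2.fastq" naming in place of R1/R2. The pairs_by_suffix helper is hypothetical; it swaps only the trailing suffix and raises an error if an expected mate is missing from the list.

```python
# Example list from the question
files = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]


def pairs_by_suffix(filenames, r1_suffix="_1.fastq", r2_suffix="_2.fastq"):
    """Yield (R1, R2) tuples by swapping the R1 suffix for the R2 suffix.

    Hypothetical helper: raises ValueError if an R1 file has no mate
    in the list.
    """
    available = set(filenames)
    for r1 in filenames:
        if not r1.endswith(r1_suffix):
            continue  # skip R2 files and anything unrelated
        # Replace only the trailing suffix, not every "_1" in the name
        r2 = r1[: -len(r1_suffix)] + r2_suffix
        if r2 not in available:
            raise ValueError(f"No mate found for {r1} (expected {r2})")
        yield r1, r2


for r1, r2 in pairs_by_suffix(files):
    print(r1, r2)
```

Replacing only the suffix matters: a plain str.replace("_1", "_2") would turn sample_10_1.fastq into sample_20_2.fastq.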


Alternatively, and I think this scales better depending on what else the pipeline involves, you can loop through a list of the root names, e.g. ["SRA1", "SRA2", ...], and generate the filename strings for the R1 and R2 files of each fastq pair.
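A rough sketch of the root-name approach, assuming the "<root>_1.fastq"/"<root>_2.fastq" naming from the question. The fastq_pair helper and the "_spades" output directory names are hypothetical; the root names are derived here by stripping everything from the last underscore onwards.

```python
files = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]

# Derive the root (sample) names, e.g. "SRA1_1.fastq" -> "SRA1"
samples = sorted({name.rsplit("_", 1)[0] for name in files})


def fastq_pair(sample, r1_suffix="_1.fastq", r2_suffix="_2.fastq"):
    """Return the (R1, R2) filenames for one sample root name."""
    return f"{sample}{r1_suffix}", f"{sample}{r2_suffix}"


for sample in samples:
    r1, r2 = fastq_pair(sample)
    # The root name can also label per-sample outputs (hypothetical naming)
    outdir = f"{sample}_spades"
    print(sample, r1, r2, outdir)
```

Keying everything on the root name means later pipeline steps (output directories, logs, QC reports) can reuse the same identifier.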


Why did you delete the question?

23 months ago
Joe 21k

I'm hesitant to provide a direct answer, as I feel that what you're attempting is likely to tie you in knots and probably isn't the smartest approach, but taking the question at face value:

Here's a 'dumb' approach:

>>> filenames = ['file1_1.ext', 'file1_2.ext', 'file2_1.ext', 'file2_2.ext']
>>> for i, j in zip(filenames[::2], filenames[1::2]):
...     print(i, j)
...
file1_1.ext file1_2.ext
file2_1.ext file2_2.ext

From here, each (i, j) pair can be passed to SPAdes as the forward and reverse reads of one library.
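A minimal sketch of feeding those pairs to SPAdes with subprocess, assuming the --pe1-1/--pe1-2/-o options from the command quoted earlier in the thread (check them against spades.py --help for your installed version). The spades_command and run_assemblies helpers and the "_assembly" output naming are hypothetical; by default the sketch only prints the commands.

```python
import subprocess

filenames = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]


def spades_command(r1, r2, outdir):
    """Build a SPAdes argument list for one paired-end library.

    Options follow the thread's example; verify them for your SPAdes version.
    """
    return ["spades.py", "--pe1-1", r1, "--pe1-2", r2, "-o", outdir]


def run_assemblies(filenames, dry_run=True):
    commands = []
    # Same 'dumb' pairing as above: assumes R1/R2 files alternate in order
    for r1, r2 in zip(filenames[::2], filenames[1::2]):
        # Hypothetical naming: output dir from the R1 name minus "_1.fastq"
        outdir = r1.rsplit("_", 1)[0] + "_assembly"
        cmd = spades_command(r1, r2, outdir)
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if SPAdes exits non-zero
            subprocess.run(cmd, check=True)
    return commands


run_assemblies(filenames)  # prints the commands; pass dry_run=False to run them
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with unusual filenames.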

Note that I call this a 'dumb' approach because it assumes your filenames are always perfectly ordered, that no files are missing, and so on.

There are other approaches that pair the files up by name, but without knowing more about where the data is coming from or how you intend to use it, that's probably not helpful right now.
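For reference, one way to pair files by name is sketched below, assuming a "<sample>_<1|2>.fastq" naming scheme (optionally gzipped). The PAIR_RE pattern and pair_by_name helper are hypothetical; unlike the zip approach, input order does not matter, and unmatched or orphaned files are reported instead of being silently mispaired.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_<1|2>.fastq or .fastq.gz
# (adjust for other schemes such as _R1_001.fastq.gz)
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq(?:\.gz)?$")


def pair_by_name(filenames):
    """Group files by sample name; return (pairs, problems).

    pairs    -- dict mapping sample -> (R1, R2)
    problems -- filenames that did not match the pattern or have no mate
    """
    groups = defaultdict(dict)
    problems = []
    for name in filenames:
        m = PAIR_RE.match(name)
        if m is None:
            problems.append(name)
            continue
        groups[m.group("sample")][m.group("read")] = name
    pairs = {}
    for sample in sorted(groups):
        mates = groups[sample]
        if "1" in mates and "2" in mates:
            pairs[sample] = (mates["1"], mates["2"])
        else:
            problems.extend(mates.values())
    return pairs, problems


# Deliberately shuffled, with one orphan file
files = ["SRA2_2.fastq", "SRA1_1.fastq", "SRA3_1.fastq", "SRA2_1.fastq", "SRA1_2.fastq"]
pairs, problems = pair_by_name(files)
print(pairs)
print(problems)
```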
