Basic Python loop for paired fastq files
12 days ago
rescueson • 0

I am trying to learn how to run a SPAdes assembly from Python with multiple fastq files. If I build a list containing multiple paired-end read files, how would I pick out the "_1.fastq" and "_2.fastq" file for each sample in a loop, so that each pair gets its own assembly? Any helpful references would be great.

Example of list:

Files = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]

python loop • 372 views

how would I call "_1.fastq" and "_2.fastq" in each loop to perform separate assemblies

What are you talking about? Why would you use these files separately?


SPAdes takes paired-end reads either as pairs of files or as interleaved files. I am trying to pass the files in pairs.


If you are unfamiliar, here is an example:

spades.py --pe1-1 lib1_1.fastq --pe1-2 lib1_2.fastq -o output


You're using the files together in that command, not separately in two commands. Loop through all the R1 files and get each R2 name by replacing "R1" with "R2" (here, "_1.fastq" with "_2.fastq") where required.
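
A minimal sketch of that idea, assuming the "_1.fastq" / "_2.fastq" naming from the question. The helper name mate_of is made up for illustration:

```python
files = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]

R1_SUFFIX = "_1.fastq"  # assumed naming convention, taken from the question
R2_SUFFIX = "_2.fastq"


def mate_of(r1_name):
    """Return the R2 filename that belongs to an R1 filename (hypothetical helper)."""
    if not r1_name.endswith(R1_SUFFIX):
        raise ValueError(f"not an R1 file: {r1_name}")
    # Swap only the suffix, so a "_1" elsewhere in the name is left alone.
    return r1_name[: -len(R1_SUFFIX)] + R2_SUFFIX


# Keep only the R1 files, then derive the name of each partner.
pairs = [(name, mate_of(name)) for name in files if name.endswith(R1_SUFFIX)]

for r1, r2 in pairs:
    print(r1, r2)
```

Swapping the suffix rather than calling str.replace on the whole name avoids surprises with sample names that happen to contain "_1" themselves.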


Alternatively, and I think this scales better depending on what else the pipeline entails, loop through a list of the root names, e.g. ["SRA1", "SRA2", ...], and generate the strings that correspond to the R1 and R2 file of each fastq pair.
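
As a rough sketch of the root-name approach: the function name fastq_pair and the ".fastq" default are assumptions for illustration, not anything SPAdes requires.

```python
samples = ["SRA1", "SRA2"]  # root names only, no read number or extension


def fastq_pair(sample, suffix=".fastq"):
    """Build the (R1, R2) filenames for one sample root name (hypothetical helper)."""
    return f"{sample}_1{suffix}", f"{sample}_2{suffix}"


# Map each sample to its pair of read files.
pairs = {sample: fastq_pair(sample) for sample in samples}

for sample, (r1, r2) in pairs.items():
    print(sample, r1, r2)
```

Because the sample name is kept around, it can be reused later for things like output directory names or log files.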


Why did you delete the question?

8 days ago
Joe 20k

I'm hesitant to provide a direct answer, as I feel like what you're attempting is probably going to tie you in knots and is probably not the smartest approach, but taking the question at face value:

Here's a 'dumb' approach:

>>> filenames = ['file1_1.ext', 'file1_2.ext', 'file2_1.ext', 'file2_2.ext']
>>> for i, j in zip(filenames[::2], filenames[1::2]):
...     print(i, j)
...
file1_1.ext file1_2.ext
file2_1.ext file2_2.ext


It should be fairly obvious how to use this going forward.
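
For completeness, one way the pairs from zip might be turned into SPAdes calls. This is only a sketch: the output directory naming is invented, and the DRY_RUN switch is there so nothing is launched unless spades.py is actually installed and on your PATH.

```python
import subprocess

filenames = ["SRA1_1.fastq", "SRA1_2.fastq", "SRA2_1.fastq", "SRA2_2.fastq"]


def build_commands(names):
    """Pair neighbouring files and build one spades.py command per pair."""
    commands = []
    for r1, r2 in zip(names[::2], names[1::2]):
        sample = r1[: -len("_1.fastq")]  # e.g. "SRA1"; assumes the question's naming
        outdir = sample + "_spades"      # hypothetical output directory name
        commands.append(["spades.py", "--pe1-1", r1, "--pe1-2", r2, "-o", outdir])
    return commands


DRY_RUN = True  # set to False to actually launch the assemblies

for cmd in build_commands(filenames):
    if DRY_RUN:
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if SPAdes exits with an error.
        subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with unusual filenames.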

Note I call this a 'dumb' approach because it assumes your filenames are always perfectly ordered and no files are missing etc.

There are other approaches you can take that will pair the files up by name, but without knowing more about where the data is coming from or how you intend to use it, it's probably not helpful to go into them right now.
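
Purely as an illustration of what pairing by name could look like, assuming the "<sample>_1.fastq" / "<sample>_2.fastq" pattern from the question. The function pair_by_name and the regular expression are my own, so adjust them to your real filenames:

```python
import re
from collections import defaultdict

# Assumed naming pattern: <sample>_<1 or 2>.fastq
PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq$")


def pair_by_name(names):
    """Group files by sample; return (complete pairs, samples missing a mate)."""
    groups = defaultdict(dict)
    for name in names:
        match = PATTERN.match(name)
        if match is None:
            continue  # filename does not fit the pattern, so it is skipped
        groups[match["sample"]][match["read"]] = name

    pairs = {
        sample: (reads["1"], reads["2"])
        for sample, reads in groups.items()
        if "1" in reads and "2" in reads
    }
    incomplete = sorted(sample for sample in groups if sample not in pairs)
    return pairs, incomplete


# Deliberately unordered, with SRA3 missing its second read file.
files = ["SRA2_2.fastq", "SRA1_1.fastq", "SRA3_1.fastq", "SRA2_1.fastq", "SRA1_2.fastq"]
pairs, incomplete = pair_by_name(files)

for sample, (r1, r2) in sorted(pairs.items()):
    print(sample, r1, r2)
print("missing a mate:", incomplete)
```

Unlike the zip version, this does not care about the order of the list, and a sample with only one file is reported instead of silently shifting every later pair.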