So, I am new to Python (I just did a one-week crash course), and a colleague asked for my help creating a small Python 3 program that takes a FASTQ file, a query sequence, a barcode file and an output file, and finds all the individual barcodes (from the barcode file) that occur together with the query sequence in the FASTQ file.
I wrote this little contraption of mine that almost does the job:
    import sys

    infile = open(sys.argv[1], "r")
    vfile = sys.argv[2]
    barcode = open(sys.argv[3], "r")
    outfile = open(sys.argv[4], "w")
    usedbarcodes = []
    count = 0
    seq = ""
    for line in infile:
        if count == 0:
            Id = line.rstrip()
        elif count == 1:
            if line.find(vfile) > 0:
                for bar in barcode:
                    print(bar)
                    if line.find(bar) > 0:
                        if bar in usedbarcodes:
                            break
                        else:
                            print("Yeah")
                            seq = line.rstrip()
                            bar = bar.rstrip()
                            usedbarcodes.append(bar)
                            print("Used barcodes until now", usedbarcodes)
                            break
        elif count == 2:
            sign = line.rstrip()
        elif count == 3:
            q = line.rstrip()
            if len(seq) > 10:
                sequence = [Id, seq, sign, q]
                print("\n".join(sequence), file=outfile)
            count = 0
            seq = ""
        if count < 3:
            count += 1
    infile.close()
    barcode.close()
    outfile.close()
The problem seems to be that the order of the barcodes matters when trying to find them, which suggests that the inner loop does not search through all the barcodes every time, but only through the ones that have not been searched for before. I expected the loop to restart from the first barcode every time. Can anyone tell me why it does that and how to avoid it?
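The behaviour described above can be reproduced in isolation. The sketch below assumes the cause is that an open file object is a one-shot iterator: a `for` loop over it continues from the current read position instead of starting again at the first line. Here `io.StringIO` stands in for the opened barcode file, and the barcode values are made up for illustration.

```python
import io

# io.StringIO stands in for the opened barcode file (hypothetical contents).
barcode = io.StringIO("AAAC\nGGGT\nTTTA\n")

first_pass = []
for bar in barcode:
    first_pass.append(bar.rstrip())
    if bar.rstrip() == "GGGT":
        break  # stops early; the read position stays just after "GGGT"

# A second loop over the same object resumes where the first one stopped.
second_pass = [bar.rstrip() for bar in barcode]

# Reading everything into a list once gives a sequence that can be
# looped over from the start as many times as needed.
barcode.seek(0)
barcodes = [bar.rstrip() for bar in barcode if bar.strip()]
third_pass = [bar for bar in barcodes]
fourth_pass = [bar for bar in barcodes]

print(first_pass, second_pass, third_pass == fourth_pass)
```

If that is what happens here, reading the barcode file into a list before the main loop (or calling `barcode.seek(0)` before each inner loop) should make the search start from the first barcode for every read.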
Sorry again for the probably extremely inefficient and complicated coding. I will gladly accept any constructive criticism on that too =D. Thanks a lot in advance!
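On the style side, one possible tidier shape is sketched below. It is not a definitive version: it assumes the goal is to keep the first read that contains the query for each barcode, and the names `read_fastq` and `first_hits` are hypothetical helpers introduced only for this illustration. The barcodes are passed in as a list with newlines already stripped, and `in` replaces `find(...) > 0`, which would miss a match at position 0.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples, four lines at a time."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield tuple(record)


def first_hits(handle, query, barcodes):
    """Return the first record containing the query for each barcode."""
    remaining = set(barcodes)  # barcodes that have not been matched yet
    hits = []
    for record in read_fastq(handle):
        seq = record[1]
        if query not in seq:
            continue
        for bar in barcodes:  # a list, so this loop always starts at the top
            if bar in remaining and bar in seq:
                remaining.discard(bar)
                hits.append(record)
                break
    return hits


# Made-up reads for illustration only.
fastq = io.StringIO(
    "@r1\nAAACTTGCATTTT\n+\nIIIIIIIIIIIII\n"
    "@r2\nCCCCTTGCACCCC\n+\nIIIIIIIIIIIII\n"
    "@r3\nGGGTTTGCAAAAA\n+\nIIIIIIIIIIIII\n"
    "@r4\nAAACTTGCAGGGG\n+\nIIIIIIIIIIIII\n"
)
hits = first_hits(fastq, "TTGCA", ["AAAC", "GGGT"])
print("\n".join("\n".join(record) for record in hits))
```

With real files, the same functions could be called inside `with open(...)` blocks so the files are closed automatically.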