Hi All,
I am trying to analyze next-generation sequencing data with Biopython (on a Windows computer). My data contains certain sequences that appear many times. I was wondering how I can count these repeats and, at the same time, remove them. In the end, I would like a FASTA file with a read count beside every unique sequence. My understanding is that the FASTQ/A Collapser of the FASTX-Toolkit does exactly this. However, I would much prefer to do it on my own computer without dealing with something like Linux or uploading my data to Galaxy. Any tips or suggestions on how I can do this in Biopython, or with anything else that I can readily use on my computer, would be greatly appreciated.
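To make the goal concrete, here is a rough sketch of the kind of collapsing I have in mind. It is only an illustration under some assumptions: all unique reads fit in memory, sequences are compared as exact, case-sensitive strings, and the function names (`collapse_reads`, `write_collapsed_fasta`, `collapse_file`) and the `rank-count` header layout are ones I made up for this example. The only Biopython call it relies on is `SeqIO.parse`.

```python
from collections import Counter


def collapse_reads(sequences):
    """Count identical sequences; return (sequence, count) pairs, most frequent first."""
    counts = Counter(sequences)
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_collapsed_fasta(pairs, handle):
    """Write one FASTA record per unique sequence, with its count in the header."""
    for rank, (seq, count) in enumerate(pairs, start=1):
        # Header is "rank-count" (an assumed layout chosen for this sketch).
        handle.write(f">{rank}-{count}\n{seq}\n")


def collapse_file(in_path, out_path, fmt="fastq"):
    """Read a FASTQ/FASTA file with Biopython and write the collapsed FASTA."""
    # Imported here so the two helpers above can be tried without Biopython installed.
    from Bio import SeqIO

    seqs = (str(record.seq) for record in SeqIO.parse(in_path, fmt))
    with open(out_path, "w") as out:
        write_collapsed_fasta(collapse_reads(seqs), out)
```

With something like this, I imagine calling `collapse_file("reads.fastq", "collapsed.fasta")` (both file names are placeholders), or passing `fmt="fasta"` if the input is already FASTA.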
Thank you very much!