I have 500 files, each containing 4000 DNA sequences in FASTQ format. I have 20 sequence names (IDs) extracted from BAM files, based on alignments of the sequences in those 500 FASTQ files. I want to identify which of the 500 files contain sequences with these 20 IDs. Ultimately, I want to remove those sequences from the files and save the filtered files. Please guide/help.
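A minimal sketch of the per-file step in Python, assuming standard 4-line FASTQ records (no wrapped sequence lines) and that each BAM read name matches the FASTQ ID, i.e. the header text after `@` up to the first space. If your FASTQ IDs carry `/1` or `/2` suffixes that the BAM names lack, you would need to strip those first. The function names here are hypothetical, not from any library. The returned count tells you whether a file contained any of the IDs, which answers the "which files" part of the question.

```python
from typing import Iterator, Set, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality), assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not header.startswith("@") or not qual:
            raise ValueError("malformed or truncated FASTQ record")
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def record_id(header: str) -> str:
    """ID = text after '@' up to the first whitespace (assumed to match the BAM name)."""
    return header[1:].split()[0]


def filter_fastq(src: TextIO, dst: TextIO, drop_ids: Set[str]) -> int:
    """Copy records whose ID is not in drop_ids; return how many were removed."""
    removed = 0
    for rec in read_fastq(src):
        if record_id(rec[0]) in drop_ids:
            removed += 1
            continue
        dst.write("\n".join(rec) + "\n")
    return removed
```

Using a `set` for the IDs keeps each lookup constant-time, so the cost is dominated by reading the file once.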
OK, thanks again. I tested it and it worked for one FASTQ file with the list of IDs. I do not know how to make it scan all 500 files! I used a wildcard (*.fastq) for the input and output, but it gave an error. Can this command be used in a script with a loop? Ideas?
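A wildcard cannot name 500 separate output files in one command, so one option is to loop over the inputs and derive each output name from its input name. The sketch below does that in Python under the same assumptions as before (4-line FASTQ records, IDs compared up to the first space). The ID file format (one ID per line, optional leading `@`), the directory names, and the function names are all hypothetical. Filtered copies go to a separate directory so the originals are not overwritten.

```python
import glob
import os
from typing import Dict, Set


def load_ids(path: str) -> Set[str]:
    """One ID per line; a leading '@' is tolerated and stripped."""
    with open(path) as fh:
        return {line.strip().lstrip("@") for line in fh if line.strip()}


def filter_file(in_path: str, out_path: str, drop_ids: Set[str]) -> int:
    """Write in_path minus records in drop_ids to out_path; return count removed."""
    removed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            rec = [src.readline() for _ in range(4)]  # one 4-line record
            if not rec[0].strip():
                break  # end of file (or trailing blank line)
            if not rec[3]:
                raise ValueError(f"truncated record in {in_path}")
            read_id = rec[0][1:].split()[0]
            if read_id in drop_ids:
                removed += 1
            else:
                dst.writelines(rec)
    return removed


def filter_all(in_dir: str, out_dir: str, drop_ids: Set[str]) -> Dict[str, int]:
    """Filter every *.fastq in in_dir; return {input path: removed count} for files with hits."""
    os.makedirs(out_dir, exist_ok=True)
    hits: Dict[str, int] = {}
    for in_path in sorted(glob.glob(os.path.join(in_dir, "*.fastq"))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        n = filter_file(in_path, out_path, drop_ids)
        if n:
            hits[in_path] = n
    return hits
```

For example, `filter_all("fastq_in", "fastq_filtered", load_ids("ids.txt"))` would return a dictionary listing only the files that contained at least one of the IDs, along with how many reads were removed from each.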
I wrote this loop and it worked. Thanks for the tips and help.
This is not a novel answer; it is a loop that uses an existing answer. You should add it as a comment to that answer, not as a new answer.