Dear all, I have written a script, using Python and sed, that is useful for removing specific reads from a FASTQ file based on their read IDs. The script works fine and is also fast. If you find any problems with it, or if a more reliable solution is available, please share it.
reads_ids.txt : text file with the IDs of the reads to remove, one per line, like this
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
@SRR001667.1 071112_SLXA-EAS1_s_7:5:1:818:346 length=72
@SRR001667.2 071112_SLXA-EAS1_s_7:5:1:802:339 length=72
remove_reads.py : Python script that removes reads from a FASTQ file using the read IDs
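import sys
import os

usage = "Usage: python remove_reads.py <reads.fastq> <reads_ids.txt>"

# First argument: the FASTQ file to filter.
try:
    ffastq = sys.argv[1]
    fastq = open(ffastq)
except (IndexError, IOError):
    print(usage)
    sys.exit(1)

# Second argument: the text file with one read ID per line.
try:
    fids = sys.argv[2]
    ids = open(fids)
except (IndexError, IOError):
    print(usage)
    sys.exit(1)

out_file = ffastq + "_filtered.fastq"

# Build one sed command per ID: delete the matching header line and the
# 3 lines after it. The "addr,+N" address form is a GNU sed extension.
# IDs are used as sed regular expressions, so an ID containing '/' would
# break the command.
new_fid = []
for fid in ids:
    fid = fid.rstrip()
    if fid:  # skip blank lines in the ID file
        new_fid.append('/' + fid + '/,+3d')

cmd = "sed -e '" + ";".join(new_fid) + "' " + ffastq + " > " + out_file
os.system(cmd)

fastq.close()
ids.close()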
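For comparison, here is a minimal sketch of the same filtering done in pure Python, without sed. It is not the script above. The function names filter_fastq and remove_reads are hypothetical, and it assumes that every record is exactly 4 lines long and that each line in reads_ids.txt is identical to the full header line of the read to remove.

```python
def filter_fastq(fastq_lines, ids_to_remove):
    """Yield the lines of each 4-line FASTQ record whose header is not listed.

    Assumption: records are exactly 4 lines (header, sequence, '+', quality).
    """
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Compare the full header line, without trailing whitespace.
            if record[0].rstrip() not in ids_to_remove:
                for rec_line in record:
                    yield rec_line
            record = []


def remove_reads(fastq_path, ids_path, out_path):
    """Write to out_path every record of fastq_path not listed in ids_path."""
    with open(ids_path) as ids:
        # A set gives constant-time lookups; blank lines are ignored.
        ids_to_remove = {line.strip() for line in ids if line.strip()}
    with open(fastq_path) as fastq, open(out_path, "w") as out:
        out.writelines(filter_fastq(fastq, ids_to_remove))
```

It could be called as remove_reads("reads.fastq", "reads_ids.txt", "reads.fastq_filtered.fastq"). Because the IDs are compared as plain strings rather than regular expressions, special characters in an ID need no escaping, and the ID list never has to fit on a command line.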