Hi, could someone please help me remove reads that come from a specific genomic location from a FASTQ file? So far I have only found methods for removing reads from a specific chromosome in an aligned SAM file using samtools, or for removing reads from a FASTQ file using sequence IDs. I would like to remove PCR contaminants from my FASTQ files by giving specific genome coordinates. I appreciate your help!
FASTQ files do not contain coordinates, so it is not possible to remove reads based on that parameter directly. You would need to align the reads first and then filter by position, or filter by the contaminant sequence itself with one of the adapter-trimming tools (e.g., BBDuk or Trimmomatic).
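To make the "align and then filter" route concrete, here is a minimal Python sketch, not a tested pipeline. It assumes you have already aligned the reads and have the alignments as SAM text, that the FASTQ read names match the SAM QNAME field up to the first whitespace, and that the region is given as 1-based inclusive coordinates. The function names (`reference_span`, `names_in_region`, `filter_fastq`) and the demo data are made up for illustration.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + max(length, 1) - 1


def names_in_region(sam_lines, chrom, start, end):
    """Collect names of mapped reads whose alignment overlaps chrom:start-end."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        qname, flag, rname, pos, cigar = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
        )
        # Skip unmapped reads (flag 0x4), other chromosomes, and missing CIGARs.
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue
        aln_start, aln_end = reference_span(pos, cigar)
        if aln_start <= end and aln_end >= start:
            names.add(qname)
    return names


def filter_fastq(fastq_lines, drop_names):
    """Return the 4-line FASTQ records whose read name is NOT in drop_names."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    kept = []
    for i in range(0, len(lines) - 3, 4):
        # Assumption: the read name is the header text up to the first whitespace.
        name = lines[i][1:].split()[0]
        if name not in drop_names:
            kept.extend(lines[i:i + 4])
    return kept


# Tiny made-up example: r1 overlaps chr1:105-200, r2 does not, r3 is unmapped.
sam_demo = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t500\t60\t10M\t*\t0\t0\tTTTTTTTTTT\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\tIIIIIIIIII",
]
fastq_demo = [
    "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@r2 extra description", "TTTTTTTTTT", "+", "IIIIIIIIII",
    "@r3", "GGGGGGGGGG", "+", "IIIIIIIIII",
]

drop = names_in_region(sam_demo, "chr1", 105, 200)
for line in filter_fastq(fastq_demo, drop):
    print(line)
```

For paired-end data you would probably want to drop both mates whenever either one falls in the region; since mates share a read name in SAM, applying the same set of names to both FASTQ files should do that, as long as the names in your files do not carry mate suffixes that the aligner stripped.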
Instead of depending on genome coordinates, you may want to use clumpify.sh from the BBMap suite to identify duplicates (optical, PCR, and other kinds) independently of alignments. Then, depending on the severity of the issue, decide what to do with them (just mark or remove). See the post "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files" for additional details on how to use this tool.
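To illustrate what alignment-free duplicate detection means, here is a deliberately simplified Python sketch. This is not Clumpify's algorithm: it only treats reads with exactly identical sequences as duplicates, and the function name `split_exact_duplicates` and the demo records are made up for illustration. It just shows that duplicates can be found from the FASTQ alone, and that you can either mark them or drop them afterwards.

```python
def split_exact_duplicates(records):
    """Split (header, sequence, quality) records into first occurrences and duplicates.

    A record counts as a duplicate only if an earlier record had the exact
    same sequence. This is a simplification for illustration.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        sequence = record[1]
        if sequence in seen:
            duplicates.append(record)
        else:
            seen.add(sequence)
            unique.append(record)
    return unique, duplicates


# Made-up records: r3 repeats the sequence of r1.
records_demo = [
    ("@r1", "ACGTACGT", "IIIIIIII"),
    ("@r2", "TTTTCCCC", "IIIIIIII"),
    ("@r3", "ACGTACGT", "IIIIIIII"),
]

unique, duplicates = split_exact_duplicates(records_demo)
print(len(unique), "unique;", len(duplicates), "duplicate(s)")
```

Keeping the duplicates in a separate list rather than discarding them right away mirrors the "mark or remove" decision above: you can look at how many there are before deciding.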