I'm analyzing ChIP-seq data, and I'm having some trouble filtering out the "good" reads for us. Briefly, I have a FASTQ file, from which I selected the reads that have the 5' barcode sequence with no mismatches. Because the barcode sequence was not unique enough, the reads aligned well even with the barcode still attached.
I'm trying to filter out reads with an artificial barcode. So I aligned the barcoded reads and the barcode-trimmed reads separately to the hg19 genome, requiring an exact match. Then, to get the reads whose 5' barcode is not endogenous, I need to remove the exactly aligned barcoded reads from the exactly aligned barcode-trimmed reads.
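To make the last step concrete: as far as I can tell, it amounts to a set difference on read names. Below is a rough Python sketch of that idea, not a tested pipeline. It assumes both alignments are available as SAM text, that trimming did not change the read names, and that unaligned reads are either absent or flagged as unmapped (flag bit 0x4). The function names are made up for illustration.

```python
def mapped_read_names(sam_lines):
    """Collect the names of reads that aligned, from an iterable of SAM text lines."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        flag = int(fields[1])
        if flag & 0x4:
            continue  # bit 0x4 set means the read is unmapped
        names.add(fields[0])
    return names


def artificial_barcode_reads(trimmed_sam_lines, barcoded_sam_lines):
    """Reads that align once trimmed but do not align with the barcode attached.

    Assumption: read names are identical in both alignments.
    """
    trimmed = mapped_read_names(trimmed_sam_lines)
    barcoded = mapped_read_names(barcoded_sam_lines)
    return trimmed - barcoded
```

With hypothetical files `trimmed.sam` and `barcoded.sam`, the two open file handles could be passed directly as `artificial_barcode_reads(open("trimmed.sam"), open("barcoded.sam"))`, and the resulting names used to pull the matching records out of the trimmed alignment.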
Is there an easy way to do this? I'm a bit confused.