I have a paired-end FASTQ file, and my experiment is designed so that each read pair contains exactly ONE barcode. The barcode may be on the forward read or on the reverse read (never both), and it is flanked by a specific sequence on each side that helps to identify it. So some of the forward reads carry the barcode and some do not, and the same is true for the reverse reads.
The barcode is located at the beginning of the read (after pre-processing trimming, of course) and looks like this: GTC NNN NNN G
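To make the layout concrete, here is a minimal Python sketch of the matching I have in mind. It is only an illustration, not a real tool: the function name `find_barcode` and the example reads are made up, and it assumes that the spaces in the pattern are just for readability, that the six N positions are A/C/G/T only, and that the pattern sits at the very first base of the trimmed read.

```python
import re

# Assumed layout from the description above: GTC, six barcode bases, then G,
# anchored at the very start of the (already trimmed) read.
BARCODE_RE = re.compile(r"^GTC([ACGT]{6})G")


def find_barcode(fwd_seq, rev_seq):
    """Return (barcode, mate) with mate "R1" or "R2".

    Returns None if neither mate starts with the pattern, or if both do
    (which should not happen in this design).
    """
    hits = []
    for mate, seq in (("R1", fwd_seq), ("R2", rev_seq)):
        m = BARCODE_RE.match(seq)
        if m:
            hits.append((m.group(1), mate))
    return hits[0] if len(hits) == 1 else None


# Hypothetical read pair: the barcode is on the forward read only.
print(find_barcode("GTCACGTTAGCCCTTAGGA", "TTTGACCGTAGGCATCGA"))
```

For this made-up pair the barcode ACGTTA is found on R1. What I am missing is the step after this: carrying the barcode through alignment and counting unique barcodes per gene.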
Does anyone know a reliable tool for extracting UMI sequences in this experimental design and for counting the number of unique UMIs aligned to each gene?
Thanks in advance