First post to the forum; sorry, I'm new at this:
I inherited a BED file whose barcodes are 32 bases long:
1) The first 8 bases are part of the cell's unique code
2) The next 8 match an entry from one of three sets (7a, 7b or 7c)
3) The next 8 bases are also part of the cell's unique code
4) The last 8 bases match an entry from one of two sets (5a or 5b)
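The layout above can be sketched as a small Python helper that slices a barcode into its four 8-base segments. This is a minimal sketch, assuming every barcode is exactly 32 bases and that segments 1 and 3 together make up the cell code; the function name split_barcode is hypothetical.

```python
def split_barcode(barcode: str) -> dict:
    """Split a 32-base barcode into cell code and the two index segments."""
    if len(barcode) != 32:
        raise ValueError(f"expected 32 bases, got {len(barcode)}: {barcode!r}")
    # Four consecutive 8-base segments: positions 1-8, 9-16, 17-24, 25-32
    seg = [barcode[i:i + 8] for i in range(0, 32, 8)]
    return {
        "cell_code": seg[0] + seg[2],  # segments 1 and 3 together identify the cell
        "index7": seg[1],              # should belong to set 7a, 7b or 7c
        "index5": seg[3],              # should belong to set 5a or 5b
    }

print(split_barcode("ATTCAGAAAAGAGGCAACGTCCTGTCTTACGC"))
# → {'cell_code': 'ATTCAGAAACGTCCTG', 'index7': 'AAGAGGCA', 'index5': 'TCTTACGC'}
```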
We have six technical replicates, each defined by a combination of one 7x set and one 5x set.
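Turning the two index segments into a replicate label might look like the sketch below. The set contents are hypothetical placeholders (the real 7a/7b/7c and 5a/5b sequences aren't listed here), as are the function names and the "7a_5b"-style labels. Barcodes whose indices match no set return None, so they can be reviewed rather than silently assigned to the wrong replicate.

```python
from typing import Dict, Optional, Set

# HYPOTHETICAL placeholder sets: replace each with the real 8-base sequences.
SETS_7: Dict[str, Set[str]] = {
    "7a": {"AAAAAAAA"},
    "7b": {"CCCCCCCC"},
    "7c": {"GGGGGGGG"},
}
SETS_5: Dict[str, Set[str]] = {
    "5a": {"TTTTTTTT"},
    "5b": {"ACACACAC"},
}

def _lookup(seq: str, sets: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of the set containing seq, or None if no set matches."""
    for name, members in sets.items():
        if seq in members:
            return name
    return None

def replicate_of(barcode: str) -> Optional[str]:
    """Return a label such as '7b_5b', or None if either index is unrecognized."""
    set7 = _lookup(barcode[8:16], SETS_7)   # bases 9-16
    set5 = _lookup(barcode[24:32], SETS_5)  # bases 25-32
    if set7 is None or set5 is None:
        return None
    return f"{set7}_{set5}"

bc = "ATTCAGAA" + "CCCCCCCC" + "ACGTCCTG" + "ACACACAC"
print(replicate_of(bc))  # → 7b_5b
```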
I did a lot of analysis on this data in Signac/Seurat this summer, working from a filtered Seurat object I was given. Now I'm trying to redo the QC (most importantly, the doublet analysis), and for that I need to input the data as a separate BED file for each of the six technical replicates. This is my first time having access to the original BED file.
My Seurat object contains many of the barcodes, but because it was filtered, it doesn't have all of them. I have a simple script that assigns barcodes to one of the six technical replicates, but I don't have a list of all the barcodes in the BED file (I'm starting to realize this list is usually provided separately, but I didn't inherit it).
So my question is: is there an easy way to do this filtering entirely in the terminal, and/or an easy way to extract all the unique barcodes from a BAM or BED file? (Once I have the barcodes, I know how to make the six whitelists (.csv) and split the file using sinto.)
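To pull out every distinct barcode, the fourth column of the BED file can be collected into a set. A minimal sketch, assuming the barcode is always column 4 (as in the sample lines), that fields are whitespace-separated, and that the file may optionally be gzip-compressed; unique_barcodes and write_barcodes are hypothetical names.

```python
import gzip
from typing import Set

def unique_barcodes(path: str) -> Set[str]:
    """Return the distinct barcodes (column 4) in a BED file, optionally gzipped."""
    opener = gzip.open if path.endswith(".gz") else open
    barcodes: Set[str] = set()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip comment/header lines
            fields = line.split()  # BED is tab-delimited; split() also tolerates spaces
            if len(fields) >= 4:
                barcodes.add(fields[3])
    return barcodes

def write_barcodes(bed_path: str, out_path: str) -> int:
    """Write the sorted unique barcodes one per line; return how many were written."""
    barcodes = sorted(unique_barcodes(bed_path))
    with open(out_path, "w") as out:
        for bc in barcodes:
            out.write(bc + "\n")
    return len(barcodes)
```

For example, write_barcodes("fragments.bed.gz", "all_barcodes.txt") (file names hypothetical) would produce one barcode per line, which could then be fed to the replicate-sorting script. For an uncompressed, tab-delimited file, the shell pipeline cut -f4 fragments.bed | sort -u > all_barcodes.txt should give the same list without Python.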
My data looks like this:
chr1 10126 10175 ATTCAGAAAAGAGGCAACGTCCTGTCTTACGC 1
chr1 10150 10180 ATTACTCGAACCAGGTAGGATAGGCTCCTTAC 1
chr1 10151 10192 GAGATTCCCGTATAGAAGGATAGGACTCTAGG 1
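Since the replicate is encoded in the barcode itself, the six per-replicate BED files could also be written in a single pass over lines like these, without building whitelists first. This sketch repeats the hypothetical placeholder sets (swap in the real sequences), names outputs like <prefix>7a_5b.bed, and sends lines with unrecognized indices to <prefix>unassigned.bed; split_by_replicate and this naming scheme are assumptions, not sinto behavior.

```python
import gzip
from typing import Dict, Set, TextIO

# HYPOTHETICAL placeholder sets: replace with the real 8-base index sequences.
SETS_7: Dict[str, Set[str]] = {"7a": {"AAAAAAAA"}, "7b": {"CCCCCCCC"}, "7c": {"GGGGGGGG"}}
SETS_5: Dict[str, Set[str]] = {"5a": {"TTTTTTTT"}, "5b": {"ACACACAC"}}

# Invert to sequence -> set name for constant-time lookups while streaming.
SEQ_TO_7 = {seq: name for name, seqs in SETS_7.items() for seq in seqs}
SEQ_TO_5 = {seq: name for name, seqs in SETS_5.items() for seq in seqs}

def split_by_replicate(path: str, out_prefix: str) -> Dict[str, int]:
    """Write each BED line to <out_prefix><7x>_<5x>.bed; return line counts per output.

    Lines whose index segments match no known set go to <out_prefix>unassigned.bed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    handles: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        with opener(path, "rt") as fh:
            for line in fh:
                if line.startswith("#"):
                    continue  # skip comment/header lines
                fields = line.split()
                if len(fields) < 4:
                    continue  # skip blank or malformed lines
                bc = fields[3]
                s7 = SEQ_TO_7.get(bc[8:16])   # bases 9-16
                s5 = SEQ_TO_5.get(bc[24:32])  # bases 25-32
                label = f"{s7}_{s5}" if s7 and s5 else "unassigned"
                if label not in handles:
                    handles[label] = open(f"{out_prefix}{label}.bed", "w")
                handles[label].write(line if line.endswith("\n") else line + "\n")
                counts[label] = counts.get(label, 0) + 1
    finally:
        for h in handles.values():
            h.close()
    return counts
```

Because lines are copied in input order, each output file keeps the sort order of the original BED file.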