I'd like to do the following with a huge BAM file in which reads are randomly barcoded:
1. Take all reads associated with a given barcode (DONE)
2. If possible, build a fragment from reads mapped within, say, 100 kb of each other (DONE)
3. Find the coverage for this barcode in each assembled "fragment". I expect this number to be quite low, maybe a few hundred reads across a region of a few hundred kb, since I'm only looking at reads from a single barcode. I've used bedtools coverage/genomecov in the past, so I'd use them again for simplicity's sake.
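A minimal Python sketch of steps 2 and 3, to make the fragment logic concrete. It assumes each read has already been reduced to a (chrom, start, end) tuple in 0-based, half-open BED coordinates, and it interprets "within 100 kb" as "the gap between a read's start and the current fragment's end on the same chromosome is at most 100 kb". The helpers build_fragments and mean_depth are hypothetical illustrations, not part of bedtools; on the real data bedtools coverage would still do the counting.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed read format: (chrom, start, end), 0-based half-open like BED.
Read = Tuple[str, int, int]
Fragment = Tuple[str, int, int, int]  # (chrom, start, end, n_reads)


def build_fragments(reads: List[Read], max_gap: int = 100_000) -> List[Fragment]:
    """Hypothetical helper: chain reads into fragments per chromosome.

    A read joins the current fragment if it starts no more than max_gap bp
    after the fragment's current end; otherwise it starts a new fragment.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in reads:
        by_chrom[chrom].append((start, end))

    fragments: List[Fragment] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])  # sort by start position
        frag_start, frag_end = intervals[0]
        n_reads = 1
        for start, end in intervals[1:]:
            if start - frag_end <= max_gap:
                # Close enough: extend the current fragment.
                frag_end = max(frag_end, end)
                n_reads += 1
            else:
                # Gap too large: emit the fragment and start a new one.
                fragments.append((chrom, frag_start, frag_end, n_reads))
                frag_start, frag_end, n_reads = start, end, 1
        fragments.append((chrom, frag_start, frag_end, n_reads))
    return fragments


def mean_depth(reads: List[Read], fragment: Fragment) -> float:
    """Hypothetical helper: aligned bases inside the fragment / fragment length."""
    chrom, frag_start, frag_end, _ = fragment
    aligned = sum(
        min(end, frag_end) - max(start, frag_start)
        for c, start, end in reads
        if c == chrom and start < frag_end and end > frag_start
    )
    return aligned / (frag_end - frag_start)
```

For example, with 150 bp reads starting at chr1:1,000, chr1:50,000 and chr1:400,000, the first two are merged into one fragment and the third starts a new one, because it begins more than 100 kb after the previous fragment's end. The mean depth of a sparse single-barcode fragment is correspondingly far below 1.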
My problem is this: once I've extracted the reads with barcode XYZ from the BAM file (samtools view | less -S | grep) into an output file, what do I actually have? Is the output a SAM file? How do I compress the selected reads back into a BAM file so I can run bedtools on them?
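Some context for the question: samtools view without -h prints plain-text SAM alignment lines and omits the header, so the grep output is headerless SAM. Converting it back to BAM normally needs the header's @SQ lines, which you get by starting from samtools view -h and piping the filtered text into samtools view -b -o XYZ.bam - (the trailing - reads from stdin). The Python filter below is a sketch of that middle step only. It assumes the barcode is stored in a BX:Z tag (common for linked-read data, but check your own files), keeps every @ header line, and matches the tag exactly among the optional fields instead of anywhere in the line as grep does. Recent samtools releases also have a -d TAG:VALUE option on samtools view for this kind of filter; check the help output of your version.

```python
import sys
from typing import Iterable, Iterator


def filter_sam_by_barcode(lines: Iterable[str], barcode: str, tag: str = "BX") -> Iterator[str]:
    """Yield SAM header lines plus alignments carrying tag:Z:barcode.

    tag="BX" is an assumption; use whatever tag your pipeline writes.
    """
    wanted = f"{tag}:Z:{barcode}"
    for line in lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @RG, @PG) are needed to rebuild a BAM.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional TAG:TYPE:VALUE fields start at column 12 (index 11), so a
        # read name or quality string that happens to contain the barcode
        # does not cause a false match, unlike a plain grep.
        if wanted in fields[11:]:
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed usage (filter_barcode.py is a hypothetical file name):
    # samtools view -h in.bam | python filter_barcode.py XYZ | samtools view -b -o XYZ.bam -
    for out_line in filter_sam_by_barcode(sys.stdin, sys.argv[1]):
        sys.stdout.write(out_line)
```

The resulting XYZ.bam can then be sorted and indexed with samtools as usual before running bedtools on it.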