My question is quite similar to some others that have been asked here before. Unfortunately, none of them answers it completely, which is why I'm making this post.
- What I have is a BAM file with ~100 million reads.
- What I need is to extract the reads that have information about a certain genomic position, along with one of the tags on each read.
The reason I'm saying "reads that have information about a certain genomic position" is that with `samtools view in.bam chrX:22222` I also extract all reads whose alignment spans this position without actually having a base aligned to it. These reads are useless to me. Ideally, I'd like to get only the information at that position instead of the whole read.
Additionally, I need to carry over a barcode that is stored as a tag on the read, so that I can link the two pieces of information together. It would also be fine to just keep all the tags and parse them later.
Does anybody know of a way to do this? It looks to me like I'll have to write my own little script, but I'd like to avoid reinventing the wheel if this already exists somewhere. `samtools mpileup` helps in that it uses only the informative reads, but it returns only the nucleotides at that position and throws away the read tags.
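To make the requirement concrete, here is a rough sketch of what such a script might do. It is only an illustration under several assumptions: the input is SAM text as printed by `samtools view`, positions are 1-based, the function name `base_at` is made up, and the tag name `CB` is a placeholder because the actual barcode tag is not named above. It walks the CIGAR string of each read, returns the base aligned to the target position, and returns nothing for reads that only span the position through a deletion or skipped region.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(sam_line, target_pos, tag="CB"):
    """Return (read_name, base, tag_value) for one SAM text line.

    Returns None if the read has no base aligned to target_pos (1-based),
    for example when the position falls inside a deletion (D) or a
    skipped region (N). tag_value is None if the tag is absent.
    "CB" is only a placeholder tag name, not taken from the question.
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, pos, cigar, seq = fields[0], int(fields[3]), fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return None  # unaligned read or sequence not stored

    ref = pos    # 1-based reference coordinate of the next aligned base
    query = 0    # 0-based index into the read sequence
    base = None
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":                       # consumes reference and read
            if ref <= target_pos < ref + length:
                base = seq[query + (target_pos - ref)]
                break
            ref += length
            query += length
        elif op in "IS":                      # consumes read only
            query += length
        elif op in "DN":                      # consumes reference only
            if ref <= target_pos < ref + length:
                return None                   # spans the position, no base
            ref += length
        # H and P consume neither reference nor read

    if base is None:
        return None                           # read ends before the position

    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    tag_value = None
    for field in fields[11:]:
        key, _type, value = field.split(":", 2)
        if key == tag:
            tag_value = value
            break
    return name, base, tag_value
```

One way to use it would be to pipe the output of `samtools view in.bam chrX:22222-22222` into a loop over `sys.stdin` that calls `base_at(line, 22222)` and skips `None` results. As far as I understand the region syntax, a region without an end coordinate extends to the end of the chromosome, hence the explicit `22222-22222` here.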
Thanks a lot!