I am using the 10x possorted_bam.bam file to map edits in a gene of intrest. I isolated the SAM reads using samtools view possorted_bam.bam | grep $gene_name > out.sam
. I then have been parsing the SAM file using a custom python script. When I go to The IGV coverage plot and hover over one the base with the most insertions I see 52 insertions in that locus, in my script I only see 32 insertions.
I also know the following:
- There are 76 cigar strings in out.sam that contain an insertion, and all strings reference only one insertion
- The dataframe I made contains 76 insertions
- The insertions cluster in the same loci in IGV and the dataframe
I am suspecting that samtools view
is dropping reads that are not of high enough confidence, while the IGV coverage plot is including those reads (I remember there being a flag for that in BAM.)
Any advice is appreciated, I can post code if reviewing it would be helpful.
You may want to do the conversion using a tool that 10x makes available for conversion of the reads back to fastq (bam2fastq) rather than using plain
samtools view
.