Parsing SAM output and comparison to IGV

0

Entering edit mode

4.2 years ago

cf556 ▴ 10

I am using the 10x possorted_bam.bam file to map edits in a gene of intrest. I isolated the SAM reads using samtools view possorted_bam.bam | grep $gene_name > out.sam. I then have been parsing the SAM file using a custom python script. When I go to The IGV coverage plot and hover over one the base with the most insertions I see 52 insertions in that locus, in my script I only see 32 insertions.

I also know the following:

There are 76 cigar strings in out.sam that contain an insertion, and all strings reference only one insertion
The dataframe I made contains 76 insertions
The insertions cluster in the same loci in IGV and the dataframe

I am suspecting that samtools view is dropping reads that are not of high enough confidence, while the IGV coverage plot is including those reads (I remember there being a flag for that in BAM.)

Any advice is appreciated, I can post code if reviewing it would be helpful.

RNA-Seq SAM • 675 views

ADD COMMENT • link 4.2 years ago by cf556 ▴ 10

0

Entering edit mode

You may want to do the conversion using a tool that 10x makes available for conversion of the reads back to fastq (bam2fastq) rather than using plain samtools view.

ADD REPLY • link 4.2 years ago by GenoMax 141k

Login before adding your answer.