Parsing SAM output and comparison to IGV
0
0
Entering edit mode
4.2 years ago
cf556 ▴ 10

I am using the 10x possorted_bam.bam file to map edits in a gene of intrest. I isolated the SAM reads using samtools view possorted_bam.bam | grep $gene_name > out.sam. I then have been parsing the SAM file using a custom python script. When I go to The IGV coverage plot and hover over one the base with the most insertions I see 52 insertions in that locus, in my script I only see 32 insertions.

I also know the following:

  • There are 76 cigar strings in out.sam that contain an insertion, and all strings reference only one insertion
  • The dataframe I made contains 76 insertions
  • The insertions cluster in the same loci in IGV and the dataframe

I am suspecting that samtools view is dropping reads that are not of high enough confidence, while the IGV coverage plot is including those reads (I remember there being a flag for that in BAM.)

Any advice is appreciated, I can post code if reviewing it would be helpful.

RNA-Seq SAM • 675 views
ADD COMMENT
0
Entering edit mode

You may want to do the conversion using a tool that 10x makes available for conversion of the reads back to fastq (bam2fastq) rather than using plain samtools view.

ADD REPLY

Login before adding your answer.

Traffic: 2530 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6