How to use scanbamparam and CIGAR from Rsamtools to count specific reads?
1
0
Entering edit mode
3.1 years ago
MAPK ★ 1.7k

Hi, I have used default bowtie option which allows for three mismatches to get bam file aligned with my reference sequence. I want to know how many reads are aligned with terminal mismatches (i.e,last three, last two and last one base mismatched at 3' region of the aligned reads at their loci position). Additionally, I want to account for both positive and negative strands with terminal mismatches. How can I get this information using the Rsamtools? How do I use CIGAR andwhat CIGAR string I should use for this purpose? Thanks for your help in advance.

here is my code:

library(Rsamtools)
library(GenomicRanges)
library(GenomicAlignments)

bam.file <-"aligned.bam"
params <-
  ScanBamParam(
    which = which,
    flag = scanBamFlag(
      isUnmappedQuery = FALSE,
      isDuplicate = FALSE,
      isNotPassingQualityControls = FALSE
    ),
    simpleCigar = FALSE,
    reverseComplement = FALSE,
    what = c(
      "qname",
      "flag",
      "rname",
      "seq",
      "strand",
      "pos",
      "qwidth",
      "cigar",
      "qual",
      "mapq"
    )
  )
r rsamtools • 1.5k views
ADD COMMENT
0
Entering edit mode

are aligned with terminal mismatches

that is, soft or hard clip isn't it ?

ADD REPLY
0
Entering edit mode

I believe is soft clip.

ADD REPLY
0
Entering edit mode
3.1 years ago

using samjdk http://lindenb.github.io/jvarkit/SamJdk.html the following command would return any sam record having a cigar string starting or ending with a cigar operator hard or soft clip having a length < 4

$ java -jar dist/samjdk.jar -e 'if(record.getReadUnmappedFlag()) return null; final Cigar cigar=record.getCigar(); for(int side=0;side<2;++side) {final CigarElement ce=(side==0?cigar.getFirstCigarElement():cigar.getLastCigarElement()); if(ce.getOperator().isClipping() && ce.getLength()<4) return true;}  return null;' src/test/resources/toy.bam



@HD VN:1.5  SO:coordinate
@SQ SN:ref  LN:45
@SQ SN:ref2 LN:40
@RG ID:gid1 LB:S1   PL:illumina SM:S1   PU:run1 CN:Nantes   DS:S1
r002    0   ref 9   30  1S2I6M1P1I1P1I4M2I  *   0   0   AAAAGATAAGGGATAAA   *   RG:Z:gid1
ADD COMMENT

Login before adding your answer.

Traffic: 1042 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6