Extracting Intron-Exon Reads from bam files
2
2
Entering edit mode
6.7 years ago
praasu ▴ 40

I would like to extract intron-exon reads (unspliced reads) from bam file. I tried use the with following command command, bedtools intersect -s -F 1 -split -a input.bam -b junction.bed >Intron_out.bam

Where junction.bed consists of last bp intron and first bp exonic. It doesn't to seem to be working, because, I am also getting the split reads that I am not expecting.

I would really appreciate, if someone can help me with the same.

RNA-Seq • 5.1k views
ADD COMMENT
1
Entering edit mode
6.7 years ago

I would think that a multi-step process would work:

  1. Filter for reads overlapping an exon
  2. Filter those passing step 1 for those overlapping an intron
  3. Filter those above for more than 1 CIGAR operation (you could filter out spliced alignments instead, but there's not then a simple samtools view option...this is quick and easy and close enough).
ADD COMMENT
0
Entering edit mode

Thanks a lot for your answer !!

ADD REPLY
0
Entering edit mode

Hi Devon, Could you please tell me the best way to filtered the reads overlapping an exon. I was trying to bedtools intersect with (v) options. Is it correct in your opinion ? My Best Regards, Prasoon

ADD REPLY
0
Entering edit mode

You're not going to want the -v option to select for reads overlapping exons.

ADD REPLY
0
Entering edit mode

Sorry for the confusing language, I want the reads overlapping intron-exon. Therefore, My question about the step 1 that you have suggested. How can I filter reads overlapping exon?

ADD REPLY
0
Entering edit mode

bedtools intersect with a BED file of exons for step 1. bedtools intersect with a BED file of introns for step 2.

ADD REPLY
1
Entering edit mode
6.7 years ago

I'm not sure if I understand well, your problem: you want the cigar section ('M' or 'X' or '=') overlapping the BED file isn't it.

Here is a solution using http://lindenb.github.io/jvarkit/SamJdk.html

private List<Interval> intervals=null;
public Object apply(final SAMRecord record) {
    if(this.intervals==null)
        {
        /** first call: we load the BED file "junction.bed" */
        try
            {
            java.io.BufferedReader r=new java.io.BufferedReader(new java.io.FileReader("junction.bed"));
            this.intervals = r.lines().
                filter(S->!(S.isEmpty() || S.startsWith("track")|| S.startsWith("browser"))).
                map(S->{
                    final String tokens[]=S.split("[\t]");
                    return new Interval(tokens[0],1+Integer.parseInt(tokens[1]),Integer.parseInt(tokens[2]));
                    }).
                collect(Collectors.toList());
            r.close();
            }
        catch(Exception err)
            {
            throw  new RuntimeException(err);
            }
        }
    // ignore unmapped
    if(record.getReadUnmappedFlag()) return false;
    final Cigar cigar = record.getCigar();
    if(cigar==null || cigar.isEmpty()) return false;
    // loop over the cigar string
    int pos = record.getStart();
    for(final CigarElement ce: cigar)
        {
        final CigarOperator op= ce.getOperator();
        if(op.isAlignment()) // it's 'M' or 'X' or '='
            {
            // search overlapping our bed list
            for(final Interval r:this.intervals)
                {
                //found !
                if(r.getContig().equals(record.getContig()) &&
                    pos <=r.getStart() &&
                    (pos+ce.getLength()) >= r.getEnd())
                    {
                    // accept this sam record
                    return true;
                    }
                }
            }
        if(op.consumesReferenceBases())
            {
            pos+=ce.getLength();
            }
        }
    // we don't accept
    return false;
    }

usage:

java -jar  dist/samjdk.jar -f biostars292561.code --body input.sam
ADD COMMENT
0
Entering edit mode

Thanks a lot for your answer.I would like to extract the reads transverse of intron-exon.

ADD REPLY
0
Entering edit mode

so , my solution should work.

P.

ADD REPLY
0
Entering edit mode

Hi Pierre, Thanks a lot. this program seems to be quite slow. Can you please suggest me if I can do something to speed up ? My Best Regads, Prasoon

ADD REPLY
1
Entering edit mode

sure, you can reduce the input bam using samtools view and piping it into my tool. Something like:

samtools view -bu your.bam "chrom:1234-456" |java -jar  dist/samjdk.jar -f biostars292561.code --body

or

samtools view -bu -L intervals.bed your.bam  |java -jar  dist/samjdk.jar -f biostars292561.code --body
ADD REPLY

Login before adding your answer.

Traffic: 1266 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6