Question: Filter Reads In The Sam/Bam Files By Direction Using Samtools
1
gravatar for Shaohua
7.6 years ago by
Shaohua20
Shaohua20 wrote:

Hi all,

I am wondering whether there is a way to filter the reads in the sam/bam files by the alignment direction using samtools or other tools? for example, I want to filter out all the reads facing to each other (-> <-) in the mapping result of a mate pair library.

Thanks a lot!

Shaohua

samtools • 5.1k views
ADD COMMENTlink modified 7.6 years ago by Pierre Lindenbaum134k • written 7.6 years ago by Shaohua20

you mean, face each other and they are pairs ? Is your bam file sorted by read name ?

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by Gabriel R.2.8k

yes, since in principle, the reads in the mate pair library should be <- ->. However, some paired end reads with direction of -> <- were sequenced. For my research, I just want to use the 'true mate pair reads'. I want to filter out those reads are PE reads based on their direction

ADD REPLYlink written 7.6 years ago by Shaohua20

have they been sorted on a per name basis ?

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k

I am not sure what you mean here. I used bwa sampe for the mapping and the results from bwa were first converted by bam format and sorted by default parameters of samtools

ADD REPLYlink written 7.6 years ago by Shaohua20

Well you cannot rely on the flags, that we know. not if you do samtools sort on read names, not coordinate, you will have every pair with its mate immediately after it. Then you can figure out which are within reasonable distance plus opposite strands.

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k

There's no need to resort for that. Just check the flag (for orientation), rnext (to ensure the same chromosome) and pnext (mapping position) fields. This would be trivial to script.

ADD REPLYlink written 7.6 years ago by Devon Ryan98k

How do you get the orientation of the mate ?

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k

The 0x20 flag status.

ADD REPLYlink written 7.6 years ago by Devon Ryan98k

yes ! forgot that one !

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by Gabriel R.2.8k

@Shaohua can you script in pysam or code in bamtools ?

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k

If the aligner that you used sets the "properly paired" flag correctly, then just filter by that "samtools -f 2 ...".

ADD REPLYlink written 7.6 years ago by Devon Ryan98k

that depends on the size of the insert in between. I do not know how an aligner will do insert estimate for a mate pair lib.

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k

yes, it is quite tricky to estimate the insertion size, for some libraries, the insertion size estimated by bwa is around 3 kb which fits our expectation. However, for some libraries, the range of the insertion size is quite big, from few hundreds bp to 5 kb. This may indicate the mate pair library has lots of PE reads.

ADD REPLYlink written 7.6 years ago by Shaohua20
0
gravatar for Devon Ryan
7.6 years ago by
Devon Ryan98k
Freiburg, Germany
Devon Ryan98k wrote:

The following program is an example of an efficient approach to use, using the samtools C API. The program itself is just a quick hack, but you can use it to get started. The program takes a SAM or BAM input and will write accepted reads to a file called "filtered.bam". Reads are accepted according to the following criterion: (1) a read and its mate must map to the same chromosome/contig, (2) the mates must have opposite orientations, and (3) the mates must point away from each other. I've added some comments, in case you're not very familiar with the API or C. You'd want to make this more robust if you were to actually use it!

#include "sam.h"

int main(int argc, char* argv[]) {
    samfile_t *ifile = NULL, *ofile = NULL;
    bam1_t *read = bam_init1();
    int keep = 0;
    char *p = NULL;

    //Open input file, either SAM or BAM
    p = strrchr(argv[1], '.');
    if(strcmp(p, ".bam") == 0) {
        ifile = samopen(argv[1], "rb", NULL);
    } else {
        ifile = samopen(argv[1], "r", NULL);
    }

    //Open output file
    ofile = samopen("filtered.bam", "wb", ifile->header);

    //Iterate through the lines
    while(samread(ifile, read) > 1) {
        keep = 0;
        //Is the read's mate on the same chromosome/contig?
        if(read->core.tid == read->core.mtid) {
            //Are the mates on opposite strands?
            if(read->core.flag & BAM_FREVERSE && !(read->core.flag & BAM_FMREVERSE)) {
                if(read->core.pos < read->core.mpos) keep=1;
            } else if(!(read->core.flag & BAM_FREVERSE) && read->core.flag & BAM_FMREVERSE) {
                if(read->core.mpos < read->core.pos) keep=1;
            }
        }
        if(keep) samwrite(ofile, read);
    }
    bam_destroy1(read);
    samclose(ifile);
    samclose(ofile);
    return 0;
}
ADD COMMENTlink written 7.6 years ago by Devon Ryan98k

Thank you for the script. I think this will serve my purpose to filter the reads from bam file. But I am not able to run the script. Can you tell me what are the command line arguments we need to pass to run this program.

thank you

ADD REPLYlink written 3.6 years ago by poojasethiya90

The only argument is a BAM file. It writes a new file called "filtered.bam".

ADD REPLYlink written 3.6 years ago by Devon Ryan98k
0
gravatar for Pierre Lindenbaum
7.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

using my tool for filtering a SAM/BAM with javascript: https://github.com/lindenb/jvarkit#samjs-filtering-a-sambam-file-with-javascript

 java -jar dist/samjs.jar \
   SE='(record.referenceIndex==record.mateReferenceIndex && record.referenceIndex>=0 && record.readNegativeStrandFlag!=record.mateNegativeStrandFlag &&  ((record.mateNegativeStrandFlag && record.alignmentStart < record.mateAlignmentStart ) || (record.readNegativeStrandFlag && record.mateAlignmentStart < record.alignmentStart ) ))' \
    I=input.bam \
    SAM_OUTPUT=true \
    VALIDATION_STRINGENCY=SILENT
ADD COMMENTlink written 7.6 years ago by Pierre Lindenbaum134k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1907 users visited in the last hour
_