Question: Extract reads with variant at given position
2
gravatar for philip.jonsson
5.8 years ago by
United States
philip.jonsson30 wrote:

I want to be able to extract, from BAM files, the reads that, for a given position, contain the variant (and conversely but separately, the ones that contain the reference). I've played around with various tools and searched the tubes, without finding any convenient way to do this. It doesn't seem like too odd a task to perform, so maybe my google fu is just failing me here.

sequencing alignment • 4.6k views
ADD COMMENTlink modified 3 months ago by Biostar ♦♦ 20 • written 5.8 years ago by philip.jonsson30

Are you looking for all reads containing some variant at, say, chr1:1000 or instead all those that contain a variant in base 20 of their alignment? I assume the former, which is much quicker to do, but figured I'd ask.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

The former, if I understand you correctly. I have a specific variant and I want to examine the reads that support that variant and compare to the reads that instead show the reference allele at the same position.

ADD REPLYlink written 5.8 years ago by philip.jonsson30
4
gravatar for Alex Reynolds
5.8 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

Another option, for querying any ad hoc range (chrN:X-Y):

$ echo -e "chrN\tX\tY" | bedops --element-of 1 <(bam2bed < reads.bam) - > overlapping_reads.bed

To query any given set of genomic ranges in variants.bed:

$ bedops --element-of 1 <(bam2bed < reads.bam) variants.bed > overlapping_reads.bed

The bam2bed tool will preserve all fields, which may be useful.

If variants are in VCF, then a second bash subshell to call vcf2bed can convert them to BED, e.g.:

$ bedops --element-of 1 <(bam2bed < reads.bam) <(vcf2bed < variants.vcf) > overlapping_reads.bed
ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Alex Reynolds31k
2
gravatar for Pierre Lindenbaum
5.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

I would use my tool sam2tsv to extract the name of the reads having (or not) the mution https://github.com/lindenb/jvarkit/wiki/SAM2Tsv

 

ADD COMMENTlink written 5.8 years ago by Pierre Lindenbaum131k
2

Since this can take input via a pipe, it should be added that the simplest method would be to take the BAM file, sort and index it, and then:

samtools view -bu some_file.bam chr1:123 | java -jar dist/sam2tsv.jar -r ref.fa stdin > blah.txt
ADD REPLYlink modified 5.8 years ago by Pierre Lindenbaum131k • written 5.8 years ago by Devon Ryan97k
1

@Devon: added '-bu'

ADD REPLYlink written 5.8 years ago by Pierre Lindenbaum131k

Good catch!

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

Why is it better to use a BAM file as stdin?

ADD REPLYlink written 4.8 years ago by emd070
1
gravatar for sacha
4 months ago by
sacha2.0k
France
sacha2.0k wrote:

For a specific variant, you can use bamql

For instance, if you want to extract all reads from a bam file :

  • which support the alternate 'A' in snp : chr17:29827429 G>A

    bamql -f yourfile.bam 'chr(17) & nt_exact(29827429,A)' -o A.bam
    
  • which support the reference 'G' in snp : chr17:29827429 G>A

    bamql -f yourfile.bam 'chr(17) & nt_exact(29827429,G)' -o B.bam
    

I guess you can loop over variant file and do this for each variant. You can also filter with mate information. See documentation Here

enter image description here

ADD COMMENTlink written 4 months ago by sacha2.0k

man this! thank you very much! this is specifically what i needed :)

ADD REPLYlink written 4 weeks ago by morovatunc460

BAMQL is unfortunetely not very HPC friendly. I kinda found this maybe it helps.

https://github.com/broadinstitute/VariantBam

ADD REPLYlink written 4 weeks ago by morovatunc460
0
gravatar for Devon Ryan
5.8 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

In the off chance that you really do need to have the full alignments in SAM format in separate files, then you can code something up in either python with pysam or C with HTSlib. In both cases you can use the pileup functions to just grab all alignments that overlap a given position (or set of them) and then iterate over the results. The pileup functions will tell you which base of each alignment overlaps a given position, so you can check what genotype it supports and treat it accordingly.

Edit: But really, if you think you want the full alignments in separate files then you should probably rethink what you're doing. The two answers posted before mine are vastly simpler and should suffice for most normal needs.

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Devon Ryan97k
0
gravatar for philip.jonsson
5.8 years ago by
United States
philip.jonsson30 wrote:

Thanks for the input, guys. I've realized that bam-readcount ultimately is the most appropriate tool for my purposes.

ADD COMMENTlink written 5.8 years ago by philip.jonsson30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1193 users visited in the last hour