Question: Extracting a trimmed output from a bam file
Will (United States) wrote, 4.7 years ago:

I'm trying to extract a specific region of a BAM file into (ultimately) a FASTA file. All of the methods I've tried so far give me every read that OVERLAPS the desired region. I'm trying to find a way to trim those reads to only the desired region.

I've tried:

samtools view

samtools view compiled.sorted.bam ConB:2185-2195

intersectBed

intersectBed -b test.bed -abam compiled.sorted.bam -ubam > out.bam

but both of these return the entire read that overlaps my desired region. I'm trying to get a SAM/BAM file in which everything is trimmed so that the 'reads' are only 10 nucleotides long. Am I just missing a flag somewhere to limit the returned region?

bam alignment • 3.0k views
Modified 4.5 years ago by Tark50 • written 4.7 years ago by Will

So if a read spans the boundaries, you want to retrieve just the part of the read that lies inside that region?

Reply written 4.7 years ago by Biomonika (Noolean)

Correct. I'd prefer to exclude things that are only partially inside the region ... although I can parse that out in my downstream analysis.

Reply written 4.7 years ago by Will
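
The preference stated above (keep only reads that cover the whole region, drop the partial ones) could be checked along these lines. This is only a sketch under stated assumptions: `reference_end` and `spans_region` are hypothetical helper names, not part of samtools or any library, and `pos` and `cigar` are assumed to come from columns 4 and 6 of a SAM record, with 1-based inclusive coordinates.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference.
REF_CONSUMING = set("MDN=X")


def reference_end(pos, cigar):
    """Return the 1-based inclusive reference end of an alignment.

    pos is the 1-based leftmost mapped position (SAM column 4),
    cigar is the CIGAR string (SAM column 6).
    """
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return pos + ref_len - 1


def spans_region(pos, cigar, start, end):
    """True if the alignment covers every base of start..end (1-based, inclusive)."""
    return pos <= start and reference_end(pos, cigar) >= end


if __name__ == "__main__":
    # A 20 bp ungapped read starting at 2180 covers 2185-2195 completely.
    print(spans_region(2180, "20M", 2185, 2195))
    # The same read starting at 2190 only partially overlaps the region.
    print(spans_region(2190, "20M", 2185, 2195))
```

Soft clips and insertions do not move the reference coordinate, which is why only M, D, N, = and X are counted when computing the end position.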

There's nothing in samtools, at least, to trim reads at a given boundary, since that's not exactly a common need. I suspect you'll need to code this up yourself (I can foresee some annoyances there).

Reply written 4.7 years ago by Devon Ryan

Yeah, that's what I'm seeing. I figured it would be a more common request, but I guess not ... python to the rescue!

Reply written 4.7 years ago by Will
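
For anyone taking the "code it up yourself" route, here is a minimal Python sketch of the trimming step. It is not the code Will ended up writing; `trim_to_region` is a hypothetical helper, and it assumes the caller supplies `pos`, `cigar` and `seq` from columns 4, 6 and 10 of a SAM record (for example by piping the `samtools view` command above into a script). Coordinates are 1-based and inclusive. Two design choices are assumptions as well: deleted reference bases inside the region are written as `-`, and an insertion is kept only when it falls strictly inside the region.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def trim_to_region(pos, cigar, seq, start, end):
    """Return the part of a read aligned to start..end (1-based, inclusive).

    Returns an empty string if the read does not overlap the region.
    """
    ref = pos   # 1-based reference coordinate of the next aligned base
    q = 0       # 0-based offset of the next unused base in seq
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: keep the slice that falls inside the region.
            lo = max(ref, start)
            hi = min(ref + n - 1, end)
            if lo <= hi:
                out.append(seq[q + (lo - ref): q + (hi - ref) + 1])
            ref += n
            q += n
        elif op == "I":
            # Inserted bases sit between ref-1 and ref; keep them only
            # when both flanking reference positions are in the region.
            if start < ref <= end:
                out.append(seq[q:q + n])
            q += n
        elif op == "S":
            # Soft-clipped bases are present in seq but not aligned.
            q += n
        elif op in "DN":
            # Deleted or skipped reference bases: pad with gaps.
            lo = max(ref, start)
            hi = min(ref + n - 1, end)
            if lo <= hi:
                out.append("-" * (hi - lo + 1))
            ref += n
        # H and P consume neither the read nor the reference.
    return "".join(out)


if __name__ == "__main__":
    # Ungapped 10 bp read at position 100, trimmed to 103-106.
    print(trim_to_region(100, "10M", "ACGTACGTAC", 103, 106))
```

Note that this returns only the trimmed bases, which is enough for writing FASTA. Producing a valid trimmed SAM/BAM record would also require rewriting POS, CIGAR and QUAL, which is where the annoyances mentioned above are likely to show up.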
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.7 years ago:

I wrote SAM4WebLogo ( https://github.com/lindenb/jvarkit/wiki/SAM4WebLogo ), a tool for building a sequence logo for different alleles or from SAM/BAM input, and I think it could do what you need:

$ java -jar dist/sam4weblogo.jar -r seq1:80-110  sorted.bam  2> /dev/null | head -n 50
>B7_593:4:106:316:452/1
TGTTG--------------------------
>B7_593:4:106:316:452a/1
TGTTG--------------------------
>B7_593:4:106:316:452b/1
TGTTG--------------------------
>B7_589:8:113:968:19/2
TGGGG--------------------------
>B7_589:8:113:968:19a/2
TGGGG--------------------------
>B7_589:8:113:968:19b/2
TGGGG--------------------------
>EAS54_65:3:321:311:983/1
TGTGGG-------------------------
>EAS54_65:3:321:311:983a/1
TGTGGG-------------------------
>EAS54_65:3:321:311:983b/1
TGTGGG-------------------------
>B7_591:6:155:12:674/2
TGTGGGGG-----------------------
>B7_591:6:155:12:674a/2
TGTGGGGG-----------------------
>B7_591:6:155:12:674b/2
TGTGGGGG-----------------------
>EAS219_FC30151:7:51:1429:1043/2
TGTGGGGGGCGCCG-----------------
>EAS219_FC30151:7:51:1429:1043a/2
TGTGGGGGGCGCCG-----------------
>EAS219_FC30151:7:51:1429:1043b/2
TGTGGGGGGCGCCG-----------------
>B7_591:5:42:540:501/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410/1
TGGGGGGGGCGCAGT----------------
>B7_591:5:42:540:501a/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410a/1
TGGGGGGGGCGCAGT----------------
>B7_591:5:42:540:501b/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410b/1
TGGGGGGGGCGCAGT----------------

 

Modified 4.7 years ago • written 4.7 years ago by Pierre Lindenbaum