Extracting a trimmed output from a bam file
3
3
Entering edit mode
10.1 years ago
Will 4.5k

I'm tying to extract a specific region of a bam-file into a fasta-file (ultimately). All of the methods I've tried so far give me all reads that OVERLAP the desired region, I'm trying to find a way to trim those to only the desired region.

I've tried:

samtools view

samtools view compiled.sorted.bam ConB:2185-2195

intersectBed

intersectBed -b test.bed -abam compiled.sorted.bam -ubam > out.bam

but these will give the entire read that overlaps my desired region, I'm trying to get something that will trim everything to sam/bam file where the 'reads' are 10 nucleotides long. Am I just missing a flag somewhere to limit the returned region?

bam alignment • 6.4k views
ADD COMMENT
0
Entering edit mode

So if read spans the boundaries, you want to retrieve just that part of the read that is inside that region?

ADD REPLY
0
Entering edit mode

Correct. I'd prefer to exclude things that are only partially inside the region ... although I can parse that out in my downstream analysis.

ADD REPLY
0
Entering edit mode

There's nothing in samtools, at least, to trim reads at a given boundary, since that's not exactly a common need. I suspect you'll need to code this up yourself (I can foresee some annoyances there).

ADD REPLY
0
Entering edit mode

Yeah, that's what I'm seeing. I figured it would be a more common request, but I guess not ... python to the rescue!

ADD REPLY
1
Entering edit mode
10.1 years ago

I wrote SAM4WebLogo for Sequence Logo For Different Alleles Or Generated From Sam/Bam and I think it could do what you need

$ java -jar dist/sam4weblogo.jar -r seq1:80-110  sorted.bam  2> /dev/null | head -n 50
>B7_593:4:106:316:452/1
TGTTG--------------------------
>B7_593:4:106:316:452a/1
TGTTG--------------------------
>B7_593:4:106:316:452b/1
TGTTG--------------------------
>B7_589:8:113:968:19/2
TGGGG--------------------------
>B7_589:8:113:968:19a/2
TGGGG--------------------------
>B7_589:8:113:968:19b/2
TGGGG--------------------------
>EAS54_65:3:321:311:983/1
TGTGGG-------------------------
>EAS54_65:3:321:311:983a/1
TGTGGG-------------------------
>EAS54_65:3:321:311:983b/1
TGTGGG-------------------------
>B7_591:6:155:12:674/2
TGTGGGGG-----------------------
>B7_591:6:155:12:674a/2
TGTGGGGG-----------------------
>B7_591:6:155:12:674b/2
TGTGGGGG-----------------------
>EAS219_FC30151:7:51:1429:1043/2
TGTGGGGGGCGCCG-----------------
>EAS219_FC30151:7:51:1429:1043a/2
TGTGGGGGGCGCCG-----------------
>EAS219_FC30151:7:51:1429:1043b/2
TGTGGGGGGCGCCG-----------------
>B7_591:5:42:540:501/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410/1
TGGGGGGGGCGCAGT----------------
>B7_591:5:42:540:501a/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410a/1
TGGGGGGGGCGCAGT----------------
>B7_591:5:42:540:501b/1
TGTGGGGGCCGCAGTG---------------
>EAS192_3:5:223:142:410b/1
TGGGGGGGGCGCAGT----------------
ADD COMMENT

Login before adding your answer.

Traffic: 3679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6