Question: Extract Reads From A Bam File That Fall Within A Given Region
15
gravatar for abi
6.1 years ago by
abi220
abi220 wrote:

Hi all

This question could be considered as a follow up of this discussion. http://www.biostars.org/post/show/3407/how-to-extract-reads-from-bam-that-overlap-with-specific-regions/#3414 What I need is to extract reads from bam file that fall only within a given region (not overlap the given region), the region being in the form of a gff file or bed file. Overlapping reads could be extracted by several methods (as in the discussion mentioned or BEDTools). The idea is to try to be pretty sure of excluding 5´ UTRs in the process of detecting intergenic transcripts. I saw a tool in BamUtil (http://genome.sph.umich.edu/wiki/BamUtil) called "writeRegion" which would pretty much do what I want. Somehow could not get this running for my dataset. Was wondering if you guys might have an "R" or some other solution for this. Thanks in advance Abi

bam • 42k views
ADD COMMENTlink modified 12 weeks ago by Alex Reynolds25k • written 6.1 years ago by abi220
2

I think the options have changed since this reply was written if you want the default headerless sam file then use the command as per Damian Kao

samtools view input.bam "Chr10:18000-45500" > output.sam

if it's a bam output that you want you will need a '-b'

samtools view -b input.bam "Chr10:18000-45500" > output.bam
ADD REPLYlink written 13 months ago by KevinL20

Do not forget to include the -h flag whilst doing this, i.e., if you want to retain the SAM header, which will be required for many downstream applications.

-h include header in SAM output

ADD REPLYlink written 12 weeks ago by Kevin Blighe26k

Headers are automatically included when the output format is BAM since BAM requires the header to be included.

ADD REPLYlink written 12 weeks ago by d-cameron1.9k

Thanks for that 'heads' up! :)

ADD REPLYlink written 12 weeks ago by Kevin Blighe26k
16
gravatar for Damian Kao
6.1 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

You can extract mappings of a sam/bam file by reference and region with samtools. For example:

samtools view input.bam "Chr10:18000-45500" > output.bam

That would output all reads in Chr10 between 18000-45500 bp.

ADD COMMENTlink modified 6.1 years ago • written 6.1 years ago by Damian Kao15k

Thanks Dk for your answer, in your example the output will be all reads starting anywhere between 18000-45500. What I want is to get out only the reads which also end in this region. For example starting at 18000 or higher and ending at 45500 or lower. Abi

ADD REPLYlink written 6.1 years ago by abi220

That is confusing. Would you please explain little bit more, which reads do you want in output?

ADD REPLYlink written 6.1 years ago by Vikas Bansal2.3k

I am sorry if I am confusing you guys. Lets say I have a 4 reads with the following start and end coordinates aligned on to the chromosome.

start coordinates of the reads: 900, 1000, 1100, 1900

end coordinates of the reads: 1200, 1500,1700, 2300

My region of interest is 1000:2000

I want my output to be in this case the second and third reads starting and ending within my region of interest.

The first and last reads should be excluded because even though they overlap my region of interest, they extend away from my region (1000:2000). I hope my question is clearer now. Abi

ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by abi220

do you have spliced reads as well?

ADD REPLYlink written 6.1 years ago by Arun2.3k

he wants the read with overlaps "any" at the left-end and "within" at the right end. If you consider the position 18000-45000, he needs all reads with overlap of at least 1 base to 18000, but all reads should end less than or equal to 45000.

ADD REPLYlink written 6.1 years ago by Arun2.3k

Yep Arun thats what I want and BTW, I do not have spliced reads!

ADD REPLYlink written 6.1 years ago by abi220
2

abi, this should be possible with R and IRanges. For loading a bam file, refer to this post. If you have questions about using IRanges, let me know. I'll post an example.

ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by Arun2.3k

hi kao , can u pls tell me how to index this bam file because wen I tried to make index it showed error like: Bam header is absent. Also can u suggest any tool r the way to merge the overlapping region in bam file and make into single fasta file.

ADD REPLYlink written 2.3 years ago by Bioblazer40

Use "index" from samtools, described here.

samtools index test.bam
ADD REPLYlink written 2.3 years ago by Tom Koch110

How can you randomly extract reads from two bam files to simulate sample contaminations? I am trying to simulate sample contamination for different level of dilution for NGS samples. Suppose I have two bam files for SampleA and SampleB. I want to generate 5 contaminated samples at dilution of 10%, 20%, 30%,40% and 50% of those two samples. I understand that I should extract reads from one of the two bam files at the given dilution percentage and reassign to the other bam file, but I don't know exactly how to do this. Could you please explain me the procedure? Thanks

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by MAPK1.3k
5
gravatar for abi
6.1 years ago by
abi220
abi220 wrote:

Thanks guys, Actually the answer was there in Michael´s explanation in http://www.biostars.org/post/show/3407/how-to-extract-reads-from-bam-that-overlap-with-specific-regions/#3414 In his example, countOverlaps(genes, reads, type = "within") would do what I want. If type is within, the query interval must be wholly contained within the subject interval.

Abi

ADD COMMENTlink written 6.1 years ago by abi220

yes, that's right. This is what I thought you'd want to go and look up and come back with questions. Seems the usage is already shown here. Nice!

ADD REPLYlink written 6.1 years ago by Arun2.3k
1
gravatar for Alex Reynolds
12 weeks ago by
Alex Reynolds25k
Seattle, WA USA
Alex Reynolds25k wrote:

Via BEDOPS bedmap --fraction-map 1 to return only reads which fall entirely within intervals in intervals.bed:

$ bedmap --echo --fraction-map 1 <(bam2bed < reads.bam) intervals.bed > answer.bed

If you want to do some ad-hoc search:

$ bedmap --echo --fraction-map 1 <(bam2bed < reads.bam) <(echo -e "chrZ\t1234\t5678") > answer.bed
ADD COMMENTlink written 12 weeks ago by Alex Reynolds25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 705 users visited in the last hour