Question: Samtools: Cutting Out A Region / Getting Information About A Region
diltsjeri (Chicago, IL) wrote 6.8 years ago:

We have a reference sequence with a run of 40 Ns in the middle of it. After aligning reads to this reference, what is the best way to view the contigs that overlap the 40-N region? How can I find out, from a BAM file, how many reads and contigs cover that region? I have sorted and indexed the BAM file, but I'm not sure where to go from here. If anybody has done something like this, your help would be appreciated.

Thanks.

samtools reference • 2.6k views
modified 6.8 years ago by Matt Shirley • written 6.8 years ago by diltsjeri
swbarnes2 (United States) wrote 6.8 years ago:

I'm not sure what you mean by "contig" in this context.

You can always use samtools view to filter the .bam to just the desired region of the desired chromosome. But looking at that region with IGV is probably the simplest thing to do.
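For a pipeline, the region filter can be scripted instead of run by hand. The following is a minimal sketch, assuming samtools is on the PATH and the BAM file is sorted and indexed. The file name, contig name, and coordinates are placeholders, and the helper names are made up for illustration. It uses the `-c` (count only) and `-q` (minimum MAPQ) options of `samtools view` with a `contig:start-end` region string, which is 1-based and inclusive.

```python
import subprocess


def build_view_command(bam_path, contig, start, end, min_mapq=0, count_only=True):
    """Build a `samtools view` argument list for a 1-based inclusive region."""
    cmd = ["samtools", "view"]
    if count_only:
        cmd.append("-c")  # print only the number of matching alignments
    if min_mapq > 0:
        cmd += ["-q", str(min_mapq)]  # skip alignments with MAPQ below this value
    cmd.append(bam_path)
    cmd.append(f"{contig}:{start}-{end}")  # region query; needs the BAM index
    return cmd


def count_reads(bam_path, contig, start, end, min_mapq=0):
    """Run samtools and return the alignment count (requires samtools installed)."""
    cmd = build_view_command(bam_path, contig, start, end, min_mapq)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


# Hypothetical file and region, shown only to illustrate the command layout.
print(" ".join(build_view_command("aln.sorted.bam", "ref1", 5001, 5040, min_mapq=20)))
```

Note that a region query returns every alignment that overlaps the region by at least one base, not only those that span it end to end.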

written 6.8 years ago by swbarnes2

If you have access to samtools I would also suggest samtools tview as a very simple, fast viewer.

written 6.8 years ago by Matt Shirley

Thanks for your responses.

Maybe I'm confused. I thought a BAM file contains contigs and reads? I want to know which unique ones cover the region of interest. I've used IGV and tview to look at the alignment, but I want to build this into a pipeline, so inspecting it manually isn't the solution we are looking for. I also have access to SFF, FASTA, and FASTQ files.

modified 6.8 years ago • written 6.8 years ago by diltsjeri

The SAM format (BAM is the same data, block-gzip compressed, with the blocks indexed against genomic coordinates) is really just a container for reads, quality scores, alignment information, and other optional flags and strings. No assembled contigs are stored in this format. The header of a SAM file does contain a sequence dictionary (the @SQ lines), which defines the reference sequences that you aligned your reads to.
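To see which reference sequences a file was aligned against, the @SQ lines can be read from the header text (for example, the output of `samtools view -H`). Here is a small sketch in plain Python. The header string is invented for the example, and `parse_sq_lines` is a hypothetical helper, not part of samtools.

```python
def parse_sq_lines(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM header."""
    references = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        references.append((fields["SN"], int(fields["LN"])))
    return references


# Made-up header for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:ref1\tLN:10000\n"
    "@SQ\tSN:ref2\tLN:2500\n"
)
print(parse_sq_lines(example_header))  # [('ref1', 10000), ('ref2', 2500)]
```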

written 6.8 years ago by Matt Shirley

samtools view can whittle your .bam file down to just the reads that overlap a particular region, or it can take a list of regions in BED format (the -L option). Reads with high mapping quality (MAPQ) should be uniquely placed; samtools view can also filter by MAPQ (the -q option).
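If the goal is to list the reads that span the whole gap rather than merely touch it, the SAM text that `samtools view` prints can be post-processed. The sketch below is one possible approach, not the only one: it derives each alignment's reference end from its CIGAR string and keeps reads with MAPQ at or above a threshold. The example records and the function names are invented for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos, cigar):
    """Last 1-based reference position covered by an alignment starting at pos."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1


def reads_spanning(sam_lines, contig, start, end, min_mapq=0):
    """Sorted names of reads whose alignment covers all of start..end (1-based)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag, rname = f[0], int(f[1]), f[2]
        pos, mapq, cigar = int(f[3]), int(f[4]), f[5]
        if flag & 0x4 or rname != contig or cigar == "*":
            continue  # unmapped, wrong reference, or no alignment
        if mapq < min_mapq:
            continue
        if pos <= start and reference_end(pos, cigar) >= end:
            names.add(qname)
    return sorted(names)


# Invented records: r1 and r4 span 5001-5040, r2 starts inside it, r3 has low MAPQ.
example = [
    "r1\t0\tref1\t4990\t60\t100M\t*\t0\t0\t*\t*",
    "r2\t0\tref1\t5010\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t0\tref1\t4950\t3\t120M\t*\t0\t0\t*\t*",
    "r4\t16\tref1\t4995\t60\t10S50M\t*\t0\t0\t*\t*",
]
print(reads_spanning(example, "ref1", 5001, 5040, min_mapq=20))  # ['r1', 'r4']
```

Collecting names in a set means a read that appears in several records is reported once.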

written 6.8 years ago by swbarnes2
Matt Shirley (Cambridge, MA) wrote 6.8 years ago:

If you want to determine the coverage for a specific region of your aligned reads, take a look at the coverageBed tool in bedtools.
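Region-based tools need the coordinates of the N run first. A minimal sketch, assuming the reference sequence has already been read into a string: it locates runs of N and formats them as BED records, which are 0-based and half-open. The contig name `ref1` and the toy sequence are placeholders, and both helper names are hypothetical.

```python
import re


def find_n_runs(sequence, min_length=1):
    """Return 0-based, half-open (start, end) intervals for runs of N or n."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_length
    ]


def bed_lines(contig, intervals):
    """Format intervals as three-column BED records (0-based, half-open)."""
    return [f"{contig}\t{start}\t{end}" for start, end in intervals]


# Toy reference: 12 bases, then 40 Ns, then 12 bases.
seq = "ACGT" * 3 + "N" * 40 + "TTGA" * 3
runs = find_n_runs(seq, min_length=40)
print(runs)                     # [(12, 52)]
print(bed_lines("ref1", runs))  # ['ref1\t12\t52']
```

The same interval written as a samtools region string would be `ref1:13-52`, because samtools regions are 1-based and inclusive.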

If you are interested in more specific information about the reads in your region of interest, take a look at the thread "Extract Reads From A Bam File That Fall Within A Given Region", which has many relevant answers.

written 6.8 years ago by Matt Shirley