How can I get a list of reads (read IDs) from BAM files, together with the regions where those reads map on the reference genome?
6.2 years ago
SV ▴ 20

Hi everyone,

I am new to NGS analysis. I have paired-end (PE) data (FASTQ files), and after mapping with bowtie2 I got BAM files. I then extracted the mapped reads with this samtools command:

samtools view -b -F 4 file.bam > mapped.bam

So now I have a BAM file of mapped reads. Are there any tools or programs that can extract the information I need?

I need two things in a single text file:

1. First column: the IDs of the mapped reads, and
2. Second column: the regions on the reference genome where those reads are mapped.

For example: the read with ID "this" is mapped to "this region" on the reference genome.

How can I achieve this? Any help is highly appreciated.

Thank you!

bowtie2 mapping • 10.0k views

Why do you want that, what's your end goal?


Later on, I have to do a statistical analysis to compare data from different samples.


That is probably the vaguest possible reply. What I'm getting at is that there's likely a much more efficient way to go about things, but you would have to provide enough details about your goals for us to help you find it.

6.2 years ago

Remove the '-b' option from the samtools command. It will then produce a text (SAM) file rather than a binary (BAM) file. Use cut to extract the read name, reference name (chromosome), and position columns, which are columns 1, 3, and 4 of a SAM record.
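The cut step yields only the leftmost mapped position of each read. If a start-end region is wanted instead, the end can be derived from the CIGAR string in column 6. The following is a minimal sketch of that idea in Python (swapped in for cut, not something the answer itself proposes). It assumes the SAM text was saved to a file, and the script name, function names, and output layout are all hypothetical.

```python
import re
import sys

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def sam_line_to_row(line):
    """Return 'read_id<TAB>chrom:start-end' for one SAM record, or None to skip it."""
    if line.startswith("@"):  # header line
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:  # not a complete alignment record
        return None
    qname, flag, rname, pos, cigar = (
        fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
    )
    if flag & 4 or cigar == "*":  # unmapped read, or no alignment information
        return None
    end = pos + reference_span(cigar) - 1  # POS is 1-based; end is inclusive
    return f"{qname}\t{rname}:{pos}-{end}"


def main(sam_path):
    """Print one row per mapped read found in the SAM text file at sam_path."""
    with open(sam_path) as sam:
        for line in sam:
            row = sam_line_to_row(line)
            if row is not None:
                print(row)


# Only runs when a SAM file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Assuming the block is saved as reads_to_regions.py, one possible way to use it is to run "samtools view -F 4 file.bam > mapped.sam" and then "python reads_to_regions.py mapped.sam > reads_regions.txt". With paired-end data both mates normally share one read ID, so each pair appears as two rows.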


I tried commands from bedtools and they worked. I got the results I wanted. Thank you so much, @Pierre.

6.2 years ago
vakul.mohanty ▴ 260

Pysam (http://pysam.readthedocs.io/en/latest/api.html) lets you parse the BAM file directly and extract the information you need. The downside is that you need to know some Python to use it.
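As a rough illustration of the pysam route, here is a minimal sketch that writes one "read ID, region" row per mapped read. The helper names, the file names in the usage note, and the chrom:start-end output layout are assumptions made for this example. It relies on pysam's AlignmentFile and on the query_name, reference_name, reference_start, and reference_end attributes of each read.

```python
def format_row(read_id, chrom, start0, end0):
    """Build 'read_id<TAB>chrom:start-end' with 1-based, inclusive coordinates.

    pysam reports reference_start as 0-based and reference_end as the
    0-based position one past the last aligned base.
    """
    return f"{read_id}\t{chrom}:{start0 + 1}-{end0}"


def write_read_regions(bam_path, out_path):
    """Write one line per mapped read in bam_path to the text file out_path."""
    import pysam  # third-party; imported here so format_row works without it

    with pysam.AlignmentFile(bam_path, "rb") as bam, open(out_path, "w") as out:
        # until_eof=True iterates over every record without needing a BAM index.
        for read in bam.fetch(until_eof=True):
            if read.is_unmapped:
                continue
            row = format_row(
                read.query_name,
                read.reference_name,
                read.reference_start,
                read.reference_end,
            )
            out.write(row + "\n")
```

A call such as write_read_regions("mapped.bam", "reads_regions.txt") would then produce the two-column text file described in the question. Skipping unmapped reads inside the loop means the same function should also work on the unfiltered BAM file.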


Thank you @Vakul. This is also a useful source.