The Reference Genome Of The Reading In Bam File
1
0
Entering edit mode
7.9 years ago
shl198 ▴ 420

Dear all, I have a question about analyzing BAM files with R. In BAM file, it includes all the reads. Are the reads already mapped to the reference genome? Eg, I want to find how many reads in the BAM are mapped to a reference genome. I used the readBamGappedAlignments to get the reads. And it shows a number of GappedAlignments. Then I used makeTranscriptDbFromUCSC to get a reference and use countOverlaps to count the overlaps, it also gives a series numbers of mapped reads. The number I get using countOverlaps is smaller than using readBamGappedAlignment. Which of these two results should be the number of mapped reads?

r bam alignment • 5.3k views
ADD COMMENT
1
Entering edit mode
7.9 years ago

Are the reads already mapped to the reference genome?

Yes. Most of the reads should already be mapped. Generally aligners or mappers only output mapped reads in bam files. But there are a few aligners that also output the unaligned or unmapped reads but there is enough information in bam file so that you can tell if a read is aligned or not.

I am not a R expert but the solution to your problem is straightforward. Install samtools and try these commands:

1) samtools view -c input.bam will give counts of all the reads in the bam file.

2) samtools view -c -F 4 input.bam will give counts of all the mapped reads to reference genome.

3) samtools view -c -f 4 input.bam will give counts of all the unmapped reads.

ADD COMMENT
0
Entering edit mode

Thanks for your answer. So in general case, if I get a BAM file, is it possible to know the reference genome from which the aligned reads got? I count overlaps of the reads to the Human genome hg19, the number is much smaller than the number of reads in BAM file. Can I say the reference genome of BAM file is not Human genome hg19?

ADD REPLY
1
Entering edit mode

I have never heard a situation like this before. I mean you should have some idea about the reference genome that was used for mapping. An easy would be to use this command:

samtools view -H input.bam

This will output the header. Look carefully and you should be able to see which reference genome was used. You can also count the chromosomes.

ADD REPLY
0
Entering edit mode

OK. Now I figure out how to do that. Thank you very much.

ADD REPLY

Login before adding your answer.

Traffic: 2070 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6