Question: How to calculate base count for each base in one spot in sam file
4.4 years ago
ncxhit • 0

Hi, I am wondering whether it is possible to calculate the percentage of each of the four nucleotides (A, C, G, T) at a given position in a SAM file. Here is what I mean, in terms of input and output:

Input : SAM file

Output: Spot 1 : A: 10% , G: 20%, C: 30%, T: 40%

My idea is to loop through the SAM file and, for each aligned read, find the nucleotide at the position in question.
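
For reference, the loop described above can be sketched in plain Python with no extra libraries. This is a minimal, hypothetical sketch (the names `base_at_position` and `base_percentages` are made up for illustration). It walks each read's CIGAR string to find which read base, if any, aligns to a 1-based reference position, skips unmapped, secondary and supplementary records, and ignores base and mapping qualities. For real data a dedicated pileup tool such as samtools mpileup is the more robust choice.

```python
import re
from collections import Counter

# A CIGAR string such as "2S5M1I3M" is a run of (length, operation) pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
# so that each read contributes at most one base.
SKIP_FLAGS = 0x4 | 0x100 | 0x800


def base_at_position(ref_start, cigar, seq, target):
    """Return the read base aligned to 1-based reference position `target`,
    or None if the read does not reach it or has a deletion/skip there."""
    ref = ref_start  # current 1-based reference coordinate
    qry = 0          # current 0-based index into SEQ
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":      # consumes read and reference
            if ref <= target < ref + length:
                return seq[qry + (target - ref)]
            ref += length
            qry += length
        elif op in "IS":     # consumes read only
            qry += length
        elif op in "DN":     # consumes reference only
            if ref <= target < ref + length:
                return None
            ref += length
        # H and P consume neither read nor reference
    return None


def base_percentages(sam_lines, chrom, target):
    """Percentage of A/C/G/T among reads covering chrom:target (1-based)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & SKIP_FLAGS or rname != chrom or cigar == "*" or seq == "*":
            continue
        # SEQ is stored in reference orientation, so reverse-strand reads
        # need no reverse complementing here.
        base = base_at_position(pos, cigar, seq, target)
        if base is not None:
            counts[base.upper()] += 1
    total = sum(counts[b] for b in "ACGT")  # N and other symbols are left out
    return {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
```

Usage might look like `with open("aln.sam") as fh: print(base_percentages(fh, "chr1", 12345))`, where the file name and coordinates are placeholders. Because it scans every record, this is linear in file size; for repeated queries, an indexed BAM with a region-aware tool avoids reading the whole file.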

I am wondering if there is a better solution.

alignment • 969 views

ncxhit, instead of percentages you can obtain read counts for each base at each position from BAM files with igvtools.

Thanks! I'll check that out

FYI cpad0112, you can paste plain Biostars links to either posts or users, without any @, and the title/username will be displayed automatically ;-)

4.4 years ago
bernatgel ★ 3.4k

If you have BAM files (or are willing to convert your SAM to BAM; I don't think it works with SAM files directly), we usually use bam-readcount for this kind of data extraction. Its output is a bit more complex (and more informative) than what you are asking for, but it's possible to parse it with R or similar to get the table you need.
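
Parsing bam-readcount output into the percentages asked for could look like the Python sketch below (shown in Python rather than R). It assumes the tab-separated layout described in the bam-readcount README: chromosome, position, reference base and depth, followed by one colon-separated field per base whose first two items are the base and its read count. The function name `readcount_percentages` is hypothetical.

```python
def readcount_percentages(line):
    """Parse one bam-readcount output line into (chrom, pos, {base: percent})."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    counts = {}
    # Per-base entries follow chrom, position, reference base and depth.
    # Assumed layout of each entry: base:count:avg_mapping_quality:...
    for entry in fields[4:]:
        parts = entry.split(":")
        counts[parts[0]] = int(parts[1])
    # Only A/C/G/T enter the denominator; "=", N and any indel entries
    # (e.g. "+A", "-T") are parsed but ignored here.
    total = sum(counts.get(b, 0) for b in "ACGT")
    pct = {b: (100.0 * counts.get(b, 0) / total if total else 0.0) for b in "ACGT"}
    return chrom, pos, pct
```

Each line of bam-readcount output describes one position, so the function can be applied line by line to produce one "Position: A: x%, C: y%, ..." row per site.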

4.4 years ago
ATpoint 82k

seqtk comp can give you base composition (per sequence, from FASTA/FASTQ input). From there you could easily get percentages with something like awk.

https://github.com/lh3/seqtk
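
The awk step could equally be done in Python. This sketch assumes the usual seqtk comp layout (tab-separated: sequence name, length, then counts of A, C, G and T in columns 3 to 6); `comp_percentages` is a hypothetical name. Note that seqtk comp reports composition per input sequence, so these are percentages per read or contig, not per reference position.

```python
def comp_percentages(line):
    """Turn one seqtk comp output line into (name, {base: percent})."""
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    # Assumed column layout: name, length, #A, #C, #G, #T, ...
    counts = dict(zip("ACGT", (int(x) for x in fields[2:6])))
    total = sum(counts.values())
    return name, {b: (100.0 * n / total if total else 0.0) for b, n in counts.items()}
```

In practice one would feed it the lines of `seqtk comp` output, for example by reading them from standard input in a short loop.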
