Extract base information from aligned reads in a sam file
1
0
Entering edit mode
2.4 years ago
yliueagle ▴ 290

I generated a sam (bam) file by aligning fastq to reference using BWA. Now I would like to retrieve the the basepair at a given chromosome position, for example, chr12:11010100:11010100, for all reads aligned to this region, how can I do that? Thanks! (My expected workflow would be: (1) extract reads from sam that cover the querying position; (2) get the basepair from these reads at this position)

sam basepair reads • 953 views
0
Entering edit mode

Look into using igvtools which is part of IGV. I assume you are looking to get counts of basepairs (alleles) at a particular location.

0
Entering edit mode

Not only counts, but the basepair at that position for each read, with read name included in the output. (Of course the counts can be generated from this table as a downstream step)

1
Entering edit mode

Due to the fact that NGS data is very 'messy' (désordonnée), every base position will have multiple bases. You may want to output the frequencies of these via: A: Calculate The Frequency Of Nucleotides At Each Position In An Mpileup File

0
Entering edit mode

Is this good enough? samtools view <your bam> chr:start-end > reads.sam

0
Entering edit mode

That will get the reads but not specific base at that position. Not sure what is the best way to do that.

0
Entering edit mode

Thank you. This do obtain the reads but I also need to retrieve the basepair information from reads.sam at chr:start-end

2
Entering edit mode
2.4 years ago
yliueagle ▴ 290

I found http://lindenb.github.io/jvarkit/Sam2Tsv.html can do the job. Thanks all!