Entering edit mode
11.3 years ago
camelbbs
▴
710
Assume I have RNASeq data and a list of chromosome loci. Can I use some tools to get an output like this:
chr 2032121 A:20 T:350 C:300 G:55
chr 12098311 A:190 T:0 C:20 G:760
I want to calculate in each loci, how many A/T/C/G in the RNASeq reads. I am using samtools mpileup -l FILE -f hg19.fa in1.bam. But seems that can only identify the base on the reference genome. Thanks for help.
Pileup format gives you information about variations, it is shown here, I think this discussion is also highly relevant How to get a summary for a special chromosome position in BAM file
I wrote a script to parse that info from mpileup output a while back: http://blog.nextgenetics.net/?e=56
However, I am not sure that it works 100% correctly. So I would spot check the output.
Thanks Damian. I have a question here: I see if I have 2000 positions, but mpileup will only calculate about 1200 positions, is there a parameter that I can figure out on the all positions. Or does that mean the left 800 positions are in the sequencing gaps?