Count The Number Of A, T, G And C At Specific Genomic Locations For Multiple RNA-seq Samples
10.2 years ago
komal.rathi ★ 4.1k

Hi everyone,

I was wondering whether there is a tool that counts how many As, Ts, Gs and Cs are present at a particular genomic position. I have 64 RNA-seq samples and a set of genomic locations, and for each location I would like to count the occurrences of each base.

Thanks

10.2 years ago

I don't know of a dedicated tool, but I do a similar analysis, where I:

1) Align the reads against the reference genome.

2) Use the BAM file to create an mpileup file, then use the match/mismatch information in the mpileup read-bases column (see http://samtools.sourceforge.net/pileup.shtml) to calculate the frequency of each nucleotide: '.' and ',' mark bases that match the reference on the forward and reverse strand, and letters mark mismatching bases.
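As a rough illustration of step 2, here is a minimal Python sketch (the function name count_bases is hypothetical, not part of samtools) that turns the read-bases column of one mpileup line into A/C/G/T counts. It assumes the default samtools mpileup symbols: '.'/',' for reference matches, '^' plus one mapping-quality character at read starts, '$' at read ends, '+n'/'-n' followed by n bases for indels, '*' for deleted bases and '>'/'<' for reference skips.

```python
from collections import Counter


def count_bases(ref_base: str, read_bases: str) -> Counter:
    """Count A/C/G/T in the read-bases column of one mpileup line.

    '.' and ',' mean "same as the reference" (forward / reverse strand),
    so they are counted as ref_base. Indel sequences, read-start markers
    with their mapping-quality character, '$', '*', '>' and '<' are skipped.
    """
    counts = Counter({base: 0 for base in "ACGT"})
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read start: skip '^' and the mapping-quality character after it.
            i += 2
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        base = ref if c in ".," else c.upper()
        if base in counts:  # ignores N, '$', '*', '>' and '<'
            counts[base] += 1
        i += 1
    return counts


print(dict(count_bases("A", ".,,^]C$+2AGt*")))
```

For example, count_bases("A", ".,,^]C$+2AGt*") gives A=3, C=1, G=0, T=1: the inserted bases in '+2AG' and the ']' after '^' are skipped rather than counted as observed bases.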

10.2 years ago
Fred ▴ 780

I once came across a tool, pileup2base, that outputs the number of A, C, G, T and indels for each position in an mpileup file. Maybe you can test it.
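pileup2base's exact output format is not reproduced here, but if indels are also of interest they can be counted from the same read-bases column. A minimal sketch, assuming the default mpileup symbols; the function name count_indels and its output keys are hypothetical. A '+n' or '-n' marks an insertion or deletion starting right after this position, while '*' marks a read whose deletion spans this position.

```python
def count_indels(read_bases: str) -> dict:
    """Count indel events in the read-bases column of one mpileup line."""
    counts = {"insertions": 0, "deletions": 0, "spanning_deletions": 0}
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Skip the read-start marker and its mapping-quality character,
            # which may itself be '+', '-' or '*'.
            i += 2
            continue
        if c in "+-":
            # '+n'/'-n' followed by n bases: one insertion or deletion event.
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            counts["insertions" if c == "+" else "deletions"] += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if c == "*":
            # Placeholder for a base deleted in this read at this position.
            counts["spanning_deletions"] += 1
        i += 1
    return counts


print(count_indels(".+2AG,-1c*^+."))
```

Counting '*' separately from '-n' keeps a deletion from being counted twice: once where it starts and again at each position it covers.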

If you want to generate the mpileup for a particular position, you can do:

samtools view -h indexed_bam_file.bam chr:start-end | samtools view -Shb - | samtools mpileup - > mpileup_file

Then you can select the line of the mpileup file that corresponds to the exact coordinate.
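With many locations and samples, selecting the exact coordinates can be scripted. If several BAM files are passed to samtools mpileup at once, the default output has chromosome, 1-based position and reference base, followed by three columns (depth, read bases, base qualities) per BAM in input order. A minimal sketch under that assumption; select_sites and the targets set are hypothetical names. By default mpileup only prints positions covered by at least one read, so a target can be missing from the result.

```python
def select_sites(mpileup_lines, targets):
    """Keep mpileup lines at target sites.

    targets: set of (chrom, pos) tuples with 1-based positions.
    Returns {(chrom, pos): (ref_base, [read_bases_sample1, read_bases_sample2, ...])}.
    """
    selected = {}
    for line in mpileup_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if (chrom, pos) not in targets:
            continue
        # Columns 4 onwards come in (depth, read bases, qualities) triplets.
        per_sample = fields[3:]
        read_bases = [per_sample[k + 1] for k in range(0, len(per_sample), 3)]
        selected[(chrom, pos)] = (fields[2], read_bases)
    return selected


lines = [
    "chr1\t100\tA\t3\t.,t\tIII\t2\t..\tII\n",
    "chr1\t101\tC\t1\t.\tI\t0\t*\t*\n",
    "chr2\t5\tG\t1\t,\tI\t1\ta\tI\n",
]
print(select_sites(lines, {("chr1", 101), ("chr2", 5)}))
```

A sample with no reads at a site shows up as depth 0 with '*' in its read-bases column, so per-sample counting should treat '*' as "no observed base".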


Thanks @Fred, it took me so long to accept your answer because I was testing it on one sample, and it works!


Fred, I tested the Perl script on one of my BAM files and compared its output with the same BAM file loaded in IGV, and it does not give the correct results. Just thought I would let you know.
