How to retrieve the SNP data from Bam file
3
0
Entering edit mode
5.0 years ago
cdc421 ▴ 10

I'm analyzing the exome sequencing data and plan to retrieve the nucleotide counts of specific positions from Bam files. I am a beginner of bioinformatics. My friends said pysamstats is a good software. If you have a information, would you please tell me what software is best and how can I do that?

I wanna get the nucleotide frequency of 5000 SNPs.

.

chr1 2000454 A:5 T:546 C: 2

chr2 230404 A:1 T:2 C:343

chr2

chr3

/

/

/

chrx 3234343 A:323 T:1

next-gen software error sequencing • 2.6k views
ADD COMMENT
1
Entering edit mode
5.0 years ago

I wrote http://lindenb.github.io/jvarkit/FindAllCoverageAtPosition.html

usage:

echo "my.bam" | java -jar dist/findallcoverageatposition.jar --posfile positions.bed

example of output:

#File              CHROM      POS  SAMPLE  DEPTH  M    I  D  N  S   H  P  EQ  X  Base(A)  Base(C)  Base(G)  Base(T)  Base(N)  Base(^)  Base(-)
./testdata/S4.bam  rotavirus  100  S4      126    126  0  0  0  29  0  0  0   0  5        0        0        121      0        0        0
./testdata/S1.bam  rotavirus  100  S1      317    317  1  0  0  50  0  0  0   0  27       0        1        289      0        1        0
./testdata/S2.bam  rotavirus  100  S2      311    311  0  1  0  60  0  0  0   0  29       1        0        281      0        0        1
./testdata/S3.bam  rotavirus  100  S3      446    446  1  0  0  86  0  0  0   0  39       0        1        406      0        1        0
ADD COMMENT
1
Entering edit mode

I made bed files, but they didn't work. What format is necessary for the analysis.

I always see the following results, I don't know why..

find ./lib/testdata/ -type f -name "*.bam" | \ java -jar dist/findallcoverageatposition.jar --posfile positions.bed

File exists but is not readable

ADD REPLY
0
Entering edit mode

Thanks. You recommend the Jvarkit, not pysamstats. Is it correct? I'm a true beginner .

ADD REPLY
0
Entering edit mode

yes, but I'm sure pysamstats works too (I don't know this software)

ADD REPLY
0
Entering edit mode

Would you please tell me how can I retrieve the data of 1000 SNPs from 5 different samples? Your command is for one SNP.

ADD REPLY
0
Entering edit mode

Hey Pierre, what's the format of this positions.bed? I have tried Chr\tStart\tEnd and is not working :(

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode
5.0 years ago
rizoic ▴ 250

If you have your positions of interest in bed format you could look into bam-readcount.

You can run it simply as bam-readcount -l regions.bed -f ref.fa file.bam and get frequency values for all the nucleotides along with a bunch of additional info in case required.

ADD COMMENT
0
Entering edit mode

You recommend bam-readcount. Would you please tell me how can I retrieve the data of 1000 SNPs from 5 different samples?

ADD REPLY
0
Entering edit mode
5.0 years ago
cdc421 ▴ 10

You recommend bam-readcount. Would you please tell me how can I retrieve the data of 1000 SNPs from 5 different samples?

ADD COMMENT

Login before adding your answer.

Traffic: 1171 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6