Question: How to retrieve SNP data from a BAM file
6 months ago, cdc421 wrote:

I'm analyzing exome sequencing data and want to retrieve the nucleotide counts at specific positions from BAM files. I am a beginner in bioinformatics. My friends said pysamstats is good software for this. If you have any information, could you please tell me which software is best and how to use it?

I want to get the nucleotide frequencies at 5000 SNPs, for example:

chr1 2000454 A:5 T:546 C:2
chr2 230404 A:1 T:2 C:343
chr2
chr3
...
chrX 3234343 A:323 T:1

6 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

I wrote a tool for this, FindAllCoverageAtPosition: http://lindenb.github.io/jvarkit/FindAllCoverageAtPosition.html

usage:

echo "my.bam" | java -jar dist/findallcoverageatposition.jar --posfile positions.bed

example of output:

#File              CHROM      POS  SAMPLE  DEPTH  M    I  D  N  S   H  P  EQ  X  Base(A)  Base(C)  Base(G)  Base(T)  Base(N)  Base(^)  Base(-)
./testdata/S4.bam  rotavirus  100  S4      126    126  0  0  0  29  0  0  0   0  5        0        0        121      0        0        0
./testdata/S1.bam  rotavirus  100  S1      317    317  1  0  0  50  0  0  0   0  27       0        1        289      0        1        0
./testdata/S2.bam  rotavirus  100  S2      311    311  0  1  0  60  0  0  0   0  29       1        0        281      0        0        1
./testdata/S3.bam  rotavirus  100  S3      446    446  1  0  0  86  0  0  0   0  39       0        1        406      0        1        0
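For readers who want the compact "A:n C:n G:n T:n" layout from the question, the table above can be reshaped with a few lines of Python. This is only a sketch: the `summarize` function is a hypothetical helper, not part of jvarkit, and it assumes the columns are whitespace-separated and that file paths contain no spaces. The two data rows are copied from the example output above.

```python
import io

# Header and two rows copied from the example output above.
EXAMPLE = """\
#File              CHROM      POS  SAMPLE  DEPTH  M    I  D  N  S   H  P  EQ  X  Base(A)  Base(C)  Base(G)  Base(T)  Base(N)  Base(^)  Base(-)
./testdata/S4.bam  rotavirus  100  S4      126    126  0  0  0  29  0  0  0   0  5        0        0        121      0        0        0
./testdata/S1.bam  rotavirus  100  S1      317    317  1  0  0  50  0  0  0   0  27       0        1        289      0        1        0
"""


def summarize(handle):
    """Yield 'CHROM POS SAMPLE A:n C:n G:n T:n' for each data row.

    Assumption: columns are separated by whitespace (tabs or spaces).
    """
    header = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            # First non-empty line is the header; drop the leading '#'.
            header = [name.lstrip("#") for name in fields]
            continue
        row = dict(zip(header, fields))
        counts = " ".join(
            "{}:{}".format(base, row["Base({})".format(base)]) for base in "ACGT"
        )
        yield "{} {} {} {}".format(row["CHROM"], row["POS"], row["SAMPLE"], counts)


lines = list(summarize(io.StringIO(EXAMPLE)))
for text in lines:
    print(text)
```

To use it on real output, the tool's standard output could be saved to a file and passed to `summarize` through `open(...)` instead of the in-memory example.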

Thanks. So you recommend jvarkit rather than pysamstats, is that correct? I'm a true beginner.

(reply written 6 months ago by cdc421)

Yes, but I'm sure pysamstats works too (I don't know that software).

(reply written 6 months ago by Pierre Lindenbaum)

Could you please tell me how to retrieve the data for 1000 SNPs from 5 different samples? Your command is for one SNP.

(reply written 6 months ago by cdc421)

I made BED files, but they didn't work. What format is required for the analysis?

I always get the following result, and I don't know why:

find ./lib/testdata/ -type f -name "*.bam" | java -jar dist/findallcoverageatposition.jar --posfile positions.bed

File exists but is not readable

(reply written 6 months ago by cdc421)
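On the format question: standard BED is tab-separated, with a chromosome name, a 0-based start, and an exclusive end, so a single SNP at 1-based position p becomes the interval p-1 to p. The sketch below shows that conversion; `snps_to_bed` is a hypothetical helper, the two positions are taken from the question, and nothing here establishes that the positions file is the cause of the "File exists but is not readable" message. Chromosome names must also match those in the BAM header (for example "chr1" versus "1").

```python
def snps_to_bed(snps):
    """Convert (chrom, 1-based position) pairs to tab-separated BED lines."""
    lines = []
    for chrom, pos in snps:
        # BED uses a 0-based start and an exclusive end, so one base at
        # 1-based position `pos` is the interval [pos - 1, pos).
        lines.append("{}\t{}\t{}".format(chrom, pos - 1, pos))
    return lines


# Positions taken from the question; a real list would hold all SNPs.
snps = [("chr1", 2000454), ("chr2", 230404)]
bed_lines = snps_to_bed(snps)

# Print the BED content; redirect to a file such as positions.bed to save it.
print("\n".join(bed_lines))
```

Spaces instead of tabs between the columns are a common reason for a BED file being rejected, so it is worth checking the separator in a text editor that can show whitespace.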
6 months ago, rizoic wrote:

If you have your positions of interest in BED format, you could look into bam-readcount.

You can run it simply as: bam-readcount -l regions.bed -f ref.fa file.bam
This reports counts for all the nucleotides at each position, along with a bunch of additional metrics in case you need them.
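The bam-readcount output has one line per position: chromosome, position, reference base, depth, and then one colon-separated block per base whose first two fields are the base and its count. A minimal sketch of turning such a line into the layout asked for in the question follows. The helper names are hypothetical, and the sample line is made up and abbreviated (real lines carry more colon-separated metrics per base). Note also that the bam-readcount README describes the site list as tab-separated chromosome, start, end with 1-based coordinates, which differs from standard 0-based BED, so this is worth checking against the installed version.

```python
BASES = ("A", "C", "G", "T")


def parse_readcount_line(line):
    """Return (chrom, pos, counts) from one bam-readcount output line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    counts = {}
    # Fields 0-3 are chrom, position, reference base, depth; the rest are
    # per-base blocks of the form base:count:metric:metric...
    for entry in fields[4:]:
        parts = entry.split(":")
        if parts[0] in BASES:
            counts[parts[0]] = int(parts[1])
    return chrom, pos, counts


def format_counts(chrom, pos, counts):
    """Format as 'chrom pos A:n C:n G:n T:n'."""
    body = " ".join("{}:{}".format(b, counts.get(b, 0)) for b in BASES)
    return "{} {} {}".format(chrom, pos, body)


# Made-up, abbreviated line for illustration only.
sample = "chr1\t2000454\tT\t553\t=:0:0.00\tA:5:60.00\tC:2:60.00\tG:0:0.00\tT:546:60.00\tN:0:0.00"
formatted = format_counts(*parse_readcount_line(sample))
print(formatted)
```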


So you recommend bam-readcount. Could you please tell me how to retrieve the data for 1000 SNPs from 5 different samples?

(reply written 6 months ago by cdc421)
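One way to handle several samples is to put all SNP positions into a single regions file and run the same command once per BAM. The sketch below only uses the -l and -f options quoted in the answer above; the sample file names, the output directory, and the helper functions are hypothetical, and `run_all` assumes bam-readcount is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(bams, regions="regions.bed", reference="ref.fa"):
    """Build one bam-readcount argument list per BAM file."""
    return [
        ["bam-readcount", "-l", regions, "-f", reference, str(bam)]
        for bam in bams
    ]


def run_all(bams, out_dir="counts", **kwargs):
    """Run each command and save its output as <out_dir>/<sample>.readcount.txt."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for cmd in build_commands(bams, **kwargs):
        target = out / (Path(cmd[-1]).stem + ".readcount.txt")
        with open(target, "w") as handle:
            # check=True raises an error if bam-readcount exits with a failure.
            subprocess.run(cmd, stdout=handle, check=True)


# Hypothetical sample names; replace with the real BAM paths.
bams = ["sample{}.bam".format(i) for i in range(1, 6)]
commands = build_commands(bams)
for cmd in commands:
    print(" ".join(cmd))

# To actually run the tool on every sample:
# run_all(bams)
```

Each BAM file generally needs to be coordinate-sorted and indexed for region-based lookups, so that is worth confirming before running the loop.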