Mpileup File From Samtools
1
1
Entering edit mode
10.2 years ago

Hello,

I have a bam file with 8 samples. I am trying to use this file to call SNP's with some threshold values for minimum coverage, base quality, allele frequency etc. I found Varscan tool will let me call SNP's using mpileup file from samtools. http://varscan.sourceforge.net/using-varscan.html#v2.3_mpileup2snp I generated my mpileup file using the command - samtools mpileup -f [reference sequence] input_bam_file >myData.mpileup

I am using hg19 as reference. In my mpileup file, all my reference bases(third column) showed up as 'N' and the file only has data for chr1(column 1). Also the 4th column(reads covering the site) is 1/0 for the entire file.

chr1 16451 N 0

chr1 16452 N 1 A /

chr1 16453 N 1 A 1

chr1 16454 N 1 A .

chr1 16455 N 1 C 4

chr1 16456 N 0

chr1 16457 N 1 A /

chr1 16458 N 1 C 8

chr1 16459 N 1 T 3

Any thoughts as to why mpileup file looks like that?

Thank you! Tejaswi

mpileup varscan • 7.9k views
ADD COMMENT
0
Entering edit mode

your 8 samples have been mapped on the very same reference ?

ADD REPLY
0
Entering edit mode

Yes, using standard hg19

ADD REPLY
0
Entering edit mode

Check whether samtools view -H <your bam file> returns exactly the same names for chromosomes as grep ">" <your reference fasta>, had that happen to me a few times...

ADD REPLY
0
Entering edit mode

Hey Philipp,

Thanks for responding. Yeah I checked for the chromosome names and this is how it looks -

BAM file - @HD VN:1.0 GO:none SO:coordinate

@SQ SN:chr1 LN:249250621

@SQ SN:chr2 LN:243199373

@SQ SN:chr3 LN:198022430

@SQ SN:chr4 LN:191154276

@SQ SN:chr5 LN:180915260

@SQ SN:chr6 LN:171115067

@SQ SN:chr7 LN:159138663

@SQ SN:chr8 LN:146364022

@SQ SN:chr9 LN:141213431

@SQ SN:chr10 LN:135534747

@SQ SN:chr11 LN:135006516

@SQ SN:chr12 LN:133851895

@SQ SN:chr13 LN:115169878

@SQ SN:chr14 LN:107349540

@SQ SN:chr15 LN:102531392

@SQ SN:chr16 LN:90354753

@SQ SN:chr17 LN:81195210

@SQ SN:chr18 LN:78077248

@SQ SN:chr19 LN:59128983

@SQ SN:chr20 LN:63025520

@SQ SN:chr21 LN:48129895

@SQ SN:chr22 LN:51304566

@SQ SN:chrX LN:155270560

@SQ SN:chrY LN:59373566

@SQ SN:chrM LN:16571

Ref.fa file -

gi|224384768|gb|CM000663.1| Homo sapiens chromosome 1, GRC primary reference assembly

gi|224384767|gb|CM000664.1| Homo sapiens chromosome 2, GRC primary reference assembly

gi|224384766|gb|CM000665.1| Homo sapiens chromosome 3, GRC primary reference assembly

gi|224384765|gb|CM000666.1| Homo sapiens chromosome 4, GRC primary reference assembly

gi|224384764|gb|CM000667.1| Homo sapiens chromosome 5, GRC primary reference assembly

gi|224384763|gb|CM000668.1| Homo sapiens chromosome 6, GRC primary reference assembly

gi|224384762|gb|CM000669.1| Homo sapiens chromosome 7, GRC primary reference assembly

gi|224384761|gb|CM000670.1| Homo sapiens chromosome 8, GRC primary reference assembly

gi|224384760|gb|CM000671.1| Homo sapiens chromosome 9, GRC primary reference assembly

gi|224384759|gb|CM000672.1| Homo sapiens chromosome 10, GRC primary reference assembly

gi|224384758|gb|CM000673.1| Homo sapiens chromosome 11, GRC primary reference assembly

gi|224384757|gb|CM000674.1| Homo sapiens chromosome 12, GRC primary reference assembly

gi|224384756|gb|CM000675.1| Homo sapiens chromosome 13, GRC primary reference assembly

gi|224384755|gb|CM000676.1| Homo sapiens chromosome 14, GRC primary reference assembly

gi|224384754|gb|CM000677.1| Homo sapiens chromosome 15, GRC primary reference assembly

gi|224384753|gb|CM000678.1| Homo sapiens chromosome 16, GRC primary reference assembly

gi|224384752|gb|CM000679.1| Homo sapiens chromosome 17, GRC primary reference assembly

gi|224384751|gb|CM000680.1| Homo sapiens chromosome 18, GRC primary reference assembly

gi|224384750|gb|CM000681.1| Homo sapiens chromosome 19, GRC primary reference assembly

gi|224384749|gb|CM000682.1| Homo sapiens chromosome 20, GRC primary reference assembly

gi|224384748|gb|CM000683.1| Homo sapiens chromosome 21, GRC primary reference assembly

gi|224384747|gb|CM000684.1| Homo sapiens chromosome 22, GRC primary reference assembly

gi|224384746|gb|CM000685.1| Homo sapiens chromosome X, GRC primary reference assembly

gi|224384745|gb|CM000686.1| Homo sapiens chromosome Y, GRC primary reference assembly

Should I change the chromosome names in my ref.fa file as >chr1, >chr2 etc.?

I just noticed that my bam file has chrM(mitochondrial) too. Maybe I should concatenate the fasta file of chrM to my reference file as well?

Thank you for your help!

ADD REPLY
0
Entering edit mode

I think there's your problem - gi|224384768|gb|CM000663.1| Homo sapiens chromosome 1, GRC primary reference assembly should be chr1 etc.. mpileup looks for chr1, can't find it, and just inserts Ns. And yes, concatenate your mitochondrial reference as well!

ADD REPLY
0
Entering edit mode

So, it wasn't the very same reference ....

ADD REPLY
0
Entering edit mode

Thank you Philipp! That worked.

I din't realize the chromosome names were different. Thanks for your help Pierre! :)

ADD REPLY
1
Entering edit mode
10.2 years ago

You didn't use the same reference for mapping. The names of the chromosomes are not the same.

ADD COMMENT

Login before adding your answer.

Traffic: 2100 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6