SNPs from bam file with no ref
2
0
Entering edit mode
9.1 years ago

Hi guys, I wonder if you guys can offer any advice!!

I am currently trying to call all SNPs from a set of 8 genomes in .bam file format from a single population. The genomes have been mapped and aligned, however I have no reference so cannot make an index from this.

I am using samtools, and have tried to create and index, resulting in .bai files from the bam files themselves. I have tried mpileup with with all 8 files but has taken over 2 hours so far to process. I did run the same for 1 of the files (took around 1 hour), which gave an incomprehensible .bcf file. Is it normal to take this amount of time? I am more than open to trying other tools should you recommend them. Thanks in advance for you help!

samtools • 2.0k views
ADD COMMENT
2
Entering edit mode

Explain: "The genomes have been mapped" how can you map your reads if you don't have a reference?

ADD REPLY
0
Entering edit mode

It wasn't me that did it! I'll track down the culprit and retrieve the files. Then I'll hopefully be able to get it to work

ADD REPLY
0
Entering edit mode

This question might sound stupid, are you working on human samples?

If you do samtools -H <input bam file>, what is does it show? Maybe that will tell you some information of what the genome is or what the reference file is

ADD REPLY
2
Entering edit mode
9.1 years ago

I have tried mpileup with with all 8 files but has taken over 2 hours so far to process.

Not surprising. I do mpileup on exome data all the time, and it takes hours. If you are doing the whole genome, that would take longer.

You can parallelize things if you have multiple processors, by doing each chromosome separately.

I think trying this without a reference file is going to make things take even longer. It's probably worth it to spend time to find the right reference. Whoever did the alignments should have the reference.

I did run the same for 1 of the files (took around 1 hour), which gave an incomprehensible .bcf file.

Then tell bcftools to make a vcf instead of a binary version.

ADD COMMENT
1
Entering edit mode

I'll note that samtools mpileup can also directly produce VCF formatted results, though this isn't true of the older 0.1.X versions.

ADD REPLY
0
Entering edit mode

Thanks! good to know the time is normal. Unfortunately as it was not me who mapped and aligned the genomes I don't personally have the reference, but it shouldn't be an issue for me to get it. As you can probably tell this is pretty new to me.. Ill have a further look into bcftools if not just use vcf format.

ADD REPLY
1
Entering edit mode
9.1 years ago
Len Trigg ★ 1.6k

Use something like samtools view -H to examine the header of the BAMs, and look at the @SQ entries to work out which reference was used during mapping (hopefully the same one was used for all your 8 samples). Then use the correct reference when you do the variant calling :-)

ADD COMMENT

Login before adding your answer.

Traffic: 1994 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6