Question: SNPs from bam file with no ref
Asked 3.9 years ago by Joefromlondon (European Union):

Hi guys, I wonder if you guys can offer any advice!!

I am currently trying to call all SNPs from a set of 8 genomes in .bam file format from a single population. The genomes have been mapped and aligned; however, I have no reference, so I cannot make an index from it.

I am using samtools, and have tried to create an index, resulting in .bai files from the bam files themselves. I have tried mpileup with all 8 files, but it has taken over 2 hours so far to process. I did run the same for 1 of the files (took around 1 hour), which gave an incomprehensible .bcf file. Is it normal to take this amount of time? I am more than open to trying other tools should you recommend them. Thanks in advance for your help!


Please explain "The genomes have been mapped": how can you map your reads if you don't have a reference?

written 3.9 years ago by Pierre Lindenbaum

It wasn't me that did it! I'll track down the culprit and retrieve the files. Then I'll hopefully be able to get it to work.

written 3.9 years ago by Joefromlondon

This question might sound stupid, but are you working on human samples?

If you do

samtools view -H <input bam file>

what does it show? Maybe that will give you some information about what the genome is or what the reference file is.

written 3.9 years ago by Sam
Answer by swbarnes2 (United States), 3.9 years ago:
"I have tried mpileup with all 8 files but it has taken over 2 hours so far to process."

Not surprising.  I do mpileup on exome data all the time, and it takes hours.  If you are doing the whole genome, that would take longer.

You can parallelize things if you have multiple processors, by doing each chromosome separately.
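A rough sketch of the per-chromosome idea, assuming a recent bcftools (where `bcftools mpileup` does the pileup step), an indexed reference FASTA and indexed BAMs. The helper name `per_contig_commands` and the file names `ref.fa` and `calls.<contig>.bcf` are made up for illustration. It only prints one command per contig found in the BAM header, so you can check the commands before running anything:

```shell
# Read SAM header text on stdin and print one variant-calling command
# per @SQ contig. Nothing is executed here; the commands are only printed.
# Usage: per_contig_commands REF.fa BAM [BAM...]
# Hypothetical helper: the output names calls.<contig>.bcf are placeholders.
per_contig_commands() {
    ref=$1
    shift
    bams=$*
    awk -F '\t' -v ref="$ref" -v bams="$bams" '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) {
                    contig = substr($i, 4)
                    printf "bcftools mpileup -Ou -f %s -r %s %s | bcftools call -mv -Ob -o calls.%s.bcf\n", ref, contig, bams, contig
                }
            }
        }'
}

# Example (writes the commands to a file, then runs 4 at a time with GNU parallel):
#   samtools view -H sample1.bam | per_contig_commands ref.fa sample1.bam sample2.bam > jobs.txt
#   parallel -j 4 < jobs.txt
```

Contig names with unusual characters would need quoting, which this sketch does not handle. The per-contig files can be stitched back together afterwards, for example with `bcftools concat`.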

I think trying this without a reference file is going to make things take even longer.  It's probably worth it to spend time to find the right reference.  Whoever did the alignments should have the reference.

"I did run the same for 1 of the files (took around 1 hour), which gave an incomprehensible .bcf file."

Then tell bcftools to make a vcf instead of a binary version.
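For example, something like the following, assuming bcftools 1.x, where `-O v` selects uncompressed VCF output. The function name `bcf_to_vcf` is just an illustrative wrapper, not part of bcftools:

```shell
# Convert a binary BCF into plain-text VCF so it can be read or grepped.
# Usage: bcf_to_vcf calls.bcf   (writes calls.vcf next to the input)
# Assumes bcftools 1.x is on the PATH.
bcf_to_vcf() {
    bcf=$1
    vcf=${bcf%.bcf}.vcf
    bcftools view -O v -o "$vcf" "$bcf"
}
```

With the old 0.1.x bcftools, I believe plain `bcftools view calls.bcf > calls.vcf` does the same job, since text VCF was the default output there.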

written 3.9 years ago by swbarnes2

I'll note that samtools mpileup can also directly produce VCF formatted results, though this isn't true of the older 0.1.X versions.

written 3.9 years ago by Devon Ryan

Thanks! Good to know the time is normal. Unfortunately, as it was not me who mapped and aligned the genomes, I don't personally have the reference, but it shouldn't be an issue for me to get it. As you can probably tell, this is pretty new to me. I'll have a further look into bcftools, or otherwise just use VCF format.

written 3.9 years ago by Joefromlondon
Answer by Len Trigg (New Zealand), 3.9 years ago:

Use something like samtools view -H to examine the header of the BAMs, and look at the @SQ entries to work out which reference was used during mapping (hopefully the same one was used for all your 8 samples). Then use the correct reference when you do the variant calling :-)
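A small sketch of how that check might look. The helper name `summarize_sq` is made up; it just reads header text and lists each @SQ contig with its length, followed by a one-line total. If the last line differs between your 8 BAMs, they were probably not all mapped to the same reference:

```shell
# Read SAM header text on stdin; print "name<TAB>length" for each @SQ line,
# then a final line with the contig count and the summed length.
summarize_sq() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4) + 0
            }
            print name "\t" len
            n++; total += len
        }
        END { printf "contigs=%d\ttotal_bp=%.0f\n", n, total }'
}

# Example: compare the summary line across all BAMs in a directory
#   for f in *.bam; do samtools view -H "$f" | summarize_sq | tail -n 1; done
```

The contig names and lengths are often enough to recognise a particular assembly. The @PG lines in the same header may also record the aligner command line, which sometimes includes the path to the reference file.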

 

written 3.9 years ago by Len Trigg