Question: variant calling for more than 60 samples using haplotype caller
7 months ago by
siyavash_damdar20 wrote:

Hi, I want to do variant calling for more than 60 samples using HaplotypeCaller. Should I run it on all samples at once, using this command?

 java -Xmx16g -jar GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -R Equus_caballus.EquCab2.dna.toplevel.fa -T HaplotypeCaller  -I sample-1.bam -I sample-2  -I sample-3  ......-I sample-60 -ERC GVCF  -o output.vcf.gz

I want to know: is this a true method?

snp next-gen • 373 views
ADD COMMENTlink modified 7 months ago • written 7 months ago by siyavash_damdar20
1

What do you mean by "true method"? Also, please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.


ADD REPLYlink modified 7 months ago • written 7 months ago by RamRS19k

Thank you for your attention. I mean: can I do variant calling for all 60 samples together, or should I do it separately for each sample? I ask because when I call each sample separately, I get only two genotypes per SNP (0/1 or 1/1) in the g.vcf files.

ADD REPLYlink modified 7 months ago • written 7 months ago by siyavash_damdar20

Hi siyavash_damdar,

Please follow up on all your previous questions and provide feedback.

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Note that future questions might be closed if you do not provide feedback on older threads.

Cheers,
Wouter

ADD REPLYlink modified 7 months ago by RamRS19k • written 7 months ago by WouterDeCoster35k

Hi, thank you for the useful comment. Yes, of course, I have done it. Best, Siavash

ADD REPLYlink written 7 months ago by siyavash_damdar20
7 months ago by
RamRS19k
Houston, TX
RamRS19k wrote:

You can call sample by sample and then joint genotype; that way, you shouldn't lose cross-sample genotype information. Or you could call all samples per region, where each region can be a chromosome or smaller, so you reduce the computational burden of each run.

You'd just need different GATK tools to Combine/Cat Variants based on the division you choose.

You could even combine both if you'd like to. The choice is best made based on time and computational resources available to you.
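As a rough sketch of the first approach (call sample by sample, then joint genotype), the dry-run script below only prints the GATK 3.8 commands it would run. The jar path and reference come from the question; the three sample names and the output file names are placeholders, and the `.bam` suffix on every input is an assumption. The flags used (`-ERC GVCF`, `-T GenotypeGVCFs`, `-V`) are standard GATK 3.x options, but check them against the documentation for your exact version.

```shell
#!/bin/sh
# Dry run: prints commands only, nothing is executed.
GATK_JAR="GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar"
REF="Equus_caballus.EquCab2.dna.toplevel.fa"
# Placeholder sample names (assumption); list all of your samples here.
SAMPLES="sample-1 sample-2 sample-3"

# Step 1: one HaplotypeCaller run per sample, each emitting its own GVCF.
per_sample_cmds() {
  for s in $SAMPLES; do
    echo "java -Xmx16g -jar $GATK_JAR -T HaplotypeCaller -R $REF -I $s.bam -ERC GVCF -o $s.g.vcf.gz"
  done
}

# Step 2: a single GenotypeGVCFs run that takes every per-sample GVCF.
joint_cmd() {
  variants=""
  for s in $SAMPLES; do
    variants="$variants -V $s.g.vcf.gz"
  done
  echo "java -Xmx16g -jar $GATK_JAR -T GenotypeGVCFs -R $REF$variants -o joint.vcf.gz"
}

per_sample_cmds
joint_cmd
```

Once the printed commands look right, they can be pasted into a terminal or piped to `sh`. The per-sample runs are independent of each other, so they can also be submitted as separate jobs.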

ADD COMMENTlink written 7 months ago by RamRS19k

So you mean my command is not the right way to call variants for all samples?

java -Xmx16g -jar GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -R Equus_caballus.EquCab2.dna.toplevel.fa -T HaplotypeCaller  -I sample-1.bam -I sample-2  -I sample-3  ......-I sample-60 -ERC GVCF  -o output.vcf.gz
ADD REPLYlink modified 7 months ago • written 7 months ago by siyavash_damdar20

If you're looking for someone to give you the "right" command, sorry, I'm not that person. I've outlined possible approaches, and it is up to you to choose one and implement it.

What you're doing above is using neither of those approaches - you're calling all samples across all regions in a single thread with 16G RAM. That's fine as long as you're ready to wait a long time and risk everything on one thread.
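For contrast, the per-region approach could look roughly like the dry-run sketch below, which prints one multi-sample HaplotypeCaller command per region and then a CatVariants command to stitch the pieces together. The region names `1 2 3`, the BAM names, and the output names are placeholders (assumptions); use the contig names in your reference. `-ERC GVCF` is left out because, as far as I know, GATK 3.x accepts GVCF mode only for single-sample input.

```shell
#!/bin/sh
# Dry run: prints commands only, nothing is executed.
GATK_JAR="GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar"
REF="Equus_caballus.EquCab2.dna.toplevel.fa"
# Placeholder BAMs and contig names (assumptions).
BAMS="sample-1.bam sample-2.bam sample-3.bam"
REGIONS="1 2 3"

# Build the repeated -I arguments once.
inputs=""
for b in $BAMS; do
  inputs="$inputs -I $b"
done

# One multi-sample HaplotypeCaller run per region, restricted with -L.
region_cmds() {
  for r in $REGIONS; do
    echo "java -Xmx16g -jar $GATK_JAR -T HaplotypeCaller -R $REF$inputs -L $r -o region-$r.vcf"
  done
}

# Concatenate the per-region VCFs, listed in reference order.
cat_cmd() {
  variants=""
  for r in $REGIONS; do
    variants="$variants -V region-$r.vcf"
  done
  echo "java -cp $GATK_JAR org.broadinstitute.gatk.tools.CatVariants -R $REF$variants -out all-regions.vcf -assumeSorted"
}

region_cmds
cat_cmd
```

Each region command is independent, so the regions can run in parallel on separate cores or cluster nodes, and a failure costs only that region's work.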

ADD REPLYlink modified 7 months ago • written 7 months ago by RamRS19k

What are you thinking if I increase my memory and threads (64 GB RAM and 16 threads)?

ADD REPLYlink written 7 months ago by siyavash_damdar20

what are you thinking

I think it will be faster. GATK has multiple levels of parallelism, multithreading is just one of them.
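A minimal sketch of the multithreading level, assuming GATK 3.x, where HaplotypeCaller takes CPU threads through the engine option `-nct`. The script only prints the command; the sample name and output name are placeholders, and the memory and thread values are the ones from the comment above.

```shell
#!/bin/sh
# Dry run: prints the command only, nothing is executed.
GATK_JAR="GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar"
REF="Equus_caballus.EquCab2.dna.toplevel.fa"
MEM_GB=64    # Java heap size in GB
THREADS=16   # CPU threads passed to -nct

# Print a single-sample GVCF command for the sample named in $1.
hc_cmd() {
  echo "java -Xmx${MEM_GB}g -jar $GATK_JAR -T HaplotypeCaller -R $REF -I $1.bam -ERC GVCF -nct $THREADS -o $1.g.vcf.gz"
}

hc_cmd sample-1   # placeholder sample name (assumption)
```

Speed-up from `-nct` is usually less than linear, so it is worth timing one sample before committing all 60. Running several samples or regions as separate processes is the other level of parallelism.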

ADD REPLYlink written 7 months ago by RamRS19k

So, according to your comments, I have three scenarios:

1. Call sample by sample, then joint genotype.
2. Call all samples per region.
3. Call all samples (all regions) using multithreading.

Are there any differences in the results (genotypes) between these scenarios?

ADD REPLYlink written 7 months ago by siyavash_damdar20

We're getting into a rabbit hole now, where the principal question was something else and the discussion devolves into a one-on-one between OP and an answerer. This is not good and I cannot encourage this.

If you're really curious about this, search for posts that address this question. Better yet, Google it. The better you are at Google-ing stuff, the faster you can solve your problems.

ADD REPLYlink written 7 months ago by RamRS19k

Sorry if I bothered you. Thanks.

ADD REPLYlink written 7 months ago by siyavash_damdar20

It's not a bother, it's not personal. It's just not something to be encouraged too much, as the discussion goes from being something useful to a lot of people to a niche conversation useful just to one person.

ADD REPLYlink written 7 months ago by RamRS19k