Question: BCFTOOLS: One-liner to get individuals with more than one heterozygote call in a region
0
gravatar for omid.alavijeh
4 days ago by
omid.alavijeh40 wrote:

Hello!

I have a regular multi-sample VCF from which I am looking to find individuals that have >1 HET call in the same gene. The vcf is chunked so relatively small in size (with the gene of interest in the chunk)/

Does anyone have a one liner for such a task with bcftools or even R?

Many thanks

bcftools • 76 views
ADD COMMENTlink modified 4 days ago • written 4 days ago by omid.alavijeh40

multiple individuals merged into one across a single gene

VCF format does not work that way. How did you get to this from a real valid VCF?

ADD REPLYlink written 4 days ago by RamRS25k

Yup - its a multi sample vcf chunked by region of the genome.

ADD REPLYlink written 4 days ago by omid.alavijeh40

Yup

Yes to what, exactly?

it's a multi-sample VCF

That is not the same thing as saying "multiple individuals merged into one". Please be clear - is it a multi-sample VCF or some sort of mutated file with multiple GTs per sample?

ADD REPLYlink written 4 days ago by RamRS25k

Dear RamRS,

It is a multi-sample vcf. Whilst it may contain mutations I can assure you it has not been mutated.

ADD REPLYlink written 4 days ago by omid.alavijeh40
1

Thank you for clarifying that. Your initial statement "I have a vcf with multiple individuals merged into one across a single gene" is quite confusing as individuals are not "merged", they are merely part of the VCF, and I'm also assuming loci are not merged (the "across a single gene" part). What you have is a regular VCF from which you are looking to find individuals that have >1 HET call in the same gene.

ADD REPLYlink written 4 days ago by RamRS25k

Correct! Apologies if that was too vague. I have amended by question accordingly.

ADD REPLYlink written 4 days ago by omid.alavijeh40
1
gravatar for Pierre Lindenbaum
4 days ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum126k wrote:

using http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html to create a BED tools + bcftools to get the variants. (you could get some false positives if an indel overlap a position )

bcftools view --targets-file <(java -jar dist/bioalcidaejdk.jar -e 'final Map<String,List<Interval>> sample2ctx = new HashMap<>();while(iter.hasNext()) { final VariantContext v= iter.next(); for(final Genotype g:v.getGenotypes()) { if(!g.isHet()) continue; List<Interval> list = sample2ctx.get(g.getSampleName()); if(list==null) { list = new ArrayList<>(); sample2ctx.put(g.getSampleName(),list); } list.add(new Interval(v)); } }for(final String sn: sample2ctx.keySet()) { List<Interval> list = sample2ctx.get(sn); if(list.size()<2) continue; for(Interval r:list) { print(r.getContig()); print("\t"); print(r.getStart()-1); print("\t"); print(r.getEnd()); print("\t"); println(sn); }}'  src/test/resources/rotavirus_rf.vcf.gz) src/test/resources/rotavirus_rf.vcf.gz
ADD COMMENTlink written 4 days ago by Pierre Lindenbaum126k

That's quite the one liner! Thanks I'll give it a go. All the best.

ADD REPLYlink written 4 days ago by omid.alavijeh40

Pierre, I think the code you've written here is complicated enough to warrant an explainer. How is your java code picking regions to use with the --targets-file parameter?

ADD REPLYlink written 4 days ago by RamRS25k

the java code generate a bed file. It's used as a file using a proc substitution https://www.tldp.org/LDP/abs/html/process-sub.html

ADD REPLYlink written 4 days ago by Pierre Lindenbaum126k

Right, I'm clear on the process sub part - just wondering how the bed file is created. In other words, can you describe the code in English please?

ADD REPLYlink written 4 days ago by RamRS25k
1
create an associative array [sample,list[interval]]
for each variant
   for each genotype
       if the genotype is het add the interval for this sample

for each sample
   print each interval as bed if the number is > 1
ADD REPLYlink written 4 days ago by Pierre Lindenbaum126k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1261 users visited in the last hour