Question: bcftools: DP filter based on chromosome and sex
1
gravatar for bsmith030465
2.5 years ago by
bsmith030465170
United States
bsmith030465170 wrote:

Hi,

I want to filter my bcf file such that:

  1. Autosomal chromosomes are filtered with DP > 20
  2. X chromosome, for females is filtered with DP > 20
  3. X chromosome, for males, with DP > 10 (i.e relax this criteria for males)

My current plan is to split the data into autosomal (and chrY) and X chromosome using something like (any way to specify chr1-chr22?):

bcftools view -i 'DP>20' input.bcf --regions chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrY -Ob -o autosomalY.bcf


## sample ids as they appear in input.bcf file
bcftools query -l input.bcf > allsampleids.txt

I can process the X chromosome using:

## assuming femaleIDs.txt contains female sample ids
bcftools view -i 'DP>20' input.bcf --regions chrX --samples-file femaleIDs.txt -Ob -o Xfemales.bcf
bcftools view -i 'DP>10' input.bcf --regions chrX --samples-file ^femaleIDs.txt -Ob -o Xmales.bcf

My questions:

  1. How should I recombine the outputs, such that I again have one bcf dataset? i.e. how do I combine autosomalY.bcf, Xfemales.bcf and Xmales.bcf? Not sure if this is possible or even a good idea....
  2. Is there a cleaner way to do this?

Thanks!

ADD COMMENTlink modified 2.5 years ago by Pierre Lindenbaum134k • written 2.5 years ago by bsmith030465170
0
gravatar for Pierre Lindenbaum
2.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

using vcffilterjdk

with the code:

if(variant.getContig().equals("Y")) {
    return false;
    }
else if(variant.getContig().equals("X")) {
final Set<String> setM = CollectionUtil.makeSet("S1","S2");
final Set<String> setF = CollectionUtil.makeSet("S2","S4","S5");

boolean ok_m = variant.getGenotypes().stream().filter(G->G.hasDP() && setM.contains(G.getSampleName())).mapToInt(G->G.getDP()).sum() > 10;
boolean ok_f = variant.getGenotypes().stream().filter(G->G.hasDP() && setF.contains(G.getSampleName())).mapToInt(G->G.getDP()).sum() > 20;

return ok_m && ok_f;
}
else
{
return variant.getAttributeAsInt("DP",0)>20;
}

exemple:

java -jar dist/vcffilterjdk.jar -f input.code input.vcf

ADD COMMENTlink written 2.5 years ago by Pierre Lindenbaum134k

Hi Pierre,

Thanks!!!

It would be extremely helpful if there was some documentation with this (where is it reading the male/female ids?). Also, what if I wanted to to do 'DP>10 & GQ >10', how do I incorporate multiple conditions?

ADD REPLYlink written 2.5 years ago by bsmith030465170

t would be extremely helpful if there was some documentation

my bad ! I forgot to add the link : http://lindenb.github.io/jvarkit/VcfFilterJdk.html

where is it reading the male/female ids

it's hard coded:

final Set<String> setM = CollectionUtil.makeSet("S1","S2");

how do I incorporate multiple conditions?

basically it requires to a knowledge of java and the library for HTS. But there are many examples or links to previous biostar post in the manual : http://lindenb.github.io/jvarkit/VcfFilterJdk.html

ADD REPLYlink written 2.5 years ago by Pierre Lindenbaum134k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 955 users visited in the last hour
_