Question: SNP subsetting using bcftools
0
gravatar for alexcullsci
8 months ago by
alexcullsci10
Canada/Moncton/Université de Moncton
alexcullsci10 wrote:

I am trying to subset SNPs from 32 cultivars. I have the following exclusion criteria:

  1. >20% missing data
  2. SNPs present in less than 50% of genotypes
  3. Each genotype must have at least 50% of included SNPs
  4. >=60% heterozygous calls
  5. Markers missing data across genotypes
  6. Bi-allelic SNPs only
  7. No very rare variants (i.e. MAF<0.05

With the help of the bcftools manual, I've managed to figure out how to subset only biallelic SNPs with a MAF of over 0.05, dealing with criteria 6 and 7.

bcftools view --output-type u --min-alleles 2 --max-alleles 2 \
--types snps --exclude MAF[0]<0.05 --exclude **   --known  variants_raw_sorted.bcf > SNP_filtered_sorted_kkf.bcf

Where the ** is, I am trying to implement the remainder my exclusion criteria, (criteria 1-5), however I am having a great deal of difficulty understanding how to use the expression in bcftools. I am looking to understand how to use the information in the info fields (AN,AC, AF/MAF, NS) to implement my exclusion criteria. Any help is very much appreciated.


Futher information : This is the upstream code that led to the bcf file.

fastp -i ACDC_R1.fastq.gz -I ACDC_R2.fastq.gz \
    -o fastp_ACDC_R1P.fastq.gz fastp_ACDC_R2P.fastq.gz \
    -R "ACDC_fastp_QC" -h "Fastp_ACDC" -j "ACDC_fastp" \
    -e 20 -c -a "auto" -L -r \
    -m --merged_out "ACDC_fastp_Merged" --out1 "ACDC_R1_fastp_Unmerged" --out2 "ACDC_R2_fastp_Unmerged" \
    --unpaired1 "ACDC_R1Pass_R2Fail" --unpaired2 "ACDC_R2Pass_R1Fail"

bowtie2 -p 20 -x cs10 -U ACDC_fastp_Merged.fastq.gz -S ACDC_aligned.sam

samtools view -bS ACDC.sam > ACDC.bam

bcftools mpileup --output-type u -f cs10.fasta -b bam_list.txt | bcftools call -vmO z -o all_raw_condition_1.vcf.gz --threads=20

bcftools view --output-type u --output-file variants_raw.bcf variants_raw.vcf.gz

bcftools sort --max-mem 14000 --output-type u --output-file variants_raw_sorted.bcf
snp bcftools next-gen • 540 views
ADD COMMENTlink modified 8 months ago by Kevin Blighe68k • written 8 months ago by alexcullsci10
1
gravatar for Kevin Blighe
8 months ago by
Kevin Blighe68k
Republic of Ireland
Kevin Blighe68k wrote:

Hey,

It should definitely be possible to find a way to apply these filters - there is always a way. However, it is importat to know which tags are currently in your VCF / BCF. Judging by how you have called your variants, there will be only 1 or 2 tags included by default. You can add more by calling variants like this:

bcftools mpileup \
  --redo-BAQ \
  --min-BQ 30 \
  --per-sample-mF \
  --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR \
  -f Ref_1_20040804.fa ref_1.sorted.bam | \
  bcftools call \
    --multiallelic-caller \
    --variants-only \
    --skip-variants indels \
    -Ov > ref_1_snps.vcf ;

You can add more via a BCFtools plugin, as I show here: A: How to use bcftools to calculate AF INFO field from AC and AN in VCF?

With AF (Allele Frequency) and AN (Allele Number) included, you have immediately added 2 more key components to your 'filtering' repertoire.

---------------

For other filtering, it may involve more complex, indirect ways, like I show here:

--------------

Finally, I presume that you looked through the Expressions section at the end of the bCFtoools manual - https://samtools.github.io/bcftools/bcftools.html ?

Sorry that I cannot help with each individual filter query, but i hope that I've provided enough information such that you can get through them (and I notice that you posted this question 2 days ago, with no answer).

Kevin

ADD COMMENTlink modified 8 months ago • written 8 months ago by Kevin Blighe68k
1

Kevin,

Really appreciate the time you took with your answer and the useful information therein. I was not familiar with the method to get that INFO field full of useful data. I will take a look at your linked answers and see what I can figure out.

Take care, Alex

ADD REPLYlink written 8 months ago by alexcullsci10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1543 users visited in the last hour