Question

Help for Pooled Exome Sequencing SNP Analysis

0

Entering edit mode

11 weeks ago

alejandro.mejiam1 ▴ 20

Hi Guys, I´m looking for your expertise on SNP identification of pooled samples.

More details: I want to Identify SNPs relevant on insecticide-resistant A. aegypti mosquitoes using pooled exome sequencing of treated vs. untreated samples.

My experimental design: Treatment group (insecticide-resistant): Replicates: HLR1, HLR2 (each = 25-pooled mosquitoes)

Control group (untreated): Replicates: SP1, SP2 (each = 25-pooled mosquitoes)

Sequencing: Exome capture (custom probes), Illumina PE, 40× depth

Bioinformatic pipeline used until now:

Quality control: FastQC
Alignment: BWA
BAM processing: samtools view/sort/index + Picard AddOrReplaceReadGroups/MarkDuplicates
Variant calling/filtering: GATK (VCF files generated)

My specific quesiton: How can I compare pooled replicates (HLR1/HLR2 vs. SP1/SP2) to identify SNPs relevant in treated samples? Is it possible to perform the pairwise comparison usng the VCF I just obtained?How can I prioritize relevant SNPs?

This is the code I´ve used:

#Indexing using BWA
bwa index VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta

#Mapping with BWA
bwa mem -t 30 -M /mnt/disc2/grupobcei/ewas/index/VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta _R1_subsample.fastq _R2_subsample.fastq > _subsample_BWA.sam

#Samtools SAM to BAM requirements 
samtools view -S -b HLR1_subsample_BWA.sam > HLR1_subsample_BWA.bam
samtools sort -o HLR1_sbs_BWA_sortd.bam HLR1_subsample_BWA.bam
samtools index HLR1_sortd_index.bam

#Picards AddOrReplaceReadGroups
/mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java -jar /mnt/disc2/grupobcei/picard/picard.jar AddOrReplaceReadGroups I=HLR1_sbs_BWA_sortd.bam O=HLR1_BWA_rg.bam SO=coordinate CREATE_INDEX=true RGID=HLR1 RGLB=lib1 RGPL=illumina RGPU=HLR1 RGSM=sample1

#Picards MarkDuplicates
/mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java -jar /mnt/disc2/grupobcei/picard/picard.jar MarkDuplicates I=HLR1_BWA_rg.bam O=HLR1_BWA_-nodups.bam M=HLR1_BWA.metrics REMOVE_DUPLICATES=TRUE

#Variant Callint Using GATK
/mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java "-Xmx30G" -jar /home/administrador/programas/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar HaplotypeCaller -I HLR1_BWA_-nodups.bam -R /mnt/disc2/grupobcei/ewas/index/VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta -O HLR1.variants.vcf --native-pair-hmm-threads 30

#Variant Filtering Using GATK
nohup /mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java -jar /mnt/disc2/grupobcei/gatk/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar VariantFiltration -R /mnt/disc2/grupobcei/ewas/index/VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta -V HLR1.variants.vcf --filter-name FAIL --filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || DP < 10 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" -O HLR1_filt_10x.vcf

GATK SNP Exome polymorphisms • 691 views

ADD COMMENT • link 11 weeks ago by alejandro.mejiam1 ▴ 20

1

Entering edit mode

1) using nohup is most always a bad idea . Learn screen/tmux, learn make/nextflow/snakemake

2) AddOrReplaceReadGroups should be useless if you use the option '-R' of bwa

ADD REPLY • link 11 weeks ago by Pierre Lindenbaum 166k

0

Entering edit mode

Hi!

nohup is for avoiding internet disconnections on the server I work as is my understanding
What do you mean by the 2 option?

thanks

ADD REPLY • link 11 weeks ago by alejandro.mejiam1 ▴ 20

0

Entering edit mode

nohup is for avoiding internet disconnections on the server I work as is my understanding

Your server should have a program called tmux that should allow you to work in a session that would be unaffected by internet disconnections: https://hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/

What do you mean by the 2 option?

You can add the read groups during the bwa mem alignment step: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups You then don't need AddOrReplaceReadGroups step.

ADD REPLY • link 11 weeks ago by GenoMax 153k

0

Entering edit mode

I understand now. Any experience on the pooled exome seq?

ADD REPLY • link 11 weeks ago by alejandro.mejiam1 ▴ 20

score 0 · Answer 1 · 2025-06-26

0

Entering edit mode

11 weeks ago

Pierre Lindenbaum 166k

call all your bams in one HaplotypeCaller call (= "joint calling vcf" )

use a tool to find the variants present in a subset of samples but not in another set: e.g: see filtering multi sample vcf file

ADD COMMENT • link 11 weeks ago by Pierre Lindenbaum 166k

0

Entering edit mode

You mean joining and then obtaining the SNPs that are only present in my resistant treatment?

ADD REPLY • link 11 weeks ago by alejandro.mejiam1 ▴ 20