Help for Pooled Exome Sequencing SNP Analysis
1
0
Entering edit mode
11 weeks ago

Hi Guys, I´m looking for your expertise on SNP identification of pooled samples.

More details: I want to Identify SNPs relevant on insecticide-resistant A. aegypti mosquitoes using pooled exome sequencing of treated vs. untreated samples.

My experimental design: Treatment group (insecticide-resistant): Replicates: HLR1, HLR2 (each = 25-pooled mosquitoes)

Control group (untreated): Replicates: SP1, SP2 (each = 25-pooled mosquitoes)

Sequencing: Exome capture (custom probes), Illumina PE, 40× depth

Bioinformatic pipeline used until now:

  1. Quality control: FastQC
  2. Alignment: BWA
  3. BAM processing: samtools view/sort/index + Picard AddOrReplaceReadGroups/MarkDuplicates
  4. Variant calling/filtering: GATK (VCF files generated)

My specific quesiton: How can I compare pooled replicates (HLR1/HLR2 vs. SP1/SP2) to identify SNPs relevant in treated samples? Is it possible to perform the pairwise comparison usng the VCF I just obtained?How can I prioritize relevant SNPs?

This is the code I´ve used:

#Indexing using BWA
bwa index VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta

#Mapping with BWA
bwa mem -t 30 -M /mnt/disc2/grupobcei/ewas/index/VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta _R1_subsample.fastq _R2_subsample.fastq > _subsample_BWA.sam

#Samtools SAM to BAM requirements 
samtools view -S -b HLR1_subsample_BWA.sam > HLR1_subsample_BWA.bam
samtools sort -o HLR1_sbs_BWA_sortd.bam HLR1_subsample_BWA.bam
samtools index HLR1_sortd_index.bam

#Picards AddOrReplaceReadGroups
/mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java -jar /mnt/disc2/grupobcei/picard/picard.jar AddOrReplaceReadGroups I=HLR1_sbs_BWA_sortd.bam O=HLR1_BWA_rg.bam SO=coordinate CREATE_INDEX=true RGID=HLR1 RGLB=lib1 RGPL=illumina RGPU=HLR1 RGSM=sample1

#Picards MarkDuplicates
/mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java -jar /mnt/disc2/grupobcei/picard/picard.jar MarkDuplicates I=HLR1_BWA_rg.bam O=HLR1_BWA_-nodups.bam M=HLR1_BWA.metrics REMOVE_DUPLICATES=TRUE

#Variant Callint Using GATK
/mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java "-Xmx30G" -jar /home/administrador/programas/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar HaplotypeCaller -I HLR1_BWA_-nodups.bam -R /mnt/disc2/grupobcei/ewas/index/VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta -O HLR1.variants.vcf --native-pair-hmm-threads 30

#Variant Filtering Using GATK
nohup /mnt/disc2/grupobcei/java/jdk-17.0.12/bin/java -jar /mnt/disc2/grupobcei/gatk/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar VariantFiltration -R /mnt/disc2/grupobcei/ewas/index/VectorBase-68_AaegyptiLVP_AGWG_Genome.fasta -V HLR1.variants.vcf --filter-name FAIL --filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || DP < 10 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" -O HLR1_filt_10x.vcf
GATK SNP Exome polymorphisms • 691 views
ADD COMMENT
1
Entering edit mode

1) using nohup is most always a bad idea . Learn screen/tmux, learn make/nextflow/snakemake

2) AddOrReplaceReadGroups should be useless if you use the option '-R' of bwa

ADD REPLY
0
Entering edit mode

Hi!

  • nohup is for avoiding internet disconnections on the server I work as is my understanding
  • What do you mean by the 2 option?

thanks

ADD REPLY
0
Entering edit mode

nohup is for avoiding internet disconnections on the server I work as is my understanding

Your server should have a program called tmux that should allow you to work in a session that would be unaffected by internet disconnections: https://hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/

What do you mean by the 2 option?

You can add the read groups during the bwa mem alignment step: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups You then don't need AddOrReplaceReadGroups step.

ADD REPLY
0
Entering edit mode

I understand now. Any experience on the pooled exome seq?

ADD REPLY
0
Entering edit mode
11 weeks ago

call all your bams in one HaplotypeCaller call (= "joint calling vcf" )

use a tool to find the variants present in a subset of samples but not in another set: e.g: see filtering multi sample vcf file

ADD COMMENT
0
Entering edit mode

You mean joining and then obtaining the SNPs that are only present in my resistant treatment?

ADD REPLY

Login before adding your answer.

Traffic: 4546 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6