Question: Number of snps are very low
0
gravatar for waqasnayab
2.9 years ago by
waqasnayab150
Pakistan
waqasnayab150 wrote:

Hi,

Dear All,

I have WGS data of one healthy individual. I want to make a VCF file. I am running the following pipeline but the number of SNPs are low, only in thousands. I have a big bam file (256GB) with depth of 38X.

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=16000 REMOVE_DUPLICATES=true INPUT=gcxall_sort_mapped.bam OUTPUT=gcxall_sort_mapped_dedup.bam METRICS_FILE=gcx.mat ASSUME_SORTED=true

nohup java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar AddOrReplaceReadGroups INPUT=gcxall_sort_mapped_dedup.bam OUTPUT=gcxall_sort_mapped_dedup_crctd.bam SORT_ORDER=coordinate RGLB=MP RGPL=SOLID RGPU=NOBARCODE RGSM=GCXI

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar CreateSequenceDictionary REFERENCE=new_newhg38.fa OUTPUT=new_newhg38.dict

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar ReorderSam I=gcxall_sort_mapped_dedup_crctd.bam O=gcxall_sort_mapped_dedup_crctd_rordr.bam REFERENCE=new_newhg38.fa

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr.bam --known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf --known 1000G_phase1.indels.hg19.sites.vcf -o gcxall_sort_mapped_dedup_crctd_rordr.intervals --defaultBaseQualities 35

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T IndelRealigner -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr.bam -targetIntervals gcxall_sort_mapped_dedup_crctd_rordr.intervals -o gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -known 1000G_phase1.indels.hg19.sites.vcf -l INFO --defaultBaseQualities 35

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T BaseRecalibrator -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -knownSites 1000G_phase1.indels.hg19.sites.vcf -knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -knownSites dbsnp_138.hg19_out.vcf -o gcxi_recal_data.table --covariate QualityScoreCovariate --covariate ReadGroupCovariate --covariate ContextCovariate --covariate CycleCovariate --solid_nocall_strategy PURGE_READ --solid_recal_mode SET_Q_ZERO_BASE_N

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T PrintReads -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -BQSR gcxi_recal_data.table -o gcxall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -allowPotentiallyMisencodedQuals

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -o raw_variants.vcf -D dbsnp_138.hg19_out.vcf -allowPotentiallyMisencodedQuals

Kindly help me in this regard, what I am missing?

Thanks.

snp vcf • 891 views
ADD COMMENTlink modified 2.7 years ago by Biostar ♦♦ 20 • written 2.9 years ago by waqasnayab150
2
Looks like you are mapping to hg38 and recalibrating using hg19 snps
-R /scratch/gcxi_all_bam/new_newhg38.fa 
--known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf

 

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by rbagnall1.2k

Are you talking about novel SNPs or all SNPs detected (known + unknown)? The only thing I can suggest is trying a different tool for SNP detection such as FreeBayes

ADD REPLYlink written 2.7 years ago by Adrian Pelin2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1603 users visited in the last hour