How to proceed with downstream analysis after tumor-only snp calling using Varscan2?
0
0
Entering edit mode
5 weeks ago
kabir.deb ▴ 80

Hi, I'm using the following command to use varscan2 for the first time for SNP calling in tumor-only mode. However, I'm getting errors during the compression of VCF files by bcftools. Please correct me where I am going wrong.

In addition, since I am a novice in this field, I humbly request that you suggest to me any downstream analysis that may be required after SNP calling is completed. I'm interested in comparing mutation variation and pattern among multiple samples.

DATADIR="/path/latest_BAM"
REF="/path/GRCh38_chr.fa"
VARSCAN="path/VarScan2_env/bin/varscan"
VarScan2VCF="path/VarScan2VCF.py"

fids="DNA1 DNA2
         DNA3 DNA4
         DNA5 DNA6 
         DNA7 DNA8"


for fid in $fids
    do
    time samtools mpileup -f $REF {DATADIR}/${fid}_aln.sort.rmdup2.bam | $VARSCAN pileup2snp --p-value 0.05 --output-vcf 1 >  {DATADIR}/${fid}_aln.sort.rmdup2.VrScnP.vcf
    done;

We got this kind of vcf file,

head DNA7_aln.sort.rmdup2.VrScnP.vcf
Chrom   Position    Ref Cons    Reads1  Reads2  VarFreq Strands1    Strands2    Qual1   Qual2   Pvalue  MapQual1    MapQual2    Reads1Plus  Reads1Minus Reads2Plus  Reads2Minus VarAllele
chr1    261322  T   K   4   4   50% 1   2   74  55  0.03846153846153868 1   1   4   0   3   1   G
chr1    349274  T   Y   4   5   55.56%  1   1   64  42  0.01470588235294129 1   1   4   0   5   0   C
chr1    629274  G   A   0   8   100%    0   1   0   35  7.770007770007759E-5    0   1   0   0   8   0   A
chr1    629626  T   C   0   20  100%    0   1   0   37  7.254444551924877E-12   0   1   0   0   20  0   C
chr1    629906  C   T   0   49  100%    0   2   0   60  3.925014596481366E-29   0   1   0   0   38  11  T
chr1    630750  C   T   0   17  100%    0   1   0   37  4.2852131239177147E-10  0   1   0   0   17  0   T
chr1    631010  T   C   0   9   100%    0   1   0   37  2.056766762649123E-5    0   1   0   0   9   0   C
chr1    631193  A   G   0   16  100%    0   1   0   36  1.6636709775210046E-9   0   1   0   0   16  0   G
chr1    631196  G   A   0   16  100%    0   1   0   38  1.6636709775210046E-9   0   1   0   0   16  0   A

Then trying to compress VCF files using bcftools

for fid in $fids
do
time bcftools view ${DATADIR}/${fid}_aln.sort.rmdup2.VrScnP.vcf  -Oz -o ${DATADIR}/${fid}_aln.sort.rmdup2.VrScnP.vcf.gz
done;

It shows unknown file type for all VCF files, including DNA7,

DNA7_aln.sort.rmdup2.VrScnP.vcf : unknown file type
snv compressing varscan bcftools • 344 views
ADD COMMENT
0
Entering edit mode

what is the output of the following commands:

head DNA7_aln.sort.rmdup2.VrScnP.vcf 

and

file DNA7_aln.sort.rmdup2.VrScnP.vcf 
ADD REPLY
0
Entering edit mode
head DNA7_aln.sort.rmdup2.VrScnP.vcf 

is already shown in the query, and

file  DNA7_aln.sort.rmdup2.VrScnP.vcf
DNA7_aln.sort.rmdup2.VrScnP.vcf: ASCII text
ADD REPLY
0
Entering edit mode
for fid in $fids
do
time blablablabla
done

you should learn to use things like make or a workflow manager like snakemake or nextflow. Furthermore, you should secure your bash scripts by sometimes using set -o pipefail

ADD REPLY

Login before adding your answer.

Traffic: 1481 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6