I am working on horn cancer. I have 25 cancer samples and 5 normal samples from the Bos indicus (Kankrej) cattle breed. I sequenced them on an Illumina MiSeq and analysed the data using the following pipeline:
- Mapping with the STAR aligner using the Bos taurus reference genome.
- The output .bam files were sorted and indexed using SAMtools.
- Variant calling was done using 3 tools: SAMtools, VarScan and FreeBayes.
- Then, for each sample, I kept only the SNPs called by all 3 tools, using vcf-isec.
- So I have a total of 25 VCF files for horn cancer and 5 VCF files for horn normal, containing SNPs.
- I then took the SNPs present in 90% of the horn cancer samples and the SNPs present in 80% of the horn normal samples, using vcf-isec.
- To get SNPs specific to horn cancer, I subtracted the horn normal VCF file from the horn cancer VCF file using bedtools. I confirmed the subtraction using CLC Genomics Workbench and VCFtools and found almost the same SNPs. So the result should contain SNPs that are present only in horn cancer.
- To confirm this, I inspected the positions of specific SNPs in IGV (Integrative Genomics Viewer).
In IGV, I looked at each SNP position in the reference genome I used and in both the cancer and normal .bam files from the mapping step. The SNPs should appear only in the cancer samples and not in the normal samples, but they appear in both. In other words, the SNPs called relative to the reference were present in both groups, so I used the subtraction step to get cancer-specific SNPs, yet the final output file still contains SNPs that are present in both conditions. So how can I consider them cancer-specific SNPs? Does anyone have an idea why this is happening? Please guide me.
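In case it helps, the logic of the 90%/80% filtering and subtraction steps, as I understand it, can be sketched roughly as below. This is a simplified Python illustration, not the actual vcf-isec or bedtools commands I ran. The function names and the toy SNP positions are made up for the example, SNPs are compared by (chromosome, position) only, and I am assuming that "present in 90%" means "present in at least 90%" of the samples.

```python
from collections import Counter


def snps_in_fraction(samples, min_fraction):
    """Return SNPs present in at least min_fraction of the samples.

    samples: list of sets of (chrom, pos) tuples, one set per sample.
    """
    total = len(samples)
    counts = Counter(snp for sample in samples for snp in sample)
    return {snp for snp, n in counts.items() if n / total >= min_fraction}


def cancer_specific(cancer_samples, normal_samples,
                    cancer_fraction=0.9, normal_fraction=0.8):
    """Subtract the common normal SNPs from the common cancer SNPs."""
    cancer_common = snps_in_fraction(cancer_samples, cancer_fraction)
    normal_common = snps_in_fraction(normal_samples, normal_fraction)
    return cancer_common - normal_common


# Toy data with hypothetical positions (not from my real samples).
snp_a = ("chr1", 1000)  # in all 25 cancer samples, in no normal sample
snp_b = ("chr1", 2000)  # in all 25 cancer samples, in 3 of 5 normal samples
snp_c = ("chr2", 3000)  # in all 25 cancer samples, in 5 of 5 normal samples

cancer = [{snp_a, snp_b, snp_c} for _ in range(25)]
normal = [{snp_b, snp_c}] * 3 + [{snp_c}] * 2

result = cancer_specific(cancer, normal)
print(sorted(result))
```

In this toy example the second SNP survives the subtraction even though it occurs in 3 of 5 normal samples, because 3/5 is below the 80% threshold used to build the normal set. Something like this might explain part of what I see in IGV, but I am not sure.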