I'm working on a project to determine how two protozoan organisms have evolved and in which genes they differ the most, by looking at their SNPs.
I have already called variants with the following pipeline:
# Construct the FM-index for the reference genome (in this case the parasite genome)
bwa index ref.fa
# Find the SA coordinates of the input reads
bwa aln ref.fa short_read.fq > aln_sa.sai
# Generate alignments in the SAM format given paired-end reads
bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam
# Transform .sam files into .bam files
samtools view -b -o output.bam input.sam
# Sort the .bam file
samtools sort -@ 15 -o outfile infile
# Remove duplicates
java -Xmx2g -jar picard.jar MarkDuplicates MAX_FILE_HANDLES=500 REMOVE_DUPLICATES=true I=sorted.bam O=marked_duplicates.bam M=marked_dup_metrics.txt
# Create a .bai file for each .bam file
bamtools index -in file.bam
# Run FreeBayes (looking for SNPs)
freebayes -f reference.fasta file1.bam file2.bam file3.bam > combined.freebayes.vcf
# Compress and index the .vcf file
bgzip -c combined.freebayes.vcf > combined.freebayes.vcf.gz
tabix -p vcf combined.freebayes.vcf.gz
The problem now is that I would like to use this .vcf file to determine which genes contain the most SNPs, what the functions of those genes are and (if possible) how these two related organisms have evolved.
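To make the first goal concrete, this is roughly the kind of per-gene count I have in mind. It is only a sketch under assumptions: `count_snps_per_gene` is a hypothetical helper name, and `genes.tsv` is a hypothetical tab-separated table (chromosome, start, end, gene ID, 1-based inclusive coordinates) that would have to be extracted from the genome annotation first. It reads the uncompressed VCF and only counts records whose REF and ALT alleles are single bases.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): count SNP records per gene.
#   $1 = gene table, tab-separated: chrom, start, end, gene_id (1-based, inclusive)
#   $2 = uncompressed VCF file
count_snps_per_gene() {
    awk -F'\t' '
        # First file: store every gene interval.
        NR == FNR {
            n++
            chrom[n] = $1; start[n] = $2 + 0; end[n] = $3 + 0
            gene[n] = $4; count[n] = 0
            next
        }
        # Second file (VCF): skip header lines.
        /^#/ { next }
        # Keep only records with a single-base REF and single-base ALT allele(s).
        length($4) == 1 && $5 ~ /^[ACGT](,[ACGT])*$/ {
            pos = $2 + 0
            # Simple linear scan over genes; fine for a sketch, slow for large inputs.
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && pos >= start[i] && pos <= end[i])
                    count[i]++
        }
        END {
            for (i = 1; i <= n; i++)
                print gene[i] "\t" count[i]
        }
    ' "$1" "$2"
}
```

It would be called as `count_snps_per_gene genes.tsv combined.freebayes.vcf | sort -k2,2nr` to list the genes with the most SNPs first. This only gives raw counts; it says nothing about the effect of each variant, which is why I am looking at annotation tools.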
I know that programs like ANNOVAR (http://annovar.openbioinformatics.org/en/latest/) or VEP (https://www.ensembl.org/info/docs/tools/vep/index.html) can be used to determine the effects of the variants (SNPs). Since I'm working in a conda environment, the problems I'm having are:
1) ANNOVAR cannot be installed with conda (or, at least, I haven't found a command for installing it).
2) VEP can be installed with conda (https://anaconda.org/bioconda/ensembl-vep), but after installation it reports that the libraries needed for the analysis are not included, so another command (vep_install) has to be run. That command gives me an error every time I try to run it in my conda environment:
vep_install -a cf -s plasmodium_relictum -y PRELSG -c /PATH/miniconda3/pkgs/ensembl-vep-98.3-pl526hecc5488_0/share/ensembl-vep-98.3-0/vep_install
- getting list of available cache files
ERROR: No matching species found at /PATH/miniconda3/envs/salmon/bin/vep_install line 1121, <> line 1.
What am I doing wrong? I just want to get the library installed so that I can use VEP normally.