I am working with Complete Genomics data from pipeline version 2.5. I need to add 1000 Genomes data to my sample and build a multi-genome VCF file. Since the 1000 Genomes Project data were produced with pipeline version 2.0.0, should I be concerned about this? If there is a batch effect, what would you normally expect to see in CG data from pipeline 2.0.0 versus 2.5?
Additionally, I would like to know whether mkvcf is the right tool to merge multi-genome data into a combined VCF. Is there a proper tool to annotate that VCF?
Question: Complete Genomics data analysis, pipeline version and batch effect
MAPK • 1.7k wrote:
Dhana • 80 wrote:
For the annotation part, you can use the cgatools join command. Since the data also come from Complete Genomics, it will be easier to use cgatools for most of the work.
You can use it as follows:
cgatools join --beta \
--input <file1> <file2> \
--match <specifications> \
--overlap <specifications> \
--select <output_fields_required> \
--output-mode <arg> \
--always-dump
These are the minimum options you have to provide to run the tool.
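To make the placeholders more concrete, here is a minimal sketch of how the options might be filled in. Everything specific in it is an assumption and not from the original answer: the file names are hypothetical, the column names (chromosome, begin, end) assume both inputs are tab-delimited files with those headers, and the values given to --match, --overlap, --select and --output-mode should be checked against `cgatools help join` for your cgatools version. The script only prints the command (a dry run), so nothing is executed until you copy it.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own tab-delimited files.
VAR_FILE="my_variants.tsv"
ANNOT_FILE="my_annotations.tsv"
OUT_FILE="joined.tsv"

# Print the cgatools join command instead of running it (dry run).
# Assumed column names: chromosome, begin, end in both input files.
print_join_cmd() {
    printf '%s ' cgatools join --beta \
        --input "$VAR_FILE" "$ANNOT_FILE" \
        --match chromosome:chromosome \
        --overlap begin,end:begin,end \
        --select 'A.*,B.*' \
        --output-mode compact \
        --always-dump \
        --output "$OUT_FILE"
    printf '\n'
}

print_join_cmd
```

Once the printed command looks right for your files, paste it into the terminal to run it. Here A and B refer to the first and second input file, so `A.*,B.*` would keep all columns from both.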