Question: merging two different datasets of different samples and variants
5 weeks ago by
vinayreddynannuru20 wrote:

Hello All,

I am Vinay Kumar Reddy Nannuru. I have a VCF file of 92 samples with 14,000 variants each, and I want to merge it with a publicly available dataset consisting of 30,000 samples with 950,000 variants. How can I merge them so that all samples end up in the same output file, restricted to the variant positions shared by both datasets? Could someone please give me a clear explanation? My second question: how can I select a subset of samples from the second dataset of 30,000 samples? Thanks for your time.

Vinay Kumar Reddy Nannuru

snp • 110 views

What have you tried?

written 5 weeks ago by Ram17k

I have not tried anything yet. I am new to this, so I joined the group and read many questions about it. From what I understood, it can be done using plink, but it is not clear to me how. Thanks for your reply.

written 5 weeks ago by vinayreddynannuru20

Search this site for 'bcftools merge' and/or 'GATK CombineVariants'.
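
To make the 'bcftools merge' suggestion a bit more concrete, here is a minimal sketch, not a tested recipe for these particular files. It assumes a reasonably recent bcftools is installed and that both datasets use the same reference assembly and chromosome naming. The function names and every file name are hypothetical. The idea: compress and index each input, use `bcftools isec -n=2` to keep the records present in both files (by default it also requires matching REF and ALT alleles, as far as I know), then let `bcftools merge` combine the samples.

```shell
#!/usr/bin/env bash
# Sketch only: all function names and file names below are placeholders.

# prepare_vcf IN.vcf OUT.vcf.gz
# Compress a plain-text VCF with bgzip and build a tabix index for it.
prepare_vcf() {
    local in_vcf="$1" out_gz="$2"
    bcftools view -Oz -o "$out_gz" "$in_vcf" || return 1
    bcftools index -f -t "$out_gz"
}

# merge_shared_sites A.vcf.gz B.vcf.gz WORKDIR OUT.vcf.gz
# Keep only records found in both inputs, then merge the samples.
merge_shared_sites() {
    local a="$1" b="$2" workdir="$3" out="$4"

    # Records shared by both files are written per input:
    #   WORKDIR/0000.vcf.gz (taken from A), WORKDIR/0001.vcf.gz (taken from B)
    bcftools isec -n=2 -p "$workdir" -Oz "$a" "$b" || return 1

    # (Re)build the indexes that bcftools merge needs.
    bcftools index -f -t "$workdir/0000.vcf.gz" || return 1
    bcftools index -f -t "$workdir/0001.vcf.gz" || return 1

    # Combine the samples of both files into one multi-sample VCF.
    bcftools merge -Oz -o "$out" \
        "$workdir/0000.vcf.gz" "$workdir/0001.vcf.gz" || return 1
    bcftools index -f -t "$out"
}
```

Called as `merge_shared_sites project.vcf.gz public.vcf.gz isec_tmp merged.vcf.gz`, this should leave one file holding the samples of both inputs at the shared sites. With 30,000 samples the merge step may take a while.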

written 5 weeks ago by Pierre Lindenbaum111k

Thank you very much, I am working on it.

written 4 weeks ago by vinayreddynannuru20

If you wish to work with ROD (Reference-Ordered Data) files such as VCF or BED, you should check out the following tools:

  • bcftools
  • vcftools
  • bedtools
  • bedops
  • GATK (sub-tools such as CombineVariants, VariantFiltration, etc.)
  • samtools (as needed)

One or more of the above will have utilities to do exactly what you want, although you might have to break down your task into smaller steps. Most of the tools above also support piping, so you can chain these multiple steps together to form a reusable pipeline.
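
As one illustration of chaining steps with pipes, here is a rough sketch that also touches the second part of the original question (selecting a subset of samples). It assumes bcftools is installed and that the input is bgzip-compressed. The function name and file names are hypothetical, and the second step (keeping only biallelic SNPs) is just an example filter chosen for illustration, not something the question asked for.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and file names are placeholders.

# subset_samples IN.vcf.gz SAMPLES.txt OUT.vcf.gz
# SAMPLES.txt lists the sample names to keep, one per line.
subset_samples() {
    local in_vcf="$1" samples="$2" out="$3"

    # Step 1: keep only the listed samples; -Ou streams uncompressed BCF,
    #         which is cheap to pass between bcftools commands.
    # Step 2: read that stream from stdin ("-") and keep biallelic SNPs,
    #         writing a bgzip-compressed VCF.
    bcftools view -S "$samples" -Ou "$in_vcf" \
        | bcftools view -m2 -M2 -v snps -Oz -o "$out" - || return 1

    # Index the result so later tools (isec, merge) can use it.
    bcftools index -f -t "$out"
}
```

For example, `subset_samples public.vcf.gz keep.txt public.subset.vcf.gz` would write only the samples named in `keep.txt`. Dropping the second `bcftools view` (and writing `-Oz -o` in the first) gives a plain sample subset with no extra filtering.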

modified 5 weeks ago • written 5 weeks ago by Ram17k

Thank you very much, I am working on it. I chose to use vcftools, and when I ran a command it showed the following:

vcf-isec -f -n ../vcffiles/gbs.africe.impute ../../ZeaGBSv27_publicSamples_imputedV5_AGPv4-161010.vcf 

Could not parse: [../vcffiles/gbs.africe.impute]
 at /usr/local/bin/vcf-isec line 21
    main::error('Could not parse: [../vcffiles/gbs.africe.impute]\x{a}') called at /usr/local/bin/vcf-isec line 71
    main::parse_params() called at /usr/local/bin/vcf-isec line 11

modified 4 weeks ago by Ram17k • written 4 weeks ago by vinayreddynannuru20

-n needs an integer argument - I don't think you're using the command right.

written 4 weeks ago by Ram17k

Hello Mr. Ram, yes, n is two files. I have added the parameter, and my output file now contains only my project samples, at the positions shared by both files. But what I want is an output that includes all the samples from both my project data and the public data, at the shared positions. How can I do this? Thanks, Vinay.

written 4 weeks ago by vinayreddynannuru20

"n is two files"

-n needs an integer argument. 2 is an integer argument. The names of two files are not an integer argument.
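
For reference, a hedged sketch of what the call might look like with a count given to -n. In the error above, the first file name was consumed as the value of -n, which is why it "could not parse". As far as I recall from the vcftools documentation, the count takes a `+`, `-` or `=` prefix (at least, at most, exactly that many files), and the inputs need to be bgzip-compressed and tabix-indexed. The function name and file names here are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and file names are placeholders.

# shared_positions A.vcf.gz B.vcf.gz OUT.vcf.gz
# Write the positions present in at least two of the given files.
shared_positions() {
    local a="$1" b="$2" out="$3"

    # -n +2 : positions present in two or more files (the count, not a file)
    # -f    : continue even if the files differ in columns or VCF version
    vcf-isec -f -n +2 "$a" "$b" | bgzip -c > "$out"
}
```

Note that, as observed in this thread, the result carries only the samples of the first file. To get the samples of both datasets into one file, a separate merge step (for example vcf-merge or bcftools merge) on the shared positions would still be needed.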

written 4 weeks ago by Ram17k