Question: Merging two VCF datasets with different samples and variants
Posted 3 months ago by vinayreddynannuru:

Hello All,

I am Vinay Kumar Reddy Nannuru. I have a VCF file of 92 samples with 14,000 variants, and I want to merge it with a publicly available dataset of 30,000 samples with 950,000 variants. How can I merge them so that the output file contains all samples from both datasets, at the variant positions the two datasets share? Could someone please give me a clear explanation? My second question: how can I select a subset of samples from the second (30,000-sample) dataset? Thank you for your time.

Vinay Kumar Reddy Nannuru

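The second question (selecting a subset of samples) amounts to keeping the nine fixed VCF columns (CHROM through FORMAT) plus only the sample columns you want; on real files, bcftools view -S takes a file with one sample name per line and does this. The Python below is only a minimal sketch of the idea, assuming an uncompressed VCF already read into a list of lines; the function name subset_samples and the sample names S1 to S3 are hypothetical. Unlike a real tool, it does not recompute INFO tags such as AC/AN after dropping samples.

```python
def subset_samples(vcf_lines, keep):
    """Return the VCF lines with only the sample columns named in `keep`.

    Columns 0-8 (CHROM .. FORMAT) are always kept; sample columns keep
    their original order. INFO tags such as AC/AN are NOT recomputed.
    """
    keep = set(keep)
    columns = None  # indices of columns to output, set from the #CHROM line
    out = []
    for line in vcf_lines:
        if line.startswith("##"):  # meta-information lines pass through unchanged
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "#CHROM":
            missing = keep - set(fields[9:])
            if missing:
                raise KeyError(f"samples not in header: {sorted(missing)}")
            columns = list(range(9)) + [
                i for i in range(9, len(fields)) if fields[i] in keep
            ]
        elif columns is None:
            raise ValueError("data line found before the #CHROM header")
        out.append("\t".join(fields[i] for i in columns))
    return out


# Toy example with hypothetical sample names S1..S3
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
]
subset = subset_samples(vcf, ["S1", "S3"])
```

Here subset keeps the header lines and turns the data line into one with only the S1 and S3 genotypes (0/0 and 1/1).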

What have you tried?

written 3 months ago by RamRS

I haven't tried anything yet; I am new to this, so I joined the group and have read many questions on this topic. From what I understood, it can be done with PLINK, but it is not clear to me how. Thanks for your reply.

written 3 months ago by vinayreddynannuru

Search this site for 'bcftools merge' and/or 'GATK CombineVariants'.

written 3 months ago by Pierre Lindenbaum
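Conceptually, bcftools merge combines files that hold different samples into one file with all sample columns side by side; a site present in only one input gets missing genotypes for the other file's samples. The sketch below illustrates that outer-join idea in plain Python. It is not how bcftools works internally: it assumes uncompressed, GT-only records on the same reference assembly, matched on CHROM, POS, REF and ALT, and read_vcf, merge_union and the sample names are hypothetical. On real data, bcftools merge normally expects bgzip-compressed, indexed inputs.

```python
MISSING_GT = "./."  # genotype written for samples whose file lacks the site
FIXED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def read_vcf(lines):
    """Return (sample_names, {(CHROM, POS, REF, ALT): [GT, ...]})."""
    samples, records = [], {}
    for line in lines:
        if line.startswith("##"):  # skip meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]
        else:
            records[(fields[0], int(fields[1]), fields[3], fields[4])] = fields[9:]
    return samples, records


def merge_union(lines_a, lines_b):
    """Outer join: every site from either file, sample columns of A then B."""
    samples_a, rec_a = read_vcf(lines_a)
    samples_b, rec_b = read_vcf(lines_b)
    if set(samples_a) & set(samples_b):
        raise ValueError("the two files share sample names")
    # Minimal header only; a real merged VCF also carries ##contig, ##FORMAT, etc.
    out = ["##fileformat=VCFv4.2", "\t".join(FIXED + samples_a + samples_b)]
    # Sorting compares chromosome names as strings: fine for a sketch,
    # but not the contig order a real tool would follow.
    for key in sorted(rec_a.keys() | rec_b.keys()):
        chrom, pos, ref, alt = key
        gts = rec_a.get(key, [MISSING_GT] * len(samples_a)) + rec_b.get(
            key, [MISSING_GT] * len(samples_b)
        )
        out.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"] + gts))
    return out


# Toy inputs with hypothetical samples P1, P2 (file A) and Q1 (file B)
header = "\t".join(FIXED)
file_a = [
    header + "\tP1\tP2",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t1/1",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/0",
]
file_b = [
    header + "\tQ1",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0",
    "1\t300\t.\tG\tA\t.\t.\t.\tGT\t1/1",
]
merged = merge_union(file_a, file_b)
```

The result has three sites: position 100 with genotypes from all three samples, and positions 200 and 300 with "./." for the samples of the file that lacks them.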

Thank you very much; I am working on it.

written 3 months ago by vinayreddynannuru

If you wish to work with ROD (Reference-Ordered Data) files such as VCF or BED, you should check out the following tools:

  • bcftools
  • vcftools
  • bedtools
  • bedops
  • GATK (sub-tools such as CombineVariants, VariantFiltration, etc.)
  • samtools (as needed)

One or more of the above will have utilities to do exactly what you want, although you might have to break down your task into smaller steps. Most of the tools above also support piping, so you can chain these multiple steps together to form a reusable pipeline.

written 3 months ago by RamRS
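The idea of breaking a task into small steps and chaining them can be illustrated with Python generators, where each step lazily consumes the output of the previous one, much like command-line tools connected by pipes. This is a hypothetical sketch: the step names records, in_region and positions, and the region used, are made up, and the input is assumed to be an uncompressed VCF read as text lines.

```python
def records(lines):
    """Step 1: skip '#' header lines and split each data line into fields."""
    for line in lines:
        if not line.startswith("#"):
            yield line.rstrip("\n").split("\t")


def in_region(recs, chrom, start, end):
    """Step 2: keep records on `chrom` with start <= POS <= end (1-based, inclusive)."""
    for fields in recs:
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield fields


def positions(recs):
    """Step 3: reduce each record to its (CHROM, POS) pair."""
    for fields in recs:
        yield fields[0], int(fields[1])


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1",
    "1\t500\t.\tC\tT\t.\t.\t.\tGT\t1/1",
    "2\t150\t.\tG\tA\t.\t.\t.\tGT\t0/0",
]
# Nothing is read until list() pulls records through all three steps.
kept = list(positions(in_region(records(vcf), "1", 1, 200)))
```

Each step stays small and testable on its own, and steps can be swapped or reordered without touching the others, which is the same benefit piping gives on the command line.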

Thank you very much, I am working on it. I chose to use vcftools, and when I ran the following command it showed this error:

vcf-isec -f -n ../vcffiles/gbs.africe.impute ../../ZeaGBSv27_publicSamples_imputedV5_AGPv4-161010.vcf 

Could not parse: [../vcffiles/gbs.africe.impute]
 at /usr/local/bin/vcf-isec line 21
    main::error('Could not parse: [../vcffiles/gbs.africe.impute]\x{a}') called at /usr/local/bin/vcf-isec line 71
    main::parse_params() called at /usr/local/bin/vcf-isec line 11

written 3 months ago by vinayreddynannuru

-n needs an integer argument; I don't think you're using the command correctly.

written 3 months ago by RamRS
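The error above is consistent with how vcf-isec reads -n: the vcftools documentation describes -n (--nfiles) as taking a value such as =2, +2 or -2 (a position must be present in exactly, at least, or at most that many files). In the original command, -n was followed directly by the first file name, so that name was taken as the value of -n and rejected. The check can be sketched in Python as follows; parse_nfiles is a hypothetical re-implementation for illustration, not vcftools' own Perl code.

```python
import re


def parse_nfiles(value):
    """Validate an -n value of the form [=+-]<int> and return (op, count).

    '=2' -> exactly 2 files, '+2' -> 2 or more, '-2' -> 2 or fewer.
    Anything else (e.g. a file name) is rejected, mirroring the
    'Could not parse' error shown above.
    """
    match = re.fullmatch(r"([=+-])(\d+)", value)
    if match is None:
        raise ValueError(f"Could not parse: [{value}]")
    return match.group(1), int(match.group(2))
```

With this rule, parse_nfiles("=2") returns ("=", 2), while parse_nfiles("../vcffiles/gbs.africe.impute") raises the same "Could not parse" message the user saw.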

Hello Mr. Ram, yes, n is two files. I have added the parameter, and now my output file contains only my project samples, at the positions shared by both files. What I want is an output that includes all samples from both my project data and the public data at those shared positions. How can I do this? Thanks, Vinay.

written 3 months ago by vinayreddynannuru
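What is being asked for here is an intersection of positions combined with a merge of sample columns. The result described above suggests that vcf-isec writes out the records of the first file only, so the public samples never reach the output. One common route is to restrict both files to their shared sites and then combine them with bcftools merge. The Python sketch below only shows the intended result on toy data: sites present in both files, with every sample from both. It assumes uncompressed, GT-only records matched on CHROM, POS, REF and ALT, and both files must use the same reference coordinates (the public file name mentions AGPv4, so the project VCF would need to be on the same assembly). merge_shared and the sample names are hypothetical.

```python
FIXED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def read_vcf(lines):
    """Return (sample_names, {(CHROM, POS, REF, ALT): [GT, ...]})."""
    samples, records = [], {}
    for line in lines:
        if line.startswith("##"):  # skip meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]
        else:
            records[(fields[0], int(fields[1]), fields[3], fields[4])] = fields[9:]
    return samples, records


def merge_shared(lines_a, lines_b):
    """Inner join: only sites present in both files, all samples from both."""
    samples_a, rec_a = read_vcf(lines_a)
    samples_b, rec_b = read_vcf(lines_b)
    if set(samples_a) & set(samples_b):
        raise ValueError("the two files share sample names")
    out = ["##fileformat=VCFv4.2", "\t".join(FIXED + samples_a + samples_b)]
    for key in sorted(rec_a.keys() & rec_b.keys()):
        chrom, pos, ref, alt = key
        gts = rec_a[key] + rec_b[key]  # project samples first, then public ones
        out.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"] + gts))
    return out


# Toy data: hypothetical project samples P1, P2 and public samples Q1..Q3
header = "\t".join(FIXED)
project = [
    header + "\tP1\tP2",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t1/1",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/1",
]
public = [
    header + "\tQ1\tQ2\tQ3",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/0\t1/1",
    "1\t300\t.\tG\tA\t.\t.\t.\tGT\t1/1\t0/1\t0/0",
]
merged = merge_shared(project, public)
```

Only position 100 is in both files, so the output holds that single site with all five genotypes, which is the shape of result the question asks for.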

n is two files

-n needs an integer argument. 2 is an integer argument. The names of two files are not an integer argument.

written 3 months ago by RamRS
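To make the point concrete: with two input files, -n =2 asks for positions found in both files, -n +1 for positions in at least one, and -n -1 for positions in at most one. The counting rule behind =, + and - can be sketched as below; positions_in_n_files is a hypothetical helper working on sets of (CHROM, POS) pairs, not part of vcftools.

```python
from collections import Counter


def positions_in_n_files(position_sets, op, n):
    """Select positions by how many input files contain them.

    op '=' -> exactly n files, '+' -> n or more, '-' -> n or fewer.
    Only positions seen in at least one file are considered.
    """
    counts = Counter(pos for positions in position_sets for pos in set(positions))
    tests = {
        "=": lambda c: c == n,
        "+": lambda c: c >= n,
        "-": lambda c: c <= n,
    }
    if op not in tests:
        raise ValueError(f"unknown operator: {op!r}")
    return sorted(pos for pos, c in counts.items() if tests[op](c))


# Toy position sets for two files
project = {("1", 100), ("1", 200)}
public = {("1", 100), ("1", 300)}
shared = positions_in_n_files([project, public], "=", 2)
```

Here shared contains only ("1", 100), the one position present in both files.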