Question

Merging two datasets with different samples and variants

5.8 years ago

vinayreddynannuru ▴ 20

Hello All,

I am Vinay Kumar Reddy Nannuru. I have a VCF file of 92 samples with 14,000 variants each, and I want to merge it with a publicly available dataset consisting of 30,000 samples with 950,000 variants. How can I merge them so that all samples end up in the same output file, restricted to the variant positions that the two datasets share? Could someone please give me a clear explanation? My second question: how can I select a subset of samples from the second dataset of 30,000 samples? Thank you for your time.

Vinay Kumar Reddy Nannuru

SNP • 1.3k views

ADD COMMENT • link 5.8 years ago by vinayreddynannuru ▴ 20

What have you tried?

ADD REPLY • link 5.8 years ago by Ram 43k

I have not tried anything yet; I am new to this, so I joined the group and read many questions on the topic. From what I understood, it can be done with plink, but it is not clear to me how. Thanks for your reply.

ADD REPLY • link 5.8 years ago by vinayreddynannuru ▴ 20

Search this site for 'bcftools merge' and/or 'gatk CombineVariants'.
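
As a rough illustration of the `bcftools merge` route, the sketch below first restricts both files to the sites they share with `bcftools isec` and then merges the sample columns. It assumes bcftools is installed and that both VCFs use the same reference build and chromosome names. The function name `merge_shared_sites`, the scratch directory, and all file names are placeholders, not anything from the original post.

```shell
# Sketch: merge two VCFs, keeping only sites present in both.
# Assumes bcftools is installed; all names below are placeholders.
merge_shared_sites() {
    a_vcf=$1                # e.g. the 92-sample project VCF
    b_vcf=$2                # e.g. the 30,000-sample public VCF
    out=$3                  # merged output (bgzip-compressed VCF)
    work=${4:-isec_tmp}     # scratch directory (hypothetical default name)

    mkdir -p "$work" || return 1

    # isec and merge need bgzip-compressed, indexed input.
    bcftools view -O z -o "$work/a.vcf.gz" "$a_vcf" || return 1
    bcftools view -O z -o "$work/b.vcf.gz" "$b_vcf" || return 1
    bcftools index -t -f "$work/a.vcf.gz" || return 1
    bcftools index -t -f "$work/b.vcf.gz" || return 1

    # Keep only sites found in both files. With -p, the records from each
    # input are written to 0000.vcf.gz and 0001.vcf.gz respectively.
    bcftools isec -n=2 -O z -p "$work/shared" \
        "$work/a.vcf.gz" "$work/b.vcf.gz" || return 1
    bcftools index -t -f "$work/shared/0000.vcf.gz" || return 1
    bcftools index -t -f "$work/shared/0001.vcf.gz" || return 1

    # Combine the samples of both files into one multi-sample VCF.
    bcftools merge -O z -o "$out" \
        "$work/shared/0000.vcf.gz" "$work/shared/0001.vcf.gz" || return 1
}
```

A call might look like `merge_shared_sites project.vcf public.vcf merged.vcf.gz`. `bcftools merge` stops if the two files share sample names, and `isec` may compare alleles as well as positions depending on its `--collapse` setting, so it is worth checking the result on a small region first.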

ADD REPLY • link 5.8 years ago by Pierre Lindenbaum 161k

Thank you very much; I am working on it.

ADD REPLY • link 5.8 years ago by vinayreddynannuru ▴ 20

If you wish to work with ROD (Reference-Ordered Data) files such as VCF or BED, you should check out the following tools:

bcftools
vcftools
bedtools
bedops
GATK (sub-tools such as CombineVariants, VariantFiltration, etc.)
samtools (as needed)

One or more of the above will have utilities to do exactly what you want, although you might have to break down your task into smaller steps. Most of the tools above also support piping, so you can chain these multiple steps together to form a reusable pipeline.
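
For the sample-subsetting part of the question, a two-step pipeline along these lines is one possibility. It is only a sketch: it assumes bcftools is installed, the function name `subset_first_samples` and the file names are made up for illustration, and "take the first n samples" stands in for whatever selection rule is actually wanted.

```shell
# Sketch: keep a subset of samples from a large VCF.
# Assumes bcftools is installed; all names below are placeholders.
subset_first_samples() {
    in_vcf=$1     # e.g. the 30,000-sample public VCF
    n=$2          # how many samples to keep (illustrative selection rule)
    list=$3       # text file that will hold one sample name per line
    out=$4        # bgzip-compressed output VCF

    # Step 1: list all sample names and pipe them into head to pick the first n.
    bcftools query -l "$in_vcf" | head -n "$n" > "$list" || return 1

    # Step 2: write only the listed samples to a new VCF.
    bcftools view -S "$list" -O z -o "$out" "$in_vcf" || return 1
}
```

In practice the list can be any text file with one sample name per line, for example names picked by population or breeding line; step 1 is then replaced by however that list is produced.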

ADD REPLY • link 5.8 years ago by Ram 43k

Thank you very much; I am working on it. I chose to use vcftools, and when I ran the command below it showed the following:

vcf-isec -f -n ../vcffiles/gbs.africe.impute ../../ZeaGBSv27_publicSamples_imputedV5_AGPv4-161010.vcf 

Could not parse: [../vcffiles/gbs.africe.impute]
 at /usr/local/bin/vcf-isec line 21
    main::error('Could not parse: [../vcffiles/gbs.africe.impute]\x{a}') called at /usr/local/bin/vcf-isec line 71
    main::parse_params() called at /usr/local/bin/vcf-isec line 11

ADD REPLY • link updated 5.8 years ago by Ram 43k • written 5.8 years ago by vinayreddynannuru ▴ 20

-n needs an integer argument - I don't think you're using the command right.

ADD REPLY • link 5.8 years ago by Ram 43k

Hello Mr. Ram, yes, n is two files. I have added the parameter, and my output file now contains only my project samples, at the positions shared by both files. What I want is an output that includes all the samples from both my project data and the public data at those shared positions. How can I do this? Thanks, Vinay.

ADD REPLY • link 5.8 years ago by vinayreddynannuru ▴ 20

"n is two files"

-n needs an integer argument. 2 is an integer argument. The names of two files are not an integer argument.
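
To make the point concrete, here is a hedged sketch with the vcftools Perl scripts. The vcftools documentation writes the `-n` value as an integer with a prefix, for example `=2` for "present in exactly 2 files". Because the `vcf-isec` output described above carried only one file's samples, the sketch runs it once per file order and then combines the sample columns with `vcf-merge`. It assumes `vcf-isec`, `vcf-merge`, `bgzip` and `tabix` are installed, that both inputs are already bgzip-compressed and tabix-indexed, and that the record printed comes from the first file listed. The function name and paths are placeholders.

```shell
# Sketch using the vcftools Perl scripts plus bgzip and tabix (all assumed installed).
# Paths and the function name are placeholders.
isec_then_merge() {
    a_gz=$1       # first bgzip-compressed, tabix-indexed VCF
    b_gz=$2       # second bgzip-compressed, tabix-indexed VCF
    work=$3       # scratch directory
    out=$4        # merged, bgzip-compressed output

    mkdir -p "$work" || return 1

    # -n takes a prefixed integer: =2 keeps positions present in exactly 2 files.
    # Running it once per file order is meant to give each file's records at
    # those positions (assumption: records are taken from the first file listed).
    vcf-isec -f -n =2 "$a_gz" "$b_gz" | bgzip -c > "$work/a.shared.vcf.gz" || return 1
    vcf-isec -f -n =2 "$b_gz" "$a_gz" | bgzip -c > "$work/b.shared.vcf.gz" || return 1
    tabix -p vcf "$work/a.shared.vcf.gz" || return 1
    tabix -p vcf "$work/b.shared.vcf.gz" || return 1

    # vcf-merge combines the sample columns of both files into one VCF.
    vcf-merge "$work/a.shared.vcf.gz" "$work/b.shared.vcf.gz" | bgzip -c > "$out" || return 1
}
```

With tens of thousands of samples the Perl scripts may be slow, so trying this on a single chromosome first is a reasonable check before running the full dataset.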

ADD REPLY • link 5.8 years ago by Ram 43k