Question: Merge two SNP vcf files
20 months ago by
fufuyou110
United States
fufuyou110 wrote:

File1

#CHROM  POS     ID      REF_Zv  ALT_lm                             
chr1A   219620  .       T       A
chr1A   219648  .       A       G
chr1A   219867  .       A       G

File2

#CHROM  POS     ID      REF_Zv  ALT_RV                             
chr1A   219457  .       C       T
chr1A   219670  .       A       G
chr1A   219867  .       A       C

File3 (the output I want)

#CHROM  POS     ID      REF_Zv  ALT_lm  ALT_RV
chr1A   219620  .       T       A       NA
chr1A   219648  .       A       G       NA
chr1A   219867  .       A       G       C
chr1A   219457  .       C       NA      T
chr1A   219670  .       A       NA      G

My command is

awk 'FNR==NR{a[$1,$2];next} {if(a[$1,$2]==""){a[$1,$2]=0};print $1,$2,$3,$4,$5, a[$4,$5]} ' file1 file2 > file3

However, this does not produce the File3 I want. Could you help me improve the command? Thanks, Fuyou


Did you try VCFtools? It has a merge option (vcf-merge): http://vcftools.sourceforge.net/perl_module.html

written 20 months ago by Inquisitive8995160

But my files do not have the full VCF format, so vcf-merge does not work.

written 20 months ago by fufuyou110

@Kevin has examples of the right tool for this in: Merging vcf files (intersection and union)

written 20 months ago by genomax84k

My SNP vcf files do not have other columns, such as "GT". Thanks, Fuyou

written 20 months ago by fufuyou110

Then those are not vcf files and you make things harder by not using standardised file formats.

written 20 months ago by WouterDeCoster43k
20 months ago by
zx87549.2k
London
zx87549.2k wrote:

Using R merge:

# example files
file1 <- read.table(text = "#CHROM  POS     ID      REF_Zv  ALT_lm                             
chr1A   219620  .       T       A
chr1A   219648  .       A       G
chr1A   219867  .       A       G", header = TRUE, stringsAsFactors = FALSE,
                    comment.char = "")
file2 <- read.table(text = "#CHROM  POS     ID      REF_Zv  ALT_RV                             
chr1A   219457  .       C       T
chr1A   219670  .       A       G
chr1A   219867  .       A       C", header = TRUE, stringsAsFactors = FALSE,
                        comment.char = "")

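# full outer join on the shared key columns; all = TRUE keeps rows found in only one file
# (read.table renames the "#CHROM" header to "X.CHROM")
merge(file1, file2, by = c("X.CHROM", "POS", "ID", "REF_Zv"), all = TRUE)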
#   X.CHROM    POS ID REF_Zv ALT_lm ALT_RV
# 1   chr1A 219457  .      C   <NA>      T
# 2   chr1A 219620  .      T      A   <NA>
# 3   chr1A 219648  .      A      G   <NA>
# 4   chr1A 219670  .      A   <NA>      G
# 5   chr1A 219867  .      A      G      C
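
For an awk-only route, as the question originally asked, the same full outer join can be sketched as below. The command in the question stores a[$1,$2] without a value and then looks up a[$4,$5], a key built from different columns, so nothing ever matches. This sketch keys on the first four columns instead. It assumes whitespace-separated input with exactly one header line per file, hard-codes the output header from the question's example, and writes "NA" where a file has no row for a site; none of this is tested on real data.

```shell
# Recreate the two example inputs from the question (tab-separated).
printf '#CHROM\tPOS\tID\tREF_Zv\tALT_lm\nchr1A\t219620\t.\tT\tA\nchr1A\t219648\t.\tA\tG\nchr1A\t219867\t.\tA\tG\n' > file1
printf '#CHROM\tPOS\tID\tREF_Zv\tALT_RV\nchr1A\t219457\t.\tC\tT\nchr1A\t219670\t.\tA\tG\nchr1A\t219867\t.\tA\tC\n' > file2

awk 'BEGIN { OFS = "\t" }
FNR == 1 { next }                              # skip the header line of each file
{ key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 }     # join key: CHROM, POS, ID, REF
!(key in seen) { seen[key] = 1; order[++n] = key }   # remember first-seen order
FNR == NR { alt1[key] = $5; next }             # first file: ALT_lm
{ alt2[key] = $5 }                             # second file: ALT_RV
END {
    # Output header is hard-coded from the example in the question.
    print "#CHROM", "POS", "ID", "REF_Zv", "ALT_lm", "ALT_RV"
    for (i = 1; i <= n; i++) {
        key = order[i]
        split(key, f, SUBSEP)
        print f[1], f[2], f[3], f[4], \
              ((key in alt1) ? alt1[key] : "NA"), \
              ((key in alt2) ? alt2[key] : "NA")
    }
}' file1 file2 > file3

cat file3
```

Rows come out in first-seen order (all of file1, then the sites found only in file2) rather than sorted by position as in the R output. The last row carries G, the ALT_RV value that file2 lists at 219670.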