Question: check duplicates in two columns
3.9 years ago, marcoabbestia (European Union) wrote:

Hi all!

I have two different files: a .map (from Illumina BeadChip genotyping) and a .vcf (from NGS of a pool of individuals). I'm interested in finding the variants that are present in both files, so I need to compare column 1 (chromosome) and column 4 (position) of the .map with column 1 (#CHROM) and column 2 (POS) of the .vcf to obtain the variants they have in common. I tried using awk, but without success. Any suggestions would be much appreciated.

Greetings

Marco

sequencing snp chip-seq • 1.1k views

On stackoverflow.com you will find "thousands" of questions related to this issue.

written 3.9 years ago by iraun

On biostars too :-)

check if two columns match in 2 annotation files and print those lines to a new output file

written 3.9 years ago by PoGibas

Yup, that's true, but not thousands :-P.

written 3.9 years ago by iraun

You win. Technically.

written 3.9 years ago by RamRS

Can you post your awk command?

written 3.9 years ago by PoGibas

Thank you for the answers. My awk command is:

awk -F'\t' 'NR==FNR{c[$1$2]++;next};c[$1$4] > 0' file.vcf file.map

where $1$2 are #CHROM and POS in the .vcf file, and $1$4 are chromosome and position in the .map file.

modified 3.9 years ago by PoGibas • written 3.9 years ago by marcoabbestia
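
The thread does not say why the command above fails, so the following is only a sketch of two things that commonly go wrong with this kind of two-file awk lookup. First, joining the key as $1$2 with no separator is ambiguous: chromosome 1 at position 23 and chromosome 12 at position 3 both become "123". Second, the two files may name chromosomes differently. The sketch assumes the .vcf writes "chr1" while the .map writes "1", and that both files are tab-delimited; neither is confirmed by the original post. The toy input files are hypothetical and exist only to make the example runnable.

```shell
# Hypothetical toy inputs, created only so the sketch can be run as-is.
tmp=$(mktemp -d)

# VCF: column 1 = CHROM, column 2 = POS; header lines start with '#'.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t250\t.\tC\tT\nchr12\t3\t.\tG\tA\n' > "$tmp/file.vcf"

# PLINK-style .map: chromosome, SNP id, genetic distance, bp position.
printf '1\trs1\t0\t100\n1\trs2\t0\t23\n12\trs3\t0\t3\n2\trs4\t0\t250\n' > "$tmp/file.map"

common=$(awk -F'\t' '
    # Assumption: strip a leading "chr" so "chr1" and "1" compare equal.
    { chrom = $1; sub(/^chr/, "", chrom) }

    # First file (.vcf): remember each (chromosome, position) pair,
    # skipping header lines. The comma joins the parts with SUBSEP,
    # so "1","23" and "12","3" stay distinct.
    NR == FNR { if ($0 !~ /^#/) seen[chrom, $2] = 1; next }

    # Second file (.map): print lines whose pair was seen in the .vcf.
    (chrom, $4) in seen
' "$tmp/file.vcf" "$tmp/file.map")

printf '%s\n' "$common"
rm -r "$tmp"
```

With these toy files only the rs1 and rs3 lines of the .map are printed. If your .map is space-delimited rather than tab-delimited, dropping -F'\t' lets awk split on any whitespace; it is worth checking both files with head before running this on real data.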