How To Count The Non-Redundant Tag SNPs In A Given SNP List
2
0
9.9 years ago
J.F.Jiang ▴ 880

Hi everyone,

I have a list of about 600 SNPs, and I want to find out how many non-redundant SNPs it contains.

The method is based on LD blocks: if several SNPs fall within the same LD block, they should be counted only once.

I am not good at scripting. Is there an existing script for this, or could you tell me how to do it?

Thanks!

snp • 2.0k views
0

What format is your file in? Perhaps show a few lines of it (edit the post and paste them in).

0

Only one column of rs IDs, such as:

rs5945619
rs4291438
rs2112226
rs2285550
rs760778
rs5965655
rs5924895
rs5924915
rs5924952
rs760105
rs2269368
rs1557501
rs4542114

0
9.9 years ago
J.F.Jiang ▴ 880

I think I have figured it out, though it is a little complicated.

First, I use the 1000G genotypes as the reference data and PLINK to tag my SNP list, which gives me a tag-list file.

Then I use a script to remove the SNPs that fall in the same LD block, so that each block is counted only once.

That is the approach I use.
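
The second step above could be sketched roughly as follows. This is only an illustration, not the script actually used: it assumes pairwise LD has already been computed (for example, PLINK's `--r2` report lists `SNP_A`, `SNP_B` and `R2` columns) and it treats any chain of SNP pairs with r2 at or above a cutoff as one group, which is a simplification of a real LD-block definition. The function name `count_ld_groups`, the 0.8 cutoff, and the r2 values are all hypothetical.

```python
def count_ld_groups(snps, ld_pairs, r2_threshold=0.8):
    """Count SNP groups, merging any two SNPs whose r2 meets the threshold."""
    # Each SNP starts as the representative of its own group.
    parent = {snp: snp for snp in snps}

    def find(snp):
        # Follow parent links to the group representative, compressing the path.
        while parent[snp] != snp:
            parent[snp] = parent[parent[snp]]
            snp = parent[snp]
        return snp

    for snp_a, snp_b, r2 in ld_pairs:
        # Ignore pairs below the threshold or involving SNPs outside the list.
        if r2 < r2_threshold or snp_a not in parent or snp_b not in parent:
            continue
        parent[find(snp_a)] = find(snp_b)

    # The number of distinct representatives is the non-redundant count.
    return len({find(snp) for snp in snps})


# Hypothetical r2 values, for illustration only.
snps = ["rs5945619", "rs4291438", "rs2112226", "rs2285550"]
ld_pairs = [
    ("rs5945619", "rs4291438", 0.95),
    ("rs4291438", "rs2112226", 0.85),
    ("rs2112226", "rs2285550", 0.10),
]
n_groups = count_ld_groups(snps, ld_pairs)
print(n_groups)
```

SNPs that never appear in a high-LD pair stay in their own group, so they each count once.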

0
9.9 years ago

If you have a single column of duplicated names, first sort it and then use the uniq command, like so:

$ cat test
1
2
1
1
2
3
$ sort test | uniq
1
2
3

0

Sorry, I am afraid this is not the answer. Your answer uses the Linux sort command to remove redundant lines, whereas my question is about removing redundant SNPs that are located in the same LD block.

1

Well, but that is the file you claimed to have. Perhaps you should edit the question and specify exactly what information you have available and how you want to group it. I think you can still solve the problem with sort/uniq, only you would probably need to add the LD block information as a column.
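
To illustrate the suggestion: once each SNP carries an LD block label in a second column, the non-redundant count is just the number of distinct labels. A minimal sketch, assuming a hypothetical whitespace-separated layout of rs ID followed by block label (the labels `block1` and `block2` are made up):

```python
def count_blocks(lines):
    """Count distinct LD block labels in 'rsID block' lines."""
    blocks = set()
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines that lack a block column.
        if len(fields) < 2:
            continue
        blocks.add(fields[1])
    return len(blocks)


# Hypothetical block assignments, for illustration only.
rows = [
    "rs5945619 block1",
    "rs4291438 block1",
    "rs2112226 block2",
]
n_blocks = count_blocks(rows)
print(n_blocks)
```

On a file with the same layout, `awk '{print $2}' file | sort -u | wc -l` should give the same count.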

0

Maybe off topic, but there is no need to use uniq

$ sort -u test


http://unixhelp.ed.ac.uk/CGI/man-cgi?sort