Filter dosage file by list of SNP IDs
2.7 years ago
James ▴ 10

Hello, does anyone know of a fast, computationally efficient way to select the lines of a .dosage file whose first-column SNP ID also appears in a .txt file of SNP IDs?

The .dosage file is in the following format:

SNPID Position REF ALT Sample1Dosage Sample2Dosage Sample3Dosage . . .
1:100:A:C A C 0 2 1 . . .
1:101:C:T C T 1 2 1 . . .
. . .

The list of SNP IDs in a .txt document is in the following format:

1:100:A:C
1:101:C:T
1:103:G:A
1:105:C:T

. . .

I have tried using grep -f snp_IDs.txt example.dosage > filtered_example.dosage, but the command is too slow: it hits my server's maximum wall time before it finishes.

2.7 years ago
James ▴ 10

Found the solution myself, but keeping this question up for others who may run into the same problem. Instead of using:

grep -f snp_IDs.txt example.dosage > filtered_example.dosage

Use:

grep -F -f snp_IDs.txt example.dosage > filtered_example.dosage

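This runs extremely fast! The -F flag makes grep treat every line of snp_IDs.txt as a fixed string instead of a regular expression, so it only works as long as you don't need to filter on any regular expressions.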
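One caveat with grep -F: it matches an ID anywhere in the line, so 1:100:A:C would also pull in a row whose ID is 11:100:A:C or 1:100:A:CT. If that matters for your data, a possible alternative is to compare the first column exactly with awk. The following is only a sketch: filter_dosage is a made-up helper name, and it assumes whitespace-separated columns, one ID per line in the list, a non-empty ID list, and a header on the first line of the dosage file.

```shell
# Hypothetical helper (name not from the original post).
# $1 = file with one SNP ID per line, $2 = dosage file with a header line.
filter_dosage() {
    awk '
        # First file: store each non-blank ID as a key in a lookup table.
        NR == FNR { if (NF) ids[$1]; next }
        # Second file: keep the header, plus rows whose first column is a stored ID.
        FNR == 1 || ($1 in ids)
    ' "$1" "$2"
}
```

It would be called as: filter_dosage snp_IDs.txt example.dosage > filtered_example.dosage. The ID list is read into memory once and each dosage line costs a single hash lookup, so it should also stay fast on large files.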
