Plink filtering with physical position using a text file
3.5 years ago
Sheila ▴ 340

Hi,

I would like to filter my dataset in plink using a list of chromosome and base-pair positions.

I understand you can filter based on a list of SNP NAMES in plink:

 plink --bfile filename --extract snpfilename.txt --make-bed --out new_filename

What would be the way to do this with chromosome and position?

Thanks!


You may consider first obtaining your MAP file, which lists every variant in your dataset with its chromosome and position, and then using that information for filtering.
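A minimal sketch of that lookup, assuming a hypothetical positions file with one "CHR BP" pair per line. The helper name `ids_at_positions` and the demo file names are made up for illustration. It relies only on the standard .map/.bim layout, where column 1 is the chromosome, column 2 the variant ID, and column 4 the base-pair position.

```shell
# Print the IDs of variants whose (chromosome, bp) pair appears in a positions file.
# $1 = positions file ("CHR BP" per line; hypothetical layout)
# $2 = .map or .bim file (chr, variant ID, cM, bp, ...)
# Chromosome codes are compared as text, so "chr1" and "1" do not match.
ids_at_positions() {
    awk 'NR == FNR { want[$1 ":" $2] = 1; next }   # first file: remember wanted pairs
         ($1 ":" $4) in want { print $2 }' "$1" "$2"
}

# Tiny made-up demo files in a temporary directory.
demo_dir=$(mktemp -d)
printf '1 1000\n2 500\n' > "$demo_dir/positions.txt"
printf '1 rsA 0 1000 A G\n1 rsB 0 2000 C T\n2 rsC 0 500 G A\n' > "$demo_dir/demo.bim"

ids_at_positions "$demo_dir/positions.txt" "$demo_dir/demo.bim" > "$demo_dir/snplist.txt"
cat "$demo_dir/snplist.txt"
```

The resulting ID list could then be passed to the "--extract" command shown in the question.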

3.5 years ago

plink "--extract range [filename]" can do this.

The input file is expected to have chromosome IDs in the first column, [first bp, last bp] (ok for these values to be identical) in the second and third columns, and arbitrary range IDs in the fourth column.
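As a rough sketch of how such a range file might be built from a plain list of positions: the helper below assumes a hypothetical input with one "CHR BP" pair per line and writes each position as a single-base range. The function name, the file name `ranges.txt`, and the "r" plus line number range IDs are illustrative choices, not anything plink requires.

```shell
# Turn "CHR BP" lines into the four-column range format:
# chromosome, first bp, last bp, arbitrary range ID.
# Reads the named files, or standard input if none are given.
positions_to_ranges() {
    awk 'NF >= 2 && $1 !~ /^#/ { print $1, $2, $2, "r" NR }' "$@"
}

# Made-up example positions.
printf '1 752566\n2 1234567\n' | positions_to_ranges > ranges.txt
cat ranges.txt

# With plink on the PATH and a real fileset, the filter would then be:
# plink --bfile filename --extract range ranges.txt --make-bed --out new_filename
```

Setting the first and last bp to the same value keeps only variants at exactly that position, which the answer notes is allowed.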


I knew that my way was overly complex! :)

3.0 years ago
amcheroo • 0

Hi, I'm trying to filter SNPs using PLINK with this command:

plink --ped allindividuals.ped --map allindividuals.map --allow-extra-chr --indep-pairwise 50 10 0.1

However, when I count the SNPs, plink.prune.in contains only 15 of them, while plink.prune.out contains millions. In addition, the result is the same no matter how I change the parameters.

I would appreciate any help. Thanks :)


Post the .log file from your run; otherwise we can't really guess what's going on.


This is the final part of the .log file. The full log is much longer, but the rest looks the same. Please let me know if you need more information. Thanks :)

Pruned 11 variants from chromosome 2979, leaving 0.
Pruned 26 variants from chromosome 2980, leaving 0.
Pruned 200 variants from chromosome 2981, leaving 0.
Pruned 63 variants from chromosome 2982, leaving 0.
Pruned 21 variants from chromosome 2983, leaving 0.
Pruned 65 variants from chromosome 2984, leaving 0.
Pruned 77755 variants from chromosome 2985, leaving 0.
Pruned 31 variants from chromosome 2986, leaving 0.
Pruned 393 variants from chromosome 2987, leaving 0.
Pruned 44895 variants from chromosome 2988, leaving 0.
Pruned 48395 variants from chromosome 2989, leaving 0.
Pruned 608 variants from chromosome 2990, leaving 0.
Pruned 39173 variants from chromosome 2991, leaving 0.
Pruned 738 variants from chromosome 2992, leaving 0.
Pruned 212 variants from chromosome 2993, leaving 0.
Pruned 21976 variants from chromosome 2994, leaving 0.
Pruned 521 variants from chromosome 2995, leaving 0.
Pruned 59 variants from chromosome 2996, leaving 0.
Pruned 1110 variants from chromosome 2997, leaving 0.
Pruned 88 variants from chromosome 2998, leaving 0.
Pruned 2689 variants from chromosome 2999, leaving 0.
Pruned 276 variants from chromosome 3000, leaving 0.
Pruned 51 variants from chromosome 3001, leaving 0.
Pruned 13 variants from chromosome 3002, leaving 0.
Pruned 81 variants from chromosome 3003, leaving 0.
Pruned 68 variants from chromosome 3004, leaving 0.
Pruned 128 variants from chromosome 3005, leaving 0.
Pruned 2166 variants from chromosome 3006, leaving 0.
Pruned 38 variants from chromosome 3007, leaving 0.
Pruned 41 variants from chromosome 3008, leaving 0.
Pruned 356 variants from chromosome 3009, leaving 0.
Pruned 64 variants from chromosome 3010, leaving 0.
Pruned 37 variants from chromosome 3011, leaving 0.
Pruned 217 variants from chromosome 3012, leaving 0.
Pruned 976 variants from chromosome 3013, leaving 0.
Pruned 1947 variants from chromosome 3014, leaving 0.
Pruned 116 variants from chromosome 3015, leaving 0.
Pruned 37 variants from chromosome 3016, leaving 0.
Pruning complete.  5894336 of 5894351 variants removed.
Marker lists written to plink.prune.in and plink.prune.out .

End time: Wed May  2 19:55:51 2018
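
For a log this long, the per-chromosome lines can be totalled instead of read one by one. The sketch below assumes only the "Pruned N variants from chromosome X, leaving M." line format visible above; the helper name `summarize_prune_log` is made up for illustration.

```shell
# Sum the "Pruned N variants from chromosome X, leaving M." lines of a plink log.
# Reads the named files, or standard input if none are given.
summarize_prune_log() {
    awk '/^Pruned [0-9]+ variants? from chromosome/ {
             removed += $2            # N: variants pruned on this chromosome
             kept    += $NF           # M: the trailing "." is dropped by numeric conversion
             if ($NF + 0 > 0) survivors++
         }
         END {
             printf "removed=%d kept=%d chromosomes_with_survivors=%d\n",
                    removed, kept, survivors
         }' "$@"
}

# Usage, assuming the log is named plink.log:
# summarize_prune_log plink.log
```

The last figure shows how many chromosomes or contigs kept at least one variant, which makes it easy to see whether pruning is removing essentially everything.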

Can you also post the first ~40 lines of the .log? ("head -n 40 plink.log")


Here are the first 40 lines. Also, the .map and .ped files were converted to PLINK format from a VCF file ("python executable.py --input allindividuals.vcf").

PLINK v1.90b5.3 64-bit (21 Feb 2018)
Options in effect:
  --allow-extra-chr
  --indep-pairwise 1000 100 0.9
  --map allindividuals.map
  --ped allindividuas.ped

Hostname: xxxxxxxxxxxxxxxx.cl
Working directory: /home/achero/
Start time: Wed May  2 19:55:31 2018

Random number seed: 1525301731
257764 MB RAM detected; reserving 128882 MB for main workspace.
Scanning .ped file...
Possibly irregular .ped line.  Restarting scan, assuming multichar alleles.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (5894351 variants, 14 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
5894351 variants loaded from .bim file.
14 people (0 males, 0 females, 14 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 14 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.992011.
5894351 variants and 14 people pass filters and QC.
Note: No phenotypes present.
Pruned 31 variants from chromosome 27, leaving 0.
Pruned 74 variants from chromosome 28, leaving 0.
Pruned 285 variants from chromosome 29, leaving 0.
Pruned 35 variants from chromosome 30, leaving 0.
Pruned 570 variants from chromosome 31, leaving 0.
Pruned 51 variants from chromosome 32, leaving 0.
Pruned 64 variants from chromosome 33, leaving 0.
Pruned 12 variants from chromosome 34, leaving 0.
Pruned 43 variants from chromosome 35, leaving 0.
Pruned 124 variants from chromosome 36, leaving 0.
Pruned 286 variants from chromosome 37, leaving 0.
Pruned 103 variants from chromosome 38, leaving 0.

Ok, the main issue is that 14 samples is too few to infer LD patterns from. It looks like the samples are almost identical; I'm guessing if you run "plink --aec --map allindividuals.map --ped allindividuas.ped --mac 1 --make-bed", which removes all variants with minor allele count 0, barely any variants will remain. Sorry to be the bearer of bad news, but you will need to wait until you have a lot more samples, and/or rethink your analysis plan.

In the meantime, plink can directly import VCF files with "plink --vcf allindividuals.vcf"; there's no point to converting to the obsolete .ped/.map format first.


Even though it is bad news, I really appreciate your help. Thanks.
