Question: multisample BED files to PED conversion
nilakshafreezon (Sri Lanka) wrote, 5.2 years ago:

Hi all,

I have a set of annotated variant files from HLA-typed cases and controls, with a single tab-delimited file per individual, for example:


#Chromo    Position    Reference    Change    Change_type    Zygosity
5    32548555    C    A    SNP    Hom
4    32548561    C    G    SNP    Hom

I wonder whether I can create a PED file from these files. I thought of creating a BED file and then converting it to PED. But since not every variant is present in every individual, I'm not sure how to automate the creation of the PED file, because the variant genotypes must be in exactly the same order for all individuals. One approach would be to combine all the annotated tab-delimited files into a single VCF with sample names and convert that to PED, but I'm not sure how to do that. Even the smallest clue is highly appreciated. Thanks in advance :)
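The single-VCF idea can be sketched in Python. This is a minimal sketch under several assumptions: each file uses the six whitespace-separated columns shown above, "Hom" means homozygous for the Change allele and any other value means heterozygous, and a site missing from an individual's file is written as a missing genotype (`./.`). Pass `absent_gt="0/0"` instead if absence really means homozygous reference. `build_vcf` and `absent_gt` are hypothetical names, not part of any tool.

```python
def build_vcf(samples, absent_gt="./."):
    """Combine per-individual variant files into one multi-sample VCF string.

    samples: dict mapping sample name -> iterable of lines from that sample's file.
    Assumes six columns: chrom, position, reference, change, change_type, zygosity.
    """
    refs = {}   # (chrom, pos) -> REF base (the last file read wins if files disagree)
    calls = {}  # (chrom, pos) -> {sample name: (alt allele, zygosity)}
    for name, lines in samples.items():
        for line in lines:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and the column-header line
            chrom, pos, ref, alt, _change_type, zygosity = fields[:6]
            key = (chrom, int(pos))
            refs[key] = ref
            calls.setdefault(key, {})[name] = (alt, zygosity)

    def site_order(key):
        # Numeric chromosomes first, in numeric order; others (e.g. X) after, alphabetically
        chrom, pos = key
        return (0, int(chrom), "", pos) if chrom.isdigit() else (1, 0, chrom, pos)

    names = list(samples)
    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"] + names),
    ]
    # The union of sites over all files gives every individual the same variant order
    for key in sorted(calls, key=site_order):
        alts = sorted({alt for alt, _ in calls[key].values()})  # every ALT seen at this site
        genotypes = []
        for name in names:
            if name not in calls[key]:
                genotypes.append(absent_gt)  # site not listed in this individual's file
                continue
            alt, zygosity = calls[key][name]
            idx = alts.index(alt) + 1  # VCF allele index: 0 = REF, 1.. = ALT alleles
            genotypes.append(f"{idx}/{idx}" if zygosity == "Hom" else f"0/{idx}")
        chrom, pos = key
        out.append("\t".join([chrom, str(pos), ".", refs[key], ",".join(alts), ".", ".", ".", "GT"] + genotypes))
    return "\n".join(out) + "\n"
```

Collecting the union of (chromosome, position) keys before writing any genotypes is what solves the ordering problem. The resulting VCF could then be loaded with PLINK 1.9's --vcf flag and written out as PED with --recode; case/control status would still need to be supplied separately, since it is not part of the VCF.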

 

chrchang523 (United States) answered, 5.2 years ago:

One way you could do this:

1. Write a short script to convert one file at a time to TPED (see http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr ) or VCF format.

2. Convert these files to PLINK binary format one at a time (--make-bed).

3. Merge the binary filesets (--bmerge) and output the final result as PED (--recode).
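Step 1 might look like this for the file layout in the question. It is a sketch under assumptions: `sample_to_tped` and `write_tped_fileset` are hypothetical helpers, "Hom" is read as homozygous for the Change allele and anything else as heterozygous, the case/control status is known for each file and coded the PLINK way (1 = control, 2 = case), and sex is left unknown (0). The variant ID column is written as '.', which PLINK 1.9 treats as an unnamed variant, and the centimorgan column as 0.

```python
from pathlib import Path

def sample_to_tped(lines, sample_id, phenotype):
    """Convert one individual's annotated variant lines to (TPED text, TFAM text).

    phenotype: PLINK coding, 1 = control, 2 = case (assumed known for each file).
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the column-header line
        chrom, pos, ref, alt, _change_type, zygosity = fields[:6]
        # 'Hom' -> two copies of the Change allele; anything else treated as heterozygous
        allele1, allele2 = (alt, alt) if zygosity == "Hom" else (ref, alt)
        # TPED: chromosome, variant ID ('.' = unnamed), cM position (0), bp position, genotype
        rows.append(f"{chrom}\t.\t0\t{pos}\t{allele1}\t{allele2}")
    # TFAM: family ID, individual ID, father, mother, sex (0 = unknown), phenotype
    tfam = f"{sample_id}\t{sample_id}\t0\t0\t0\t{phenotype}\n"
    return "".join(row + "\n" for row in rows), tfam

def write_tped_fileset(lines, sample_id, phenotype, out_dir="."):
    """Write <out_dir>/<sample_id>.tped and .tfam; return the prefix to pass to --tfile."""
    tped, tfam = sample_to_tped(lines, sample_id, phenotype)
    prefix = Path(out_dir) / sample_id
    Path(f"{prefix}.tped").write_text(tped)
    Path(f"{prefix}.tfam").write_text(tfam)
    return str(prefix)
```

Each call produces one .tped/.tfam pair, and step 2 then runs once per returned prefix. Because each file lists only that individual's own variants, after the merge a sample will have missing genotypes at sites that appear only in other samples' files.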


Thank you very much. I will try that and let you know how it goes.

(nilakshafreezon, 5.2 years ago)

Correction: you probably want to use --merge-list instead of --bmerge for the merge step.

(chrchang523, 5.2 years ago)
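A sketch of the merge step with --merge-list, driven from Python. Assumptions: a PLINK 1.9 executable called `plink` on the PATH, per-sample binary filesets that already have unique variant IDs, and a hypothetical helper name `merge_to_ped`. The first fileset is passed with --bfile, and the merge list names the remaining fileset prefixes, one per line (PLINK 1.9 reads a single-name line as a binary fileset prefix). The merged binary fileset is then written as PED/MAP with --recode.

```python
import subprocess
from pathlib import Path

def merge_to_ped(prefixes, out_prefix, plink="plink", run=True):
    """Merge per-sample binary filesets, then recode the merged result as PED/MAP.

    prefixes: fileset prefixes produced by --make-bed (each has .bed/.bim/.fam).
    Returns the PLINK argument lists; with run=False nothing is executed.
    """
    merge_list = Path(f"{out_prefix}_mergelist.txt")
    # The first fileset is the --bfile base; the merge list names the rest, one prefix per line
    merge_list.write_text("".join(f"{p}\n" for p in prefixes[1:]))
    commands = [
        [plink, "--bfile", prefixes[0], "--merge-list", str(merge_list), "--make-bed", "--out", out_prefix],
        [plink, "--bfile", out_prefix, "--recode", "--out", out_prefix],
    ]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if PLINK fails
    return commands
```

Keeping the merge and the recode as two separate PLINK calls makes each one easy to check on its own; `run=False` lets you inspect the commands and the merge list before running PLINK.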

Dear chrchang,

Sorry to bother you again. Do you know what criteria --merge-list uses to merge the BED files? My individual BED files do not contain the SNPs in the same order (some SNPs are present only in some of the per-individual BED files), so I would want PLINK to merge the files using the chromosome and position fields. Is that possible? I couldn't find any clue :(

(nilakshafreezon, 5.1 years ago)

PLINK's merge will automatically sort by chromosome and position.  (However, you first need to convert your files to PLINK-readable formats.)

(chrchang523, 5.1 years ago)

I was able to convert the files to TPED using a bash script. The only problem is that I don't have any values for the 2nd and 3rd fields (rsIDs and genetic distances). Since the merge depends on chromosome and position (the 1st and 4th fields), as you said, I hope I won't run into any trouble when merging the files :)

(nilakshafreezon, 5.1 years ago)

It's safe to use '0' for all the centimorgan coordinates.

Lack of rsIDs is a bigger problem, since the merge matches variants across filesets by variant ID. You may want to use PLINK 1.9's --set-missing-var-ids flag to address this.

(chrchang523, 5.1 years ago)
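The flag can be applied in step 2, while each TPED fileset is converted to binary. A minimal sketch, assuming the TPED ID column holds '.' (PLINK 1.9's marker for an unnamed variant) and a PLINK 1.9 executable called `plink` on the PATH; in the template '@:#', '@' is replaced by the chromosome and '#' by the base-pair position, giving IDs such as 5:32548555. `bed_commands` and `run_commands` are hypothetical helper names.

```python
import subprocess

def bed_commands(prefixes, plink="plink"):
    """One PLINK 1.9 call per sample: read <prefix>.tped/.tfam, name '.' IDs as chr:bp, write .bed/.bim/.fam."""
    return [
        [plink, "--tfile", p, "--set-missing-var-ids", "@:#", "--make-bed", "--out", p]
        for p in prefixes
    ]

def run_commands(commands):
    """Run each command in order; check=True raises CalledProcessError if PLINK fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

# Usage (requires PLINK 1.9 installed): run_commands(bed_commands(["S1", "S2", "S3"]))
```

Because the names are built only from chromosome and position, the same site gets the same ID in every sample's fileset, so the filesets line up when they are merged. Passing the arguments as a list to `subprocess.run` also avoids having to shell-quote the '@:#' template.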