Question: A problem reading a ped file in PLINK?
mary (Bologna university) wrote 8 days ago:

Hi all

I have two text files (file1: genotype.txt, and file2: the first six columns of the ped file). I joined them with paste file1.txt file2.txt to make a ped file. The problem is that when I run plink with:

Options in effect:

--ped scottsheep.ped
--map scottisheep.map
--noweb

it gives me this error:

52857 (of 52857) markers to be included from [ scottisheep.map ]

ERROR: 
A problem with line 1 in [ scottsheep.ped ]
Expecting 6 + 2 * 52857 = 105720 columns, but found 52864

When I checked the ped file, I found that I had successfully added all of the required information columns and just need to split each SNP column, currently in the format "AA", into two separate columns per SNP ("A" "A"). I searched and I know it can be solved in R, but I am new to R. Is there a bash command for a text file that can split one column into two?

I am tired of searching for this and have not found any solution.

Does anyone have a suggestion for me?

modified 1 day ago by zx8754 • written 8 days ago by mary
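
A quick way to see where the numbers in the error come from: PLINK expects 6 leading columns plus 2 allele columns per marker, so 52857 markers need 6 + 2 * 52857 = 105720 columns. When each genotype is a single fused token such as "AA", a line carries only about half that many fields. The sketch below is a minimal Python check and is not part of the original thread; the function names and the toy lines are made up for illustration.

```python
def expected_ped_columns(n_markers: int) -> int:
    """Six leading columns plus two allele columns per marker."""
    return 6 + 2 * n_markers


def count_columns(line: str) -> int:
    """Number of whitespace-delimited fields on one line."""
    return len(line.split())


# Toy lines (hypothetical values): 6 leading columns and 3 markers.
fused = "FAM1 R921B02 0 0 0 -9 AA GG TG"
split = "FAM1 R921B02 0 0 0 -9 A A G G T G"

print(expected_ped_columns(3))       # 12
print(count_columns(fused))          # 9
print(count_columns(split))          # 12
print(expected_ped_columns(52857))   # 105720
```

Running the same count on the first line of the real ped file should show whether the genotypes are still fused.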

Can you confirm exactly what you want to get?

Your data is currently in this format:

AA GG TG CC ...

You need to get it to this format for input to PLINK:

A A G G T G C C

From where did you obtain your data?

Thank you very much.

written 7 days ago by Kevin Blighe

Hi Kevin, I downloaded the data from https://datadryad.org, and yes, I need to do what you wrote. I can do it in R, but I want to know whether I can do it in bash.

written 6 days ago by mary

Are you using Mac or Linux?

This works on Linux:

cat test.txt
AA GG TG CC
AA GG TG CC

sed 's/ \+//g' test.txt | awk '{for (i=1; i<=NF; i+=1) {printf$(i)" "; if (i==NF) printf "\n"}}' FS=''
A A G G T G C C 
A A G G T G C C

I cannot see your exact input, though.

written 6 days ago by Kevin Blighe

Hi, I used the above command but I get this error:

awk: program limit exceeded: maximum number of fields size=32767 FILENAME="-" FNR=1 NR=1

I tried to install and use gawk, but I use Ubuntu 12.04 and I think the package I am looking for doesn't exist. So I am looking for a Python script to do that. Does anybody have a solution?

modified 1 day ago • written 1 day ago by mary

Thanks a lot, everybody, all of them worked.

modified 1 day ago • written 1 day ago by mary

Which solution worked?

Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

I have moved comments which provide a (potential) solution to your issue so you can mark them as accepted if they solve your issue.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

modified 1 day ago • written 1 day ago by WouterDeCoster

WouterDeCoster (Belgium) wrote 1 day ago:

A python solution:

python -c "for line in open('test.txt'): print(' '.join(list(line.strip().replace(' ', ''))))"

Adapt test.txt to suit your input file.

This all requires that there are only normal SNPs in there, no funky alleles etc.

written 1 day ago by WouterDeCoster
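
The one-liner above splits whatever characters it finds. A slightly more defensive sketch could stop at the first token that does not look like an ordinary two-allele genotype. It assumes every genotype token should be exactly two characters, which is only one reading of "normal SNPs", and the function name is hypothetical.

```python
def split_genotypes(line: str) -> str:
    """Turn 'AA GG TG' into 'A A G G T G', rejecting unexpected tokens."""
    alleles = []
    for token in line.split():
        # Assumption: a valid genotype is exactly two characters long.
        if len(token) != 2:
            raise ValueError(f"unexpected genotype token: {token!r}")
        alleles.extend(token)  # appends both characters of the token
    return " ".join(alleles)


print(split_genotypes("AA GG TG CC"))  # A A G G T G C C
```

A line containing something like "AAA" or a lone "-" would raise a ValueError instead of being silently split.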
Kevin Blighe (Republic of Ireland) wrote 1 day ago:

I think that Python may have the same issue - not sure.

Could you take a look here to see about installing gawk on Ubuntu 12.04? - https://askubuntu.com/questions/244268/installing-gawk-4-0-on-ubuntu-12-04

Edit: Wouter has helpfully added a Python solution for you to test. Here is a sed-only solution, too:

sed 's/ \+//g' test.txt | sed 's/\(.\{1\}\)/\1 /g'
A A G G T G C C 
A A G G T G C C
modified 1 day ago • written 1 day ago by Kevin Blighe

Thanks a lot, it works, but I have a plate number at the start of each row and I want to keep it separate. I mean I have

R921B02 GG TT AA GG ...

R921E06 TT AA GG CC...

I want

R921B02 G G T T A A G G ...

R921E06 T T A A G G C C...

modified 1 day ago • written 1 day ago by mary

You would have saved us (and yourself) some time if you had provided a small example of your data from the start. Try this:

perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' test.txt > out.txt
written 1 day ago by h.mon
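
For readers who prefer Python, a rough equivalent of the Perl one-liner above that leaves the leading columns untouched might look like the sketch below. It assumes whitespace-delimited input; `split_ped_line`, `convert`, and the `n_leading` parameter are names invented here. With only a sample ID in front, as in the example rows, `n_leading=1` fits; for a file that already carries all six leading ped columns, `n_leading=6` would be the value to try.

```python
import io


def split_ped_line(line: str, n_leading: int = 1) -> str:
    """Keep the first n_leading fields, split every later field into characters."""
    fields = line.split()
    leading, genotypes = fields[:n_leading], fields[n_leading:]
    alleles = [allele for genotype in genotypes for allele in genotype]
    return " ".join(leading + alleles)


def convert(src, dst, n_leading: int = 1) -> None:
    """Stream src to dst one line at a time, skipping blank lines."""
    for line in src:
        if line.strip():
            dst.write(split_ped_line(line, n_leading) + "\n")


# In-memory demo using the two example rows from the thread.
src = io.StringIO("R921B02 GG TT AA GG\nR921E06 TT AA GG CC\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```

On real files, passing open file handles (for example test.txt for reading and out.txt for writing) to `convert` should process the data line by line without loading the whole file into memory.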
h.mon (Brazil) wrote 1 day ago:

A Perl solution:

perl -ne 's/ //g; print join(" ", split( // ) );' test.txt > out.txt
written 1 day ago by h.mon

Good work!

written 1 day ago by Kevin Blighe

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 1 day ago:
$ echo "AA GG TG CC" | sed 's/\([^ ]\)\([^ ]\)/\1 \2/g'
A A G G T G C C

?

written 1 day ago by Pierre Lindenbaum

cpad0112 (India) wrote 1 day ago:
echo "AA GG TG CC" | sed 's/\s//g;s/./& /g'

or

 echo "AA GG TG CC" | sed 's/./& /g' | tr -s " "

A A G G T G C C
modified 1 day ago • written 1 day ago by cpad0112