Question: A problem reading a PED file in PLINK?
0
10 weeks ago by
mary190
Bologna university
mary190 wrote:

Hi all

I have two text files (file1: genotype.txt, and file2: the first 6 columns of a PED file). I combined them with paste file1.txt file2.txt to make a PED file. The problem is that when I run PLINK with:

Options in effect:

--ped scottsheep.ped
--map scottisheep.map
--noweb

it gives me this error:

52857 (of 52857) markers to be included from [ scottisheep.map ]

ERROR: 
A problem with line 1 in [ scottsheep.ped ]
Expecting 6 + 2 * 52857 = 105720 columns, but found 52864
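
The arithmetic in this message follows from the PED layout: 6 leading information columns plus two allele columns per marker. As a rough way to confirm the mismatch before rerunning PLINK, here is a minimal Python sketch. The function names are hypothetical, and it assumes the file is whitespace-delimited.

```python
def expected_ped_columns(n_markers):
    """6 leading information columns plus two allele columns per marker."""
    return 6 + 2 * n_markers


def count_columns(line):
    """Count whitespace-delimited fields in one line of text."""
    return len(line.split())


def check_first_line(ped_path, n_markers):
    """Return (found, expected) column counts for the first line of a PED file."""
    with open(ped_path) as handle:
        first = handle.readline()
    return count_columns(first), expected_ped_columns(n_markers)
```

Calling check_first_line("scottsheep.ped", 52857) would return the found and expected counts as a pair. A found count far below the expected one suggests that each genotype is stored as a single two-character field.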

When I checked the PED file, I found that I had successfully added all of the required information columns, and I just need to split each of my SNP columns, which are currently in the format "AA", into two separate columns per SNP ("A" "A"). I searched and I know this could be solved in R, but I am new to R. Is there any bash command for a text file that can split one column into two columns?

I am tired of searching for this and have not found any solution.

Does anyone have any suggestions for me?

plink • 218 views
ADD COMMENTlink modified 9 weeks ago by zx87545.0k • written 10 weeks ago by mary190

Can you confirm exactly what you want to get?

Your data is currently in this format:

AA GG TG CC ...

You need to get it to this format for input to PLINK:

A A G G T G C C

From where did you obtain your data?

Grazie mille.

ADD REPLYlink written 10 weeks ago by Kevin Blighe28k

Hi Kevin, I downloaded the data from https://datadryad.org, and yes, I need to do what you wrote. I can do it in R, but I want to know whether I can do it in bash.

ADD REPLYlink written 10 weeks ago by mary190

Are you using Mac or Linux?

This works on Linux:

cat test.txt
AA GG TG CC
AA GG TG CC

sed 's/ \+//g' test.txt | awk '{for (i=1; i<=NF; i+=1) {printf$(i)" "; if (i==NF) printf "\n"}}' FS=''
A A G G T G C C 
A A G G T G C C

I cannot see your exact input, though.

ADD REPLYlink written 10 weeks ago by Kevin Blighe28k

Hi, I used the above command but I get this error:

awk: program limit exceeded: maximum number of fields size=32767 FILENAME="-" FNR=1 NR=1

I tried to install and use gawk, but I use Ubuntu 12.04 and I think the package I am looking for doesn't exist. So I am looking for a Python script to do that. Does anybody have a solution?

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by mary190

Thanks a lot everybody, all of them worked.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by mary190

Which solution worked?

Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

I have moved comments which provide a (potential) solution to your issue so you can mark them as accepted if they solve your issue.

If an answer was helpful you should upvote it; if the answer resolved your question you should mark it as accepted.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by WouterDeCoster32k
2
9 weeks ago by
Belgium
WouterDeCoster32k wrote:

A python solution:

python -c "for line in open('test.txt'): print(' '.join(list(line.strip().replace(' ', ''))))"

Adapt test.txt to suit your input file.

This all requires that there are only normal SNPs in there, no funky alleles etc.
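
To check that requirement before converting, one option is to scan for genotype fields that are not exactly two ordinary allele characters. The following is only a sketch. The allowed alphabet (A, C, G, T, plus "0" for missing) is an assumption about the data, and the function name is hypothetical.

```python
# Assumption: genotypes use plain bases, with "0" as the missing-allele code.
ALLOWED = set("ACGT0")


def bad_tokens(line, allowed=ALLOWED):
    """Return (position, token) pairs for fields that are not two allowed characters."""
    problems = []
    for pos, token in enumerate(line.split(), start=1):
        if len(token) != 2 or not set(token) <= allowed:
            problems.append((pos, token))
    return problems
```

Running bad_tokens on each line of test.txt and printing any non-empty result shows where a character-by-character split would produce the wrong number of columns. If the file has ID columns in front, those fields will also be reported and should be skipped or sliced off first.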

ADD COMMENTlink written 9 weeks ago by WouterDeCoster32k
1
9 weeks ago by
Kevin Blighe28k
USA / Europe / Brazil
Kevin Blighe28k wrote:

I think that Python may have the same issue - not sure.

Could you take a look here to see about installing gawk on Ubuntu 12.04? - https://askubuntu.com/questions/244268/installing-gawk-4-0-on-ubuntu-12-04

Edit: Wouter has helpfully added a Python solution for you to test. Here is a sed-only solution, too:

sed 's/ \+//g' test.txt | sed 's/\(.\{1\}\)/\1 /g'
A A G G T G C C 
A A G G T G C C
ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by Kevin Blighe28k

Thanks a lot, it works, but I have a plate number at the start of each row and I want it kept intact. I mean I have

R921B02 GG TT AA GG ...

R921E06 TT AA GG CC...

I want

R921B02 G G T T A A G G ...

R921E06 T T A A G G C C...

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by mary190
1

You would have saved us (and you) some time if you had provided a small example of your data from the start. Try this:

perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' test.txt > out.txt
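
For comparison, the same idea can be sketched in Python (this is not part of the reply above): keep the leading ID field intact and split every remaining field into single characters. Because it reads one line at a time and uses no awk, the 32767-field limit reported earlier in the thread should not apply. The function names and the n_keep parameter are hypothetical.

```python
def split_genotypes(line, n_keep=1):
    """Keep the first n_keep fields intact; split every other field into characters."""
    fields = line.split()
    out = fields[:n_keep]
    for genotype in fields[n_keep:]:
        out.extend(genotype)  # "GG" becomes "G", "G"
    return " ".join(out)


def convert(in_path, out_path, n_keep=1):
    """Stream a file line by line, writing the split version of each non-empty line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(split_genotypes(line, n_keep) + "\n")
```

For example, convert("test.txt", "out.txt") handles rows that start with a single plate number. For a file that already carries the six leading PED columns, n_keep=6 would leave those untouched.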
ADD REPLYlink written 9 weeks ago by h.mon19k
1
9 weeks ago by
h.mon19k
Brazil
h.mon19k wrote:

A Perl solution:

perl -ne 's/ //g; print join(" ", split( // ) );' test.txt > out.txt
ADD COMMENTlink written 9 weeks ago by h.mon19k

Good work!

ADD REPLYlink written 9 weeks ago by Kevin Blighe28k
1
9 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:
$ echo "AA GG TG CC" | sed 's/\([^ ]\)\([^ ]\)/\1 \2/g'
A A G G T G C C

?

ADD COMMENTlink written 9 weeks ago by Pierre Lindenbaum112k
1
9 weeks ago by
cpad01128.9k
India
cpad01128.9k wrote:
echo "AA GG TG CC" | sed 's/\s//g;s/./& /g'

or

 echo "AA GG TG CC" | sed 's/./& /g' | tr -s " "

A A G G T G C C
ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by cpad01128.9k