Question: Problem reading a PED file in PLINK
mary wrote:

Hi all

I have two text files (file1: genotype.txt, and file2: the first six columns of the PED file). I combined them with paste file1.txt file2.txt to make a PED file. The problem is that when I run PLINK with:

Options in effect:

--ped scottsheep.ped
--map scottisheep.map
--noweb

it give me this error:

52857 (of 52857) markers to be included from [ scottisheep.map ]

ERROR: 
A problem with line 1 in [ scottsheep.ped ]
Expecting 6 + 2 * 52857 = 105720 columns, but found 52864

When I checked the PED file, I found that I had successfully added all of the required information columns. I just need to split each of my SNP columns, which are currently in the format "AA", into two separate columns per SNP ("A" "A"). I searched and I know this can be done in R, but I am new to R. Is there a bash command for a text file that can split one column into two?

I am tired of searching for this and have not found any solution.

Does anyone have a suggestion for me?

modified 4 months ago by zx8754 • written 4 months ago by mary
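The arithmetic in the error message can be reproduced with a small sketch. It assumes the standard PLINK .ped layout of six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per marker. The function names and the sample line are hypothetical and only serve to illustrate the count.

```python
# The six leading columns of a PLINK .ped line.
LEADING_FIELDS = ["family ID", "individual ID", "paternal ID",
                  "maternal ID", "sex", "phenotype"]


def expected_ped_columns(n_markers):
    """Columns PLINK expects per line: 6 leading fields + 2 alleles per marker."""
    return len(LEADING_FIELDS) + 2 * n_markers


def count_columns(line):
    """Number of whitespace-separated columns actually present in a line."""
    return len(line.split())


# 52857 markers, as reported for the .map file in the question.
print(expected_ped_columns(52857))  # 105720

# Hypothetical line with four fused genotype calls: it has 6 + 4 = 10 columns,
# whereas PLINK would expect 6 + 2 * 4 = 14.
sample = "FAM1 R921B02 0 0 0 -9 AA GG TG CC"
print(count_columns(sample), expected_ped_columns(4))  # 10 14
```

A fused call such as "AA" counts as one column instead of two, which is why the number PLINK found is roughly half of what it expected.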

Can you confirm exactly what you want to get?

Your data is currently in this format:

AA GG TG CC ...

You need to get it to this format for input to PLINK:

A A G G T G C C

From where did you obtain your data?

Thank you very much.

written 4 months ago by Kevin Blighe

Hi Kevin, I downloaded the data from https://datadryad.org, and yes, I need to do what you wrote. I can do it in R, but I want to know whether I can do it in bash.

written 4 months ago by mary

Are you using Mac or Linux?

This works on Linux:

cat test.txt
AA GG TG CC
AA GG TG CC

sed 's/ \+//g' test.txt | awk '{for (i=1; i<=NF; i+=1) {printf$(i)" "; if (i==NF) printf "\n"}}' FS=''
A A G G T G C C 
A A G G T G C C

I cannot see your exact input, though.

written 4 months ago by Kevin Blighe

Hi, I used the above command but I get this error:

awk: program limit exceeded: maximum number of fields size=32767 FILENAME="-" FNR=1 NR=1

I tried to install and use gawk, but I am on Ubuntu 12.04 and I think the package I am looking for doesn't exist. So I am looking for a Python script to do this. Does anybody have a solution?

modified 4 months ago • written 4 months ago by mary

Thanks a lot, everybody. All of them worked.

modified 4 months ago • written 4 months ago by mary

Which solution worked?

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

I have moved comments which provide a (potential) solution to your issue so you can mark them as accepted if they solve your issue.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

modified 4 months ago • written 4 months ago by WouterDeCoster
WouterDeCoster wrote:

A Python solution:

python -c "for line in open('test.txt'): print(' '.join(list(line.strip().replace(' ', ''))))"

Adapt test.txt to suit your input file.

This all requires that there are only normal SNPs in there, no funky alleles etc.

written 4 months ago by WouterDeCoster
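The caveat about "funky alleles" can be made explicit with a check before splitting. The following is only a sketch: the function name is hypothetical, and it assumes alleles are coded as A, C, G or T, with 0 (PLINK's default missing-genotype code) also allowed. Python's str.split has no fixed limit on the number of fields, so a line with tens of thousands of genotype calls should not run into the field limit reported for awk above.

```python
# Assumption: alleles are coded A/C/G/T, and 0 marks a missing allele.
VALID_ALLELES = set("ACGT0")


def split_genotypes(line, valid=VALID_ALLELES):
    """Turn 'AA GG TG' into 'A A G G T G', rejecting unexpected calls."""
    alleles = []
    for call in line.split():
        # Each call must be exactly two characters drawn from the valid set.
        if len(call) != 2 or not set(call) <= valid:
            raise ValueError(f"unexpected genotype call: {call!r}")
        alleles.extend(call)  # adds the two characters as separate items
    return " ".join(alleles)


print(split_genotypes("AA GG TG CC"))  # A A G G T G C C
```

Failing loudly on a call such as "A-" or "AAG" is a design choice here; silently splitting it would shift every later column and produce a PED file that PLINK may reject or misread.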
Kevin Blighe wrote:

I think that Python may have the same issue - not sure.

Could you take a look here to see about installing gawk on Ubuntu 12.04? - https://askubuntu.com/questions/244268/installing-gawk-4-0-on-ubuntu-12-04

Edit: Wouter has helpfully added a Python solution for you to test. Here is a sed only solution, too:

sed 's/ \+//g' test.txt | sed 's/\(.\{1\}\)/\1 /g'
A A G G T G C C 
A A G G T G C C

modified 4 months ago • written 4 months ago by Kevin Blighe

Thanks a lot, it works, but I have a plate number at the start of each row and I want to keep it separate. I mean, I have:

R921B02 GG TT AA GG ...

R921E06 TT AA GG CC...

I want

R921B02 G G T T A A G G ...

R921E06 T T A A G G C C...

modified 4 months ago • written 4 months ago by mary

You would have saved us (and you) some time if you provided a small example of your data from the start. Try this:

perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' test.txt > out.txt

written 4 months ago by h.mon
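The same idea, keeping the leading ID column intact and splitting only the genotype calls, can be sketched in Python. The function names and the n_leading parameter are hypothetical; n_leading=1 matches the plate-number example above, and n_leading=6 would suit a file that already carries the six PED information columns.

```python
def split_keep_leading(line, n_leading=1):
    """Leave the first n_leading columns untouched; split the rest per character."""
    fields = line.split()
    leading, calls = fields[:n_leading], fields[n_leading:]
    # 'GG' becomes 'G', 'G'; every character of every call is one allele.
    alleles = [allele for call in calls for allele in call]
    return " ".join(leading + alleles)


def convert_file(in_path, out_path, n_leading=1):
    """Stream a file line by line so the whole file never sits in memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(split_keep_leading(line, n_leading) + "\n")


print(split_keep_leading("R921B02 GG TT AA GG"))
# R921B02 G G T T A A G G
```

Calling convert_file("test.txt", "out.txt") would mirror the Perl one-liner; both file names are placeholders for the real input and output.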
h.mon wrote:

A Perl solution:

perl -ne 's/ //g; print join(" ", split( // ) );' test.txt > out.txt

written 4 months ago by h.mon

Good work!

written 4 months ago by Kevin Blighe
Pierre Lindenbaum wrote:

$ echo "AA GG TG CC" | sed 's/\([^ ]\)\([^ ]\)/\1 \2/g'
A A G G T G C C

?

written 4 months ago by Pierre Lindenbaum
cpad0112 wrote:

echo "AA GG TG CC" | sed 's/\s//g;s/./& /g'

or

 echo "AA GG TG CC" | sed 's/./& /g' | tr -s " "

A A G G T G C C

modified 4 months ago • written 4 months ago by cpad0112