Question

Transform Genomic Data

0

6.5 years ago

admamajdi • 0

Hello,

I have a genomic data file with this format:

1=A, 2=C, 3=G, 0=missing (No "T's")

How should I transform this data into SNP (0 1 2 and 5=missing) data?

Thanks,

Adam

SNP • 1.6k views

ADD COMMENT • link 6.5 years ago by admamajdi • 0

0

Hey Adam, you have not provided enough information for anyone to give a reliable answer.

What type of file is it? Is it binary or plain text? In what exact format is it? You should paste an example of your data.

For direct conversion of plain-text characters in bash, you can pipe the file through the tr command, for example: cat MyData | tr '1234' 'ATGC' converts 1/2/3/4 to A/T/G/C, respectively.

It looks like you want to convert your data into allelic numerical encoding, but you have not stated this specifically. For example,

major allele | major allele = 0
major allele | minor allele = 1
minor allele | minor allele = 2
Missing = 5

To do this, you need to know the minor allele at each SNP (or whichever allele's effects you are researching).

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

0

Hi Kevin,

Thanks for your reply. Here is the format of my file; it is plain text:

11333311113
22331313111
22331013111
12131333131
11331313111
11333311113

Yes, I want to convert the file into allelic numerical encoding.

Adam

ADD REPLY • link 6.5 years ago by admamajdi • 0

0

Hey Adam, there is still some doubt about what exactly you wish to do...

What is the exact encoding that you wish to use?

1 = A = 0
2 = C = 1
3 = G = 2
0 = missing = 5

If this is the correct encoding, then use cat MyData | tr '1230' '0125'
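
For anyone who prefers Python to tr, the same character-for-character substitution can be sketched with str.translate. This assumes the encoding listed above really is the intended one; the function name recode_line is made up for illustration, and the two rows are copied from the sample posted earlier. Note that this only relabels characters; it does not count alleles.

```python
# Character-for-character substitution, equivalent to: tr '1230' '0125'
# Assumption: the mapping 1->0, 2->1, 3->2, 0->5 is the intended encoding.
TABLE = str.maketrans("1230", "0125")


def recode_line(line: str) -> str:
    """Replace each digit according to TABLE; any other character is left unchanged."""
    return line.translate(TABLE)


rows = ["11333311113", "22331013111"]  # two rows from the sample in this thread
recoded = [recode_line(r) for r in rows]
print(recoded)  # ['00222200002', '11220502000']
```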

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

0

Kevin, sorry my question was not clear enough. Actually, this is my first time seeing such data.

The coding I have is that (for example first row):

11333311113 = AAGGGGAAAAG. In other words, 11 = AA; 33 = GG and so on ...

So, I want to know how I should transform this data into the allelic numerical encoding (0 1 2 and 5 for missing)

ADD REPLY • link 6.5 years ago by admamajdi • 0

1

Hi!

It would still help to know the following:

what is the source of the data (from where did you obtain it)?

You indicate that it's genotyping data (like data obtained from PLINK), in which every two bases are paired, but each row in your sample has an odd number of columns (11), so it cannot be paired genotyping data as shown. Genotyping data would be like this:

A A   T T   G T   C A   T T

A T   T A   G T   C C   A A

If the reference alleles at these positions were A, T, G, C, and A, respectively, then I would encode them as:

0      0     1     1     2
1      1     1     0     0

[counting non-reference bases]
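
As a minimal sketch of the counting above (Python, with the hypothetical helper names count_non_ref and encode_sample), each genotype is scored as the number of alleles that differ from the reference allele at that position. The genotypes and reference alleles are the ones from the example; treating '0' as the missing-allele symbol and 5 as the missing code is an assumption carried over from earlier in this thread.

```python
MISSING_CODE = 5  # code for a missing genotype, as used earlier in this thread


def count_non_ref(genotype, ref, missing="0"):
    """Return the number of non-reference alleles (0, 1 or 2) in a two-allele genotype."""
    if missing in genotype:
        return MISSING_CODE
    return sum(1 for allele in genotype if allele != ref)


def encode_sample(genotypes, refs):
    """Encode one sample: one genotype and one reference allele per SNP."""
    return [count_non_ref(g, r) for g, r in zip(genotypes, refs)]


refs = ["A", "T", "G", "C", "A"]
sample1 = [("A", "A"), ("T", "T"), ("G", "T"), ("C", "A"), ("T", "T")]
sample2 = [("A", "T"), ("T", "A"), ("G", "T"), ("C", "C"), ("A", "A")]

print(encode_sample(sample1, refs))  # [0, 0, 1, 1, 2]
print(encode_sample(sample2, refs))  # [1, 1, 1, 0, 0]
```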

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

0

Hello,

The columns are even in the data file. What I had included here is just a sample (as an example). Yes, the data is genotype data for dairy cattle. I think it is a PLINK format as you mentioned.

Adam

ADD REPLY • link 6.5 years ago by admamajdi • 0

2

Hey again Adam,

Thanks for providing the information - I think that we're getting somewhere.

So, it looks like your data was produced from PLINK using the --allele1234 parameter, which encodes as A=1, C=2, G=3, T=4, as you've mentioned.

If you want to convert this to the 012 format, where the numbers are the count of minor alleles in each genotype, then you just need to use the --recodeA parameter. See the original PLINK documentation hosted on Brigham & Women's web-domain, here: http://zzz.bwh.harvard.edu/plink/dataman.shtml (search for '--recodeA'). Also take a look at --recodeAD.

I'm going to assume that you're going to come back to say that you don't have access to PLINK or the original PLINK files, in which case you will have to calculate the minor allele manually for each SNP, and then convert it to 012 manually, too. I could do this for you quite easily, but I would need access to all of your data.
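
For what it's worth, the manual route can be sketched in a few lines of Python. This is only a sketch under assumptions that have not been confirmed in this thread: each row is one animal, every two consecutive digits form one genotype, '0' marks a missing allele, each SNP has at most two alleles, genotypes with a missing allele are ignored when counting allele frequencies, and ties are broken by allele code. The function names and the toy rows are made up for illustration. Where PLINK and the original files are available, --recodeA remains the safer choice.

```python
from collections import Counter

MISSING_CODE = 5  # output code for a missing genotype


def split_genotypes(row):
    """Split a row such as '11331313' into genotype pairs: ['11', '33', '13', '13']."""
    if len(row) % 2 != 0:
        raise ValueError("row must contain an even number of allele codes")
    return [row[i:i + 2] for i in range(0, len(row), 2)]


def minor_allele(genotypes, missing="0"):
    """Return the least frequent allele at one SNP, or None if fewer than two alleles are seen."""
    counts = Counter(a for g in genotypes if missing not in g for a in g)
    if len(counts) < 2:
        return None
    # Iterate alleles in sorted order so that ties are resolved deterministically.
    return min(sorted(counts), key=lambda a: counts[a])


def encode_012(rows, missing="0"):
    """Convert rows of paired allele codes into minor-allele counts (0/1/2, 5 = missing)."""
    samples = [split_genotypes(r) for r in rows]
    encoded = [[] for _ in samples]
    for snp in zip(*samples):  # walk SNP by SNP across all samples
        minor = minor_allele(snp, missing)
        for out, genotype in zip(encoded, snp):
            if missing in genotype:
                out.append(MISSING_CODE)
            else:
                out.append(genotype.count(minor) if minor else 0)
    return encoded


# Toy data (invented): 4 animals x 2 SNPs, coded 1=A, 3=G, 0=missing.
rows = ["1133", "1313", "1103", "3333"]
print(encode_012(rows))  # [[0, 0], [1, 1], [0, 5], [2, 0]]
```

In the toy data, allele 3 is the minor allele at the first SNP and allele 1 at the second, so the same digit can score differently from one SNP to the next. That is why a single tr substitution over the whole file cannot produce this encoding.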

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

0

admamajdi: Please don't delete threads once they have received a comment or answer. If a particular comment has helped address your question, point it out so we can promote it to an answer and the thread can receive closure.

ADD REPLY • link 6.0 years ago by GenoMax 141k

0

Hello admamajdi,

Did you delete this post? If you did, could you please give us a reason why you chose to delete it?

Thank you!

ADD REPLY • link 6.0 years ago by Ram 43k