Question: LINCS L1000 dataset column names
3.5 years ago by
wir50
wir50 wrote:

I'm working with a LINCS L1000 dataset that gives the gene expression (GE) of a cell line before and after perturbation by a small molecule. I am using Level 4 data. After loading the .gct file into MATLAB, I get a 22268-by-40172 matrix as well as a vector of column_ids and a vector of row_ids.

Using the row ids and the gene metadata txt file included in the download, I know that each row represents a gene.

I can't figure out what a column represents. Obviously, each column is a single experiment, but I can't understand what each id means.

For example, here is a column id "LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03".

So far, I know that "LJP001" refers to LINCS Joint Project and "BT20" refers to the specific cell line. Somewhere, it must contain information about the small molecule used as a perturbagen, but I don't know how to interpret this. Any help would be greatly appreciated!

l1000 lincs • 2.0k views
modified 3 months ago by e.mohammadi.as0 • written 3.5 years ago by wir50

How do you get the perturbagen from the perturbagen group?

modified 2.4 years ago • written 2.4 years ago by godwinwoo0

How to download the data?

written 3 months ago by Shicheng Guo8.1k
3.5 years ago by
wir50
wir50 wrote:

To answer my own question.

The column ids for Level 3 and Level 4 data are basically the distil_id. The example I posted

LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03

can be broken into

  • the perturbagen group "LJP001"
  • the cell line "BT20"
  • the brew prefix "LJP001_BT20_24H"
  • the plate index "X1_B2_DUO52HI53LO"
  • the well index "A03"
  • the distil_id "LJP001_BT20_24H_X1_B2_DUO52HI53LO_A03" (note the switch from ':' to '_')
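The breakdown above can be sketched in a few lines of Python. This is only an illustration based on the single example id in this thread: the function name `parse_column_id` and the `ColumnId` fields are made up here, and the assumption that every column id has the same underscore-separated layout (perturbagen group first, cell line second, brew prefix as the first three fields, well after the ':') is not confirmed by any LINCS documentation cited in this post.

```python
from typing import NamedTuple


class ColumnId(NamedTuple):
    """Parts of a Level 3/4 column id (field names are illustrative)."""
    pert_group: str
    cell_line: str
    brew_prefix: str
    plate_index: str
    well: str
    distil_id: str


def parse_column_id(column_id: str) -> ColumnId:
    """Split an id such as 'LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03'.

    Assumes the layout seen in the example above; other plates may differ.
    """
    # The well index is whatever follows the last ':'.
    plate, sep, well = column_id.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' separator in column id: {column_id!r}")

    fields = plate.split("_")
    if len(fields) < 4:
        raise ValueError(f"too few '_' fields in column id: {column_id!r}")

    return ColumnId(
        pert_group=fields[0],
        cell_line=fields[1],
        brew_prefix="_".join(fields[:3]),   # e.g. LJP001_BT20_24H
        plate_index="_".join(fields[3:]),   # e.g. X1_B2_DUO52HI53LO
        well=well,
        distil_id=f"{plate}_{well}",        # ':' replaced by '_'
    )


parts = parse_column_id("LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03")
print(parts.cell_line, parts.well, parts.distil_id)
```

Applying this to the whole column_ids vector gives one record per experiment, which is handy for grouping columns by cell line or plate before looking up the perturbagen.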

It turns out that the distil_id doesn't contain enough information to identify the perturbagen used. To identify it, you need to use the LINCS API. Here is more information about using the LINCS API to query the metadata. I also used this Coursera video as a reference. Note that the example given in the question doesn't work with the API.


3 months ago by
e.mohammadi.as0 wrote:

I have a related question. In the list of gene symbols, the landmark genes are presented first. Second are the -666 genes, meaning the unavailable predicted genes. Third are the predicted genes, of which there are almost 19000 (22268 genes = 978 landmark genes + about 2000 unavailable genes (-666) + about 19000 predicted genes). In the list of predicted gene symbols (column 1), several gene symbols are repeated but have different expression values in the same experiment. How is this possible?


Please open a new question and mention this post in it. You're not really adding an answer, so why use the "Submit Answer" option?

written 3 months ago by RamRS26k