Question

Question about isoform switching search tools

5.5 years ago

tujuchuanli ▴ 100

Hi, I want to search for isoform switches in TCGA datasets, and I am using iso-kTSP to do it ( https://bitbucket.org/regulatorygenomicsupf/iso-ktsp ). The tool was published in NAR ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330360/ ). I followed its manual to prepare the input data and ran the command (java -jar iso-kTSP_v1.0.3.jar input_file), but I got an error. I ran the command on a 64-bit Windows 10 system. The error is below.

Exception in thread "main" java.util.NoSuchElementException

at java.util.TreeMap.key(Unknown Source)

at java.util.TreeMap.firstKey(Unknown Source)

at java.util.TreeSet.first(Unknown Source)

at ktsp.Ktsp.fixedKSelection(Ktsp.java:404)

at ktsp.Ktsp.runKtspWithCrossvvalidation(Ktsp.java:98)

at ktsp.Main.main(Main.java:395)

To be honest, I know nothing about Java. Could you please tell me how to fix this error? Thanks.

isoform switching iso-ktsp • 1.7k views

updated 4.3 years ago by mihizawi ▴ 20 • written 5.5 years ago by tujuchuanli ▴ 100

Hi, can you give us your command line? By the way, the documentation at the links you gave doesn't state the required Java version or dependencies.

written 5.5 years ago by Titus ▴ 910

Yes, that is the problem: the manual is too brief! My command line is "java -jar iso-kTSP_v1.0.3.jar input_file". Below is part of my input file; you can use it to check and test.

1-T 2-T 3-T 1-N 2-N 3-N 4-N 5-N 6-N 7-N 8-N

ATP9B|374868,uc002lmy.1 0   0   0   0   0.9804  0   0   0   1.6477  2.5449  0

PLDN|26258,uc001zvs.2   31.6855 36.0349 25.7492 0   12.6907 50.0566 46.777  8.9344  0   38.5281 58.1247

RAB28|9364,uc003gmv.2   192.4145    30.552  55.7152 72.6512 84.3184 45.3566 41.6931 54.2987 0   30.6236 36.2905

MRPS18A|55168,uc010jyw.2    4.3133  4.5884  0.523   1.7197  3.035   4.291   2.6091  6.607   5.0741  0   1.4689

NA-4430,uc011kse.1  0.0526  0.7184  0   0   0   3.8588  0   4.0696  0   0   1.3293

PAX5|5079,uc011lqb.1    0   0   0   0   0   0   0   0   0   0   0

TESC|54997,uc001twh.2   782.2099    29.1987 66.8159 363.5179    1922.4236   30.0988 71.2104 516.0637    203.608 25.618  1322.8494

MECOM|2122,uc003ffo.1   2.6301  0.9269  4.7116  1.6378  1.6524  0.7718  1.7511  7.1475  3.0799  0   2.6586

MTO1|25821,uc010kav.2   0   6.966   4.6975  0   4.3018  0   10.057  4.6191  2.8605  6.573   5.7094

ZNF331|55422,uc002qbx.1 421.1697    282.1193    174.4016    0   0   0   88.6044 4.9094  0   0   0

MYBPH|4608,uc001gzh.1   0   1.3904  0   2.7296  0   0.7718  0.5837  0   0   0   0

NBPF14|25832,uc010pae.1 0   11.2206 0   0   0.5178  0   0   0   0   0   0

LYVE1|10894,uc001miv.2  11.0199 44.0297 73.5017 147.7044    56.1829 49.7325 202.039 99.2745 95.8694 1737.3287   81.4078

FLJ22536|401237,uc003ndl.2  0   2.3776  5.9225  0   2.2032  1.1808  7.3428  0   1.7016  1.5084  1.8212

FBLN1|2192,uc003bgj.1   8156.9768   861.1704    16424.2933  9907.8558   6201.4321   1750.6843   23047.8763  19330.0266  12149.3094  13543.2303  30784.9388

updated 5.5 years ago by GenoMax 141k • written 5.5 years ago by tujuchuanli ▴ 100

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep the threads logically organized.

written 5.5 years ago by GenoMax 141k

Is there any example input data in the documentation? You could use it for a first test to check that the software runs properly. I don't see a header in your input example, and there are blank lines.

written 5.5 years ago by Titus ▴ 910

There isn't any example file; at least, I didn't find one. There is a paragraph that describes the basic structure of the input file. Here is the description:

Examples of calls:

  java -jar iso-kTSP gene_seq.txt
  java -jar iso-kTSP -o out_iso_analysis.txt -i -k 12 iso_data.tab
  java -jar iso-kTSP -o out_iso_analysis.txt /home/user/iso_data.tab -c tumor normal -i -n 15 -s 40 -k 4

Input format: The expected format for the input dataset is a tab-separated plain text file (with any extension), where the first row contains the sample labels with suffixes to differentiate between samples belonging to different classes, not necessarily paired. Subsequent lines contain the "gene_id", or "gene_id,isoform_id" for isoforms, in the first column followed by the sample data values (in any numerical format that java can parse), in the same order as in the first row.

  The expected format for the model input file (when using the option -m) is a plain text file 
  (with any extension) that should contain in each line a pair of "gene_id", or of "gene_id,isoform_id" 
  for isoforms, separated by a single whitespace. The number of pairs in the file must be odd.
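As a rough illustration of the dataset format described above, here is a minimal Python sketch that checks a file before it is passed to the tool. It is not part of iso-kTSP: `check_dataset` is a hypothetical helper, and it assumes that the header row holds only the sample labels (as in the example posted earlier in this thread), that blank lines can be ignored, and that Python's `float` accepts roughly the same number formats as Java.

```python
def check_dataset(lines):
    """Return a list of problems found in a tab-separated dataset.

    Assumptions (not taken from the iso-kTSP source code): the first
    non-blank line holds only sample labels, and every later line is
    an ID followed by one numeric value per sample.
    """
    # Skip blank lines; row numbers below count non-blank lines only.
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]
    # Tolerate a header that starts with an empty cell above the ID column.
    header = rows[0][1:] if rows[0][0] == "" else rows[0]
    n_samples = len(header)
    problems = []
    for number, fields in enumerate(rows[1:], start=2):
        row_id, values = fields[0], fields[1:]
        if len(values) != n_samples:
            problems.append(
                f"row {number} ({row_id}): {len(values)} values, expected {n_samples}"
            )
        for value in values:
            try:
                float(value)
            except ValueError:
                problems.append(f"row {number} ({row_id}): non-numeric value {value!r}")
    return problems


sample = ["1-T\t2-T\t1-N\n", "ATP9B|374868,uc002lmy.1\t0\t0.9804\t1.6477\n"]
print(check_dataset(sample))  # prints []
```

To check a real file, pass an open file object, for example `check_dataset(open("input_file"))`.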

written 5.5 years ago by tujuchuanli ▴ 100

score 2 · Answer 1 · 2020-01-08

Hello,

I am the author who developed the code of the iso-kTSP software you were trying to use. I developed it as the final project of my bachelor's degree in computer engineering. Once it was done and working, I ended my collaboration with that research group and stopped paying attention to this software. I am sorry for taking so long to answer you; hopefully I can still be helpful.

I managed to reproduce your error with the test dataset you provided, and I found the reason for it. Let me explain.

The "normal" mode of the algorithm has two steps: a first step that runs over partitions of the samples to find the best k (the optimal number of pairs for the predictive model), and a final step that runs over all the data to find the final model. It is the first step that is giving you problems. You only have 3 tumor samples, whereas the default number of iterations used to find the best k is 10, and you can't partition 3 samples across 10 iterations. The number of iterations should not exceed the number of samples in the least represented class. The option -n lets you set the number of iterations for the first step, so if you run the program with the option "-n 3" it works with your dataset. Still, 3 tumor samples and 8 normal ones is a small sample size for this algorithm, so don't expect great results.
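To pick a safe value for -n before running the tool, the class sizes can be counted from the header row. The following is only a sketch of that check, not how iso-kTSP itself works: the helper names are hypothetical, and it assumes the class is whatever follows the last "-" in each sample label, as in the labels posted earlier in this thread.

```python
from collections import Counter


def class_sizes(sample_labels, sep="-"):
    """Count samples per class, taking the class as the text after the last `sep`.

    The suffix convention is an assumption based on the posted labels.
    """
    return Counter(label.rsplit(sep, 1)[-1] for label in sample_labels)


def max_iterations(sample_labels, sep="-"):
    """Largest -n value that does not exceed the smallest class size."""
    return min(class_sizes(sample_labels, sep).values())


header = "1-T 2-T 3-T 1-N 2-N 3-N 4-N 5-N 6-N 7-N 8-N".split()
print(class_sizes(header)["T"], class_sizes(header)["N"])  # prints 3 8
print(max_iterations(header))                              # prints 3
```

For the posted dataset this gives 3, which matches the "-n 3" setting suggested above.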

I did program a check for this error, and it was intended to give a meaningful message instead of the Java exception; however, it seems that the check isn't working. I apologise for the inconvenience.

Also, I should warn you that the example dataset you provided here won't work as intended with the isoform version of the algorithm (the -i option), because you only have one isoform for each gene. The isoform version only considers pairs formed by two isoforms of the same gene, so if there are no genes with at least 2 isoforms, it won't be able to form any pairs.
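A quick way to see whether a dataset can produce any isoform pairs is to group the row IDs by gene. This is a minimal sketch under the assumption that each ID has the form "gene_id,isoform_id" described in the manual; `genes_with_multiple_isoforms` is a hypothetical helper, not part of iso-kTSP.

```python
from collections import defaultdict


def genes_with_multiple_isoforms(row_ids):
    """Map each gene with at least two distinct isoforms to its isoform IDs."""
    isoforms = defaultdict(set)
    for row_id in row_ids:
        # Split "gene_id,isoform_id" at the first comma.
        gene_id, _, isoform_id = row_id.partition(",")
        if isoform_id:  # rows without an isoform ID are gene-level and skipped
            isoforms[gene_id].add(isoform_id)
    return {gene: sorted(ids) for gene, ids in isoforms.items() if len(ids) >= 2}


# First three IDs from the dataset posted in this thread: one isoform per gene.
posted_ids = [
    "ATP9B|374868,uc002lmy.1",
    "PLDN|26258,uc001zvs.2",
    "RAB28|9364,uc003gmv.2",
]
print(genes_with_multiple_isoforms(posted_ids))  # prints {}
```

An empty result means the -i mode has no same-gene pairs to work with.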

In case you want examples of the full datasets used in the article we published, you can find them here: https://figshare.com/articles/TCGA_Iso_kTSP_analysis_dataset/1061917 . For gene datasets, look for the files named "_gene_read_paired", and for isoform datasets, check the ones named "_iso_read_paired".

Once again, I am sorry for being so late with this reply. If you have any other questions, please do ask.