Question

kissplice on TWAS analysis

Posted 7.1 years ago by wangdp123

Hi,

I am very happy to try the software at http://kissplice.prabi.fr/TWAS/, which is used to identify SNPs from de novo RNA-Seq data.

The issue is that I have a large number of FASTQ files and would like to feed them all into KisSplice, but I am not sure whether KisSplice can handle that much data in a single run.

I have tested 2 samples (four FASTQ files, left and right reads), which worked, but when I input 10 samples it seemed to need more memory.

Is there any upper bound on the number of samples KisSplice can handle?
I am also confused because, for paired-end data, each sample is split into two files (left and right reads), such as cond1_replicat1_R1.fastq and cond1_replicat1_R2.fastq. Will the software treat them as two samples or as a single sample? If the latter, how does it recognize which two files belong to the same sample, given that every file is passed with the same "-r" argument?

Many thanks,

With best regards,

Tom

Tags: RNA-Seq, snp, next-gen, kissplice

Written 7.1 years ago by wangdp123; last updated 7.1 years ago by leandro.ishi.lima.

kissplice would be a very logical tag for your post, so I have added it. Make sure to use the right tags so that the tool's authors are notified of your post.

Comment by WouterDeCoster, 7.1 years ago

Answer (2017-03-28)

Hello Tom,

Sorry for the delay in answering.

"Is there any upper bound on the number of samples for KisSplice?"

No. Did you have any problems when running it on your large number of samples? Of course, if your dataset is huge, you will need more disk space, CPU time and RAM. If you know your dataset is large, it is also a good idea to run "ulimit -s unlimited" first, since some of the algorithms are recursive and may overflow the stack. If the graph was built successfully and you need to re-run KisSplice because of a later error, take a look at KisSplice's -g parameter, which avoids rebuilding the graph (this can save you a lot of time).
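
A minimal POSIX shell sketch of this advice, for anyone scripting the runs. run_kissplice and the DRY_RUN switch are hypothetical names used only for illustration; -r and -g are the KisSplice options mentioned in this thread, and -g is assumed to take the graph built by an earlier run (check kissplice --help for the exact form in your version). The function body is a subshell, so the raised stack limit only applies to the KisSplice process.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of KisSplice).
# Usage: run_kissplice GRAPH_PREFIX READ_FILE...
#   Pass "" as GRAPH_PREFIX to build the graph from scratch.
#   Set DRY_RUN=1 to print the command instead of running it.
# The body is a subshell, so ulimit and the variables do not leak into the caller.
run_kissplice() (
  graph_prefix=$1
  shift
  # Turn "f1 f2 ..." into "-r f1 -r f2 ...": one -r per file, each file is one sample.
  for f in "$@"; do
    set -- "$@" -r "$f"
    shift
  done
  # Assumed -g semantics: reuse the graph from a previous run instead of rebuilding it.
  if [ -n "$graph_prefix" ]; then
    set -- "$@" -g "$graph_prefix"
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'kissplice'
    printf ' %s' "$@"
    printf '\n'
  else
    # Recursive steps may overflow the default stack on large datasets.
    ulimit -s unlimited 2>/dev/null || echo "warning: could not set an unlimited stack size" >&2
    kissplice "$@"
  fi
)
```

For example, DRY_RUN=1 run_kissplice "" a_R1.fastq a_R2.fastq prints the command it would run, so you can check it before launching a long job.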

"I am confused because, for paired-end data, each sample is split into two files (left and right reads), such as cond1_replicat1_R1.fastq and cond1_replicat1_R2.fastq. Will the software treat them as two samples or as a single sample? If the latter, how does it recognize which two files belong to the same sample, given that every file is passed with the same -r argument?"

KisSplice considers them two different samples. KisSplice only finds the events and quantifies them, so there is no harm in treating the two files of a pair as separate samples. In kissDE, where you search for events that are differentially expressed between two conditions, you then have to specify the conditions, e.g.:

library(kissDE)
snp <- kissplice2counts("results_cond1_cond2_k41_coherents_type_0.fa", pairedEnd = TRUE)

Setting pairedEnd = TRUE specifies that each read file is immediately followed by its mate-pair read file. Then:

human_conditions <- c("c1", "c1", "c2", "c2")

This specifies that the first two pairs of read files come from condition 1 and the last two pairs come from condition 2. Note that these conditions must match the order of the read files given to KisSplice, since kissDE uses the quantification output by KisSplice. Knowing which read file comes from which condition only matters to kissDE.
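
Since kissDE only sees the order of the samples, the -r order given to KisSplice and the conditions vector have to be kept in sync by hand. A minimal POSIX shell sketch that generates both from the same list of files, assuming the <condition>_<replicate>_R1.fastq / _R2.fastq naming used in the question. pair_args is a hypothetical helper, not part of KisSplice or kissDE; it only prints a command and an R line for you to check and paste, runs nothing, and does not handle file names containing spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of KisSplice or kissDE).
# Assumed naming: <condition>_<replicate>_R1.fastq, with its mate in ..._R2.fastq.
# Prints (1) a KisSplice command where every R1 file is immediately followed by its R2 mate,
# the order pairedEnd = TRUE expects, and (2) an R conditions vector with one entry per
# sample (per R1/R2 pair), in the same order. Nothing is executed.
pair_args() (
  args=""
  conds=""
  for r1 in "$@"; do
    case $r1 in
      *_R1.fastq) ;;
      *) echo "skipping $r1: does not end in _R1.fastq" >&2; continue ;;
    esac
    r2=${r1%_R1.fastq}_R2.fastq   # mate file: same name with R2 instead of R1
    cond=${r1##*/}                # drop any directory part
    cond=${cond%%_*}              # condition label = text before the first underscore
    args="$args -r $r1 -r $r2"
    conds="$conds${conds:+,}\"$cond\""
  done
  echo "kissplice$args"
  echo "conditions <- c($conds)"
)
```

For example, pair_args cond1_*_R1.fastq cond2_*_R1.fastq lists all condition 1 samples before condition 2, and the printed vector uses the file-name prefixes (cond1, cond2) as condition labels.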

I am not sure I was clear enough, so please don't hesitate to ask if you have any doubts!

++