Question: input for rMATS
2.8 years ago by
bisht20diksha20 wrote:

I want to identify differential alternative splicing between two conditions with three replicates each, and for that I am using rMATS. I have already generated genome indexes through two-pass STAR mapping.

I have three BAM files for the three control replicates and another three for the treated replicates. Should I merge the control BAM files, and likewise the treated BAM files, and then use the command as:

 python rmats.py --b1 merged.bam --b2 merged.bam  --gtf gtfFile mygtf --bi STARindexFolder index -od outDir result -t paired -readLength ?

or shall I use

python rmats.py --b1 1.bam 2.bam 3.bam --b2 4.bam 5.bam 6.bam --gtf gtfFile mygtf --bi STARindexFolder index -od outDir result -t paired -readLength ?

Where 1.bam, 2.bam and 3.bam are the BAM files of the control replicates and 4.bam, 5.bam and 6.bam are the BAM files of the treated replicates.

I am also confused about what readLength means here. Is it the length of the FASTQ reads or something else? If the former, how do I choose the read length when it may differ between samples?

Thanks

rna-seq star rmats • 4.4k views
ADD COMMENTlink modified 11 months ago by Ömer An220 • written 2.8 years ago by bisht20diksha20

Is it mandatory that the read length be the same even when we are working with BAM files (in the case of rMATS)?

ADD REPLYlink written 21 months ago by iti.gupta10

That still seems to be the requirement - yes.

ADD REPLYlink written 21 months ago by Kevin Blighe67k

I've tried running rMATS on my BAM files with two different read lengths (-readLength 100 and -readLength 130; my raw reads are 150 bp, but after trimming they drop to around 130), and I get similar but not identical results from the two runs.

So I don't really know which read length to use. I don't want to restart the analysis and hard-trim every read to the same length, since I found that this method loses too much information, but maybe I'm wrong.
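
One way to put a number on that information loss before deciding is to tabulate the read lengths in a FASTQ file and ask what fraction of reads would survive a given cutoff. The following is only a minimal sketch: the function names are made up for illustration, and it assumes an uncompressed FASTQ with exactly four lines per record.

```python
from collections import Counter


def read_length_counts(fastq_path):
    """Count how many reads have each sequence length in a 4-line FASTQ."""
    counts = Counter()
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 == 1:
                counts[len(line.rstrip("\n"))] += 1
    return counts


def fraction_kept(counts, length):
    """Fraction of reads that are at least `length` bases long."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for read_len, n in counts.items() if read_len >= length)
    return kept / total
```

Calling fraction_kept(read_length_counts("sample_R1.fastq"), 130) on a hypothetical file name would then report the share of reads that a fixed length of 130 would retain.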

ADD REPLYlink written 21 months ago by darbinator220

Better to contact the developer. I believe there is a Google Group page where she (developer) is more active.

ADD REPLYlink written 21 months ago by Kevin Blighe67k

Is it possible to compare an uneven number of replicates for test and control, e.g. 2 replicates for test vs. 3 replicates for control?

ADD REPLYlink written 18 months ago by Ömer An220
2.8 years ago by
Kevin Blighe67k
Republic of Ireland
Kevin Blighe67k wrote:

Regarding the input BAM files, I would follow the program documentation. So, you should have 2 text files, with the following contents:

b1.txt
1.bam,2.bam,3.bam

b2.txt
4.bam,5.bam,6.bam

Then, run the program with:

python rmats.py --b1 b1.txt --b2 b2.txt ...
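
To make the file layout above concrete, here is a minimal Python sketch that writes such list files. The helper name write_rmats_sample_list is made up for illustration and is not part of rMATS; the BAM names are the placeholders from the question.

```python
from pathlib import Path


def write_rmats_sample_list(bam_paths, list_file):
    """Write all BAM paths of one condition as a single comma-separated line."""
    bam_paths = [str(p) for p in bam_paths]
    if not bam_paths:
        raise ValueError("need at least one BAM file")
    out = Path(list_file)
    out.write_text(",".join(bam_paths) + "\n")
    return out


# Example usage (placeholder file names from the question):
# write_rmats_sample_list(["1.bam", "2.bam", "3.bam"], "b1.txt")
# write_rmats_sample_list(["4.bam", "5.bam", "6.bam"], "b2.txt")
```

Each condition gets its own file, and the replicates stay separate BAM files rather than being merged.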

---------------------------

Regarding the readLength command line parameter, rMATS requires that all of your reads are the same length. So, you will have to perform some read trimming to a specific length on your FASTQ / FASTA input files prior to alignment with STAR. For this, you can use Trimmomatic, Trim Galore!, or something else, such as the trimFastq.py script that comes with the program (see HERE for further information).

Note that there is also specific advice from the rMATS team for using STAR output:

Q: Can I run rMATS v4.0.1 (turbo) with STAR aligner output?

A: STAR aligner performs soft clipping by default which will generate variable read lengths. You can run STAR with "--alignEndsType EndToEnd" option to suppress soft clipping.

[source: http://rnaseq-mats.sourceforge.net/faq.html]
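
As a sketch of how the FAQ advice might look in practice, the snippet below only assembles a STAR command line that includes --alignEndsType EndToEnd; it does not run STAR. The helper name, the file names, and the thread count are assumptions for illustration, and it assumes uncompressed paired-end FASTQ input.

```python
import shlex


def build_star_command(genome_dir, fastq_r1, fastq_r2, out_prefix, threads=4):
    """Return a STAR argument list with soft clipping suppressed."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,
        # End-to-end alignment, as suggested in the rMATS FAQ quoted above.
        "--alignEndsType", "EndToEnd",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the command so it can be reviewed before running.
    cmd = build_star_command("STARindexFolder", "ctrl1_R1.fastq", "ctrl1_R2.fastq", "ctrl1_")
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can later be passed to subprocess.run without shell quoting problems.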

ADD COMMENTlink modified 18 months ago • written 2.8 years ago by Kevin Blighe67k

Thanks. I have Illumina paired-end FASTQ data, which I trimmed using Trim Galore. After running FastQC on the trimmed data, I got sequence lengths of 20-51. I am confused about the -readLength parameter. What value should I put here?

ADD REPLYlink written 2.8 years ago by bisht20diksha20

Hello. All of your sequences must be the exact same length. You cannot have a range of values, like 20-51.

I would use the trimFastq.py Python script that comes with rMATS (prior to alignment) so that you have reads that are all 50bp. It is highly likely that many of your reads that are as low as 20bp are very low in frequency.

Then, when running rMATS, you would choose -readLength 50

Does that help?
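
For readers who want to see what trimming to a fixed length involves, here is a minimal Python sketch. It is not the trimFastq.py script mentioned above, only a hypothetical stand-in with made-up function names. It assumes uncompressed 4-line FASTQ files with both mates in the same order, cuts every read to the chosen length, and drops a pair entirely if either mate is shorter, so the two files stay in sync.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return  # end of file
        yield tuple(record)


def trim_record(record, length):
    """Cut one read to `length` bases, or return None if it is too short."""
    header, seq, plus, qual = record
    if len(seq) < length:
        return None
    return header, seq[:length], plus, qual[:length]


def trim_pairs(r1_in, r2_in, r1_out, r2_out, length):
    """Trim both mates to `length`; return (pairs kept, pairs dropped)."""
    kept = dropped = 0
    with open(r1_in) as a, open(r2_in) as b, \
            open(r1_out, "w") as out_a, open(r2_out, "w") as out_b:
        for rec1, rec2 in zip(read_fastq(a), read_fastq(b)):
            t1 = trim_record(rec1, length)
            t2 = trim_record(rec2, length)
            if t1 is None or t2 is None:
                dropped += 1  # discard the whole pair to keep mates matched
                continue
            out_a.write("\n".join(t1) + "\n")
            out_b.write("\n".join(t2) + "\n")
            kept += 1
    return kept, dropped
```

The returned counts show directly how many pairs a given length costs, which is worth checking before committing to a value such as 50.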

ADD REPLYlink written 2.8 years ago by Kevin Blighe67k

It is obvious that after sequencing, not all reads will be the same length, and the strict requirement of equal read length for all reads means there must be some trimming that discards every read below a set limit. That will definitely have a huge impact on the outcome, since a large part of the reads would be of no use.

Also, you mentioned the trimFastq.py script, but there is no such script in the package.

Thanks

ADD REPLYlink written 2.8 years ago by bisht20diksha20

Yes, that is indeed very obvious. So, please take the complaint to the authors of the program. Regarding the missing trimFastq.py, again, that's a further complaint for the authors.

Good luck.

ADD REPLYlink written 2.8 years ago by Kevin Blighe67k

Why aren't the files in b2.txt comma-separated?

ADD REPLYlink written 18 months ago by Ömer An220

They are now, Sire.

ADD REPLYlink written 18 months ago by Kevin Blighe67k

11 months ago by
Ömer An220
Singapore
Ömer An220 wrote:

I suggest trying the rMATS pipeline on the CSI NGS Portal to analyse your RNA-Seq data.

You don't have to worry about read length this way, as it is calculated automatically.

ADD COMMENTlink written 11 months ago by Ömer An220