Question

Does the read length of RNA-seq data affect the results?

1

3.3 years ago

chaudharyc61 ▴ 90

Hello everyone

As the title says: does the read length of RNA-seq data affect the results? I have wild-type samples sequenced as 75 bp paired-end reads, and mutant samples sequenced as 150 bp paired-end reads.

After mapping, does that difference affect the DEGs?

Thank you, Chandan Kumar

RNA-Seq next-gen DESEq2 • 1.0k views

ADD COMMENT • link updated 3.3 years ago by ponganta ▴ 590 • written 3.3 years ago by chaudharyc61 ▴ 90

0

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0697-y

As noted by @ATpoint, you should start from comparable read lengths within a single analysis.

ADD REPLY • link 3.3 years ago by GenoMax 141k

1

@OP, this is actually a general principle. If you compare groups in a statistical framework, you must make sure that the only difference between them is the biological effect you want to test. Anything else that differs systematically between the groups is a confounder.

ADD REPLY • link 3.3 years ago by ATpoint 81k

score 1 · Answer 1 · 2021-01-13

I would anticipate that the impact is minor on a global scale, but individual genes might be affected. Longer reads improve alignment: they are more likely to map uniquely, so false alignments are reduced. To avoid mappability bias, I would probably trim all data to a constant length, for example with seqtk, and then remap.
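To make the trimming step concrete, here is a minimal pure-Python sketch of what "trim all data to a constant length" means for FASTQ records. It is only an illustration of the operation, not seqtk itself. The function name `trim_fastq` and the toy record are hypothetical, and it assumes plain 4-line FASTQ records with unwrapped sequences.

```python
from typing import Iterable, Iterator, List


def trim_fastq(lines: Iterable[str], max_len: int) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to at most max_len bases.

    Assumes plain 4-line FASTQ records (no wrapped sequences).
    """
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header}")
            # Keep the 5' end of the read; drop everything past max_len.
            yield header
            yield seq[:max_len]
            yield plus
            yield qual[:max_len]
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


# Toy record (hypothetical): a 10 bp read trimmed to 6 bp.
example = ["@read1", "ACGTACGTAC", "+", "IIIIIIIIII"]
trimmed = list(trim_fastq(example, max_len=6))
print("\n".join(trimmed))
```

For the data in this thread, that would mean cutting both mates of the 150 bp libraries to 75 bp and then remapping all samples with identical aligner settings. In practice a dedicated tool such as seqtk (its `trimfq` subcommand) is the better choice; check its help output for the length option in your installed version.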

The fact that the two groups differ in sequencing setup implies that they might have been produced at different time points. Is that the case? If so, the experiment is confounded, and one can only hope that the confounding effect does not mask any meaningful biological effects. Can you elaborate?

score 0 · Answer 2 · 2021-01-13

0

3.3 years ago

ponganta ▴ 590

What kind of analyses do you want to conduct? How do you quantify (mapping or quasi-mapping), and what kind of reference do you use? Do you want to compare WT and mutant under certain conditions?

To add to @ATpoint and @GenoMax: if you want to find DEGs between WT and mutant, you might see a pretty hefty batch effect. Make sure to investigate those effects via clustering and PCA of the samples before running DGE analyses.

ADD COMMENT • link 3.3 years ago by ponganta ▴ 590

1

The OP states that the read length is entirely confounded with biological condition. Thus, you won't be able to see this as a batch effect on a PCA.

ADD REPLY • link 3.3 years ago by i.sudbery 19k

0

Unfortunately, the OP also states that both libraries were constructed in different experiments, hence the likely batch effect I mentioned. Sorry for my imprecise wording! Maybe ComBat will be of use here? But to @chaudharyc61: I doubt that you can successfully conduct DGE analyses in this situation. Look out for batch effects using a PCA. If you find that PC1 explains most of the variation and clearly separates WT and mutant into two groups, this will be indicative of a batch effect due to different experiments (i.e. different libraries made by different people at different times with different technology) being compared.

ADD REPLY • link 3.3 years ago by ponganta ▴ 590

1

If group is confounded with batch, you cannot correct for it. If the groups separate, this can be due to biology, batch, or both. There is no way to tell.

ADD REPLY • link 3.3 years ago by ATpoint 81k

0

I concur. When group is 100% confounded it is mathematically impossible to correct it, irrespective of how fancy the tool you use is.
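A small sketch of why this is so, under assumed sample numbers: with three WT and three mutant samples (hypothetical counts) where every WT sample is in one batch and every mutant sample in the other, the group and batch columns of the design matrix are identical. The matrix then has fewer independent columns than coefficients, so no model can estimate separate group and batch effects. The helper `matrix_rank` is written here for illustration only.

```python
from fractions import Fraction
from typing import List


def matrix_rank(rows: List[List[int]]) -> int:
    """Rank by Gaussian elimination with exact fractions (no rounding issues)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical layout: 3 WT samples (first run, 75 bp), 3 mutant samples (second run, 150 bp).
group = [0, 0, 0, 1, 1, 1]  # 0 = WT, 1 = mutant
batch = [0, 0, 0, 1, 1, 1]  # 0 = first run, 1 = second run (identical to group)

# Design matrix columns: intercept, group, batch.
confounded = [[1, g, b] for g, b in zip(group, batch)]

# Same groups, but each batch now contains both WT and mutant samples.
balanced_batch = [0, 1, 0, 1, 0, 1]
balanced = [[1, g, b] for g, b in zip(group, balanced_batch)]

print(matrix_rank(confounded))  # 2: three coefficients, only two are estimable
print(matrix_rank(balanced))    # 3: group and batch effects can be separated
```

The balanced layout is what a design needs for batch to be modelled or corrected at all; with the fully confounded layout, the information simply is not in the data.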

ADD REPLY • link 3.3 years ago by i.sudbery 19k