Question

Merging fastq files from two experiments

0

5.5 years ago

zizigolu ★ 4.3k

Hi,

I have FASTQ files from two separate RNA-seq experiments on the same patients. One experiment sequenced 2545 probes and the other 1402 probes; 719 probes are common to both. I want to merge these experiments. For each well I have 4 FASTQ files (4 lanes, I assume). How can I merge the FASTQ files for each well from both experiments? For example, for well 1, experiment 1, I have:

OBP1_L001-ds.073409c051ac418f83e3e0d75c70fdfc

OBP1_L002-ds.3538648090c14d5bbf34699ee903e3ac

OBP1_L003-ds.ef0dc5bbc14346c3b356dfedb0dad288

OBP1_L004-ds.1ac734677f4b41e793990156cf1c44a7

And for well 1, experiment 2, I have:

IOP1_L001-ds.5fe08d0acbbc4f50a47a13ec2c54102b

IOP1_L002-ds.529be7c7e6b947cfa79ed8ab9c573f17

IOP1_L003-ds.f93ad5e9a3b7457cb8611b4caf16c05e

IOP1_L004-ds.5ea5f00c987144a3b2851ba91becce4d

FASTQ NGS • 2.9k views

ADD COMMENT • link updated 5.4 years ago by Biostar 20 • written 5.5 years ago by zizigolu ★ 4.3k

0

Hello F!

Questions similar to yours can already be found at:

Merging two fastq files

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

ADD REPLY • link 5.5 years ago by ATpoint 84k

0

Sorry, but these are from different experiments: one sequenced 2545 probes and the other 1402 probes.

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k

0

I reopened the question. Still, the top-level question does not contain any information about probes whatsoever. Please edit it and provide sufficient details. See: Brief Reminder On How To Ask A Good Question

ADD REPLY • link 5.5 years ago by ATpoint 84k

0

Does all that about probes actually matter if your question is about fastq merging only?

By the way, I still think you cannot simply merge the data from these two capture platforms.

ADD REPLY • link 5.5 years ago by WouterDeCoster 47k

0

@b.nota believes I must merge the data from scratch (from the FASTQ files).

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k

1

I never said to merge fastq files of technical replicates from different batches. In your previous question you said you wanted to average read counts, which I advised against. You said you had different gene annotations for the two batches, so I recommended doing the alignment and feature counting with the same set of genes. It also became clear that you were not using normal RNA-seq (as your tag claimed). I have never used HTG, so my advice applied to normal RNA-seq. I never advised merging fastq files from different batches, though. I think you will get better help from us if you describe in more detail what you have and what you need (and why).

ADD REPLY • link 5.5 years ago by Benn 8.3k

0

Sorry, that all comes from my limited knowledge; I interpreted things in an unrealistic way.

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k

0

If I understand correctly, you use a probe-based method, so you only target a subset of the transcriptome. You say that two different experiments with different probe sets contain overlapping samples, and only ~700 probes are in common. My question is: why do you want to merge these? Or why average the read counts of these overlapping samples? What do you want to gain from this? It does not seem logical to merge or average such technical replicates. Why are you interested in these technical replicates?

ADD REPLY • link 5.5 years ago by Benn 8.3k

0

These are two separate panels, correct? I am not sure this is the way to do things in that case.

ADD REPLY • link 5.5 years ago by GenoMax 144k

0

Yes, they are different panels, so what would my options be, please?

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k

0

What exactly are you trying to do and are you certain it is logical? I know you have been working on this for a few days but I may have missed that point, in case it was mentioned before.

With HTG data I have always seen only one file per sample but then we never run HTG pools in more than one lane. Is that what has been done here? Same pool run on multiple lanes?

ADD REPLY • link 5.5 years ago by GenoMax 144k

0

I guess yes, I have 96 samples but for each sample I have 4 fastq files.

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k

0

Then it should be fine to cat the L001-L004 files for each sample together for one specific panel. If there is some HTG specific nuance in having them run like this I am not aware of it.
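A minimal sketch of that per-sample concatenation, written in Python rather than shell. It assumes (hypothetically) that the files sit in one directory per panel and are named like `<sample>_L001...fastq.gz`; the pattern, the function names and the `_merged` output suffix are illustrative and will need adjusting to the real file names. Gzipped files can be joined byte for byte because a gzip stream may contain several members.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed (hypothetical) naming: <sample>_L<3-digit lane>...fastq or ...fastq.gz
LANE_PATTERN = re.compile(r"^(?P<sample>.+?)_L(?P<lane>\d{3}).*\.fastq(\.gz)?$")


def group_lanes(directory):
    """Map each sample name to its lane files, sorted by lane number."""
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        match = LANE_PATTERN.match(path.name)
        if match:
            groups[match.group("sample")].append((match.group("lane"), path))
    return {sample: [p for _, p in sorted(files)] for sample, files in groups.items()}


def merge_lanes(directory, out_dir):
    """Concatenate the lane files of each sample into one file per sample.

    Only call this on the files of ONE panel; it does not mix panels.
    All lane files of a sample are assumed to be either gzipped or plain,
    not a mixture of both.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for sample, paths in group_lanes(directory).items():
        suffix = ".fastq.gz" if paths[0].name.endswith(".gz") else ".fastq"
        target = out_dir / f"{sample}_merged{suffix}"
        with open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)  # raw byte copy, like `cat`
        merged[sample] = target
    return merged
```

Calling `merge_lanes("panel1_fastq", "panel1_merged")` (both directory names are placeholders) would write one merged file per sample for that panel.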

I am still not certain what you are doing with data from 2 distinct panels.

ADD REPLY • link 5.5 years ago by GenoMax 144k

0

In Illumina terms, having 4 lanes for each sample may mean that my data come from a NextSeq sequencer.

Actually, they think I must merge the genes from both panels to have more genes and more power to detect differentially expressed genes, because both panels come from the same patients.
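For scale, the probe counts quoted in this thread (2545, 1402 and 719 in common) imply the overlap sketched here. The probe IDs are synthetic placeholders, not real HTG probe names, and `probe_overlap` is just an illustrative helper.

```python
def probe_overlap(panel_a, panel_b):
    """Summarise how two probe sets overlap."""
    a, b = set(panel_a), set(panel_b)
    return {
        "common": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "union": len(a | b),
    }


# Synthetic probe IDs sized to match the numbers quoted in the thread.
common = {f"common_{i}" for i in range(719)}
panel_1 = common | {f"p1_only_{i}" for i in range(2545 - 719)}
panel_2 = common | {f"p2_only_{i}" for i in range(1402 - 719)}

print(probe_overlap(panel_1, panel_2))
# {'common': 719, 'only_a': 1826, 'only_b': 683, 'union': 3228}
```

So a combined table would hold 3228 distinct probes, of which only 719 were measured on both panels.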

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k

0

👍 for NextSeq part.

As for merging things from two different panels, what you say sounds reasonable, but you may be in uncharted territory here. Do you have a link to HTG's website that says it is possible to use this data for DE analysis?

ADD REPLY • link 5.5 years ago by GenoMax 144k

0

I am not sure about differential expression, but people working with HTG data test for it with a t-test if they have matched samples, or with ANOVA if the samples are unmatched. For HTG whole-transcriptome (miRNA) data I have even seen people use DESeq2, and I have also seen edgeR used for non-whole-transcriptome panels. I know these 96 samples are from the same patients (each patient was run on both panels) and that the panels have 719 probes in common, while the rest are not shared. Yesterday I used cat, as @genomax suggested, to merge the lanes, but the HTG parser did not recognise the merged FASTQ files. This week the HTG producers are coming to the university for a meeting; I have to present my results, and my boss asked me to tell them what I think about this assay. I am not sure what I should ask, though.
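Since the parser rejected the merged files, one thing that may be worth ruling out is that the concatenation itself produced malformed FASTQ (for example, a lane file without a final newline gluing two records together). The sketch below is only a generic sanity check under the assumption of plain 4-line FASTQ records; it knows nothing about what the HTG parser actually expects, and the function names are illustrative.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records(path):
    """Count 4-line FASTQ records; raise ValueError on the first malformed one."""
    n = 0
    with open_text(path) as fh:
        while True:
            header = fh.readline()
            if not header:  # clean end of file
                break
            seq = fh.readline().rstrip("\r\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\r\n")
            if (not header.startswith("@")
                    or not plus.startswith("+")
                    or len(seq) != len(qual)):
                raise ValueError(f"{path}: malformed record {n + 1}")
            n += 1
    return n


def merged_matches_lanes(merged, lanes):
    """True if the merged file holds exactly as many records as the lanes combined."""
    return count_records(merged) == sum(count_records(p) for p in lanes)
```

If this check passes, the problem is more likely something the parser expects beyond valid FASTQ content, which would be a fair question to put to HTG at the meeting.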

ADD REPLY • link 5.5 years ago by zizigolu ★ 4.3k