Question: How to concatenate four fastq files from a NextSeq run in Linux?
harshraje19 (30) wrote:

Hi everyone, I am new to the field of bioinformatics. I have generated RNA-Seq data; the sequencing was run on a NextSeq with single-end reads. I have 4 fastq files per sample and therefore need to combine these 4 files into one. I used the cat command in Linux to combine the four files, but when I use the combined file for mapping, it throws an error saying that the file is not in proper fastq format. Does anyone have experience combining these files and using them for mapping? Or can I map these files separately?

Thank you.

modified 7 months ago • written 8 months ago by harshraje19 (30)
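One way to narrow down a "not in proper fastq format" error is to scan the concatenated file for the first record that breaks the four-line FASTQ layout. The following is a minimal sketch, assuming an uncompressed file with unwrapped (single-line) sequences; the function name first_fastq_problem is made up for illustration and is not part of any aligner or library.

```python
def first_fastq_problem(path):
    """Return a message describing the first malformed record, or None.

    Assumes plain-text FASTQ with exactly four lines per record.
    """
    with open(path, "r") as handle:
        record = 0
        while True:
            header = handle.readline()
            if header == "":
                return None  # reached end of file without finding a problem
            record += 1
            if not header.startswith("@"):
                return f"record {record}: header does not start with '@': {header.strip()!r}"
            sequence = handle.readline().rstrip("\r\n")
            separator = handle.readline().rstrip("\r\n")
            quality = handle.readline().rstrip("\r\n")
            if not separator.startswith("+"):
                return f"record {record}: third line does not start with '+' (or the record is truncated)"
            if len(sequence) != len(quality):
                return (f"record {record}: sequence length {len(sequence)} "
                        f"but quality length {len(quality)}")
```

Calling first_fastq_problem("merged.fastq") (a placeholder file name) on the combined file, and then on each of the four lane files, shows whether the problem was already present in one of the inputs or was introduced at a file boundary, for example when an input file does not end with a newline.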

Is this from a single run (I mean a single = 1 run, not single-end) or was it four different runs?

Please show the output of head -n 4 file.fastq for every file. Please also say how the files were labelled when you received them.

modified 8 months ago • written 8 months ago by ATpoint (38k)

All four files are from a single run.

The following is the head -n5 output of all four files:

harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L001_R1.fastq 
@NS500468:254:HYMHJBGX2:1:11101:14046:1054 1:N:0:TTACCGAC
GGATCNTGGCAGCAAGGCCACTCTGCCACTTACAATACCCCGTCGCGTAATTAAGTCGTCGGCAAAGGATTCTAA
+
AAAAA#/EEEEEE6/AEEAEEEEE/EEAE<EEEE/EA/EE/EEEEE/A//EEE66</EA<///E/E/A//EA/E/
@NS500468:254:HYMHJBGX2:1:11101:3000:1055 1:N:0:TTACCGAC
harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L002_R1.fastq 
@NS500468:254:HYMHJBGX2:2:11101:10808:1046 1:N:0:TTACCGAC
CCCAANTCTGCATTGTTGATGCTTTTAGCACATGTAACTGCAGCATCATGAGTGTCAAAAGTGACAAAACCAAAA
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAEEE/EEEEEE
@NS500468:254:HYMHJBGX2:2:11101:18260:1048 1:N:0:TTACCGAC
harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L003_R1.fastq 
@NS500468:254:HYMHJBGX2:3:11401:17885:1021 1:N:0:TTACCGAC
NTTAATCGACCAACACCCTTTGTGGGTTCTAGGTTAGCGCGCAGTTGGGCACCGTAACCCGGCTTCCGGTTCCTC
+
#AAAAEEEEEAEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEAEEE//EE
@NS500468:254:HYMHJBGX2:3:11401:18904:1021 1:N:0:TTACCGAC
harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L004_R1.fastq 
@NS500468:254:HYMHJBGX2:4:11401:18851:1025 1:N:0:TTACCGAC
NAGACATTGATAGACAAGAAGGCTTGGCCATATGTCCAGATGGATCTCCGTCTCAAGGCAGAATATCTCCGGTAC
+
#AA/6EEEEEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE<EEEEE
modified 8 months ago by genomax (89k) • written 8 months ago by harshraje19 (30)
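The read headers shown above already answer the single-run question: in the Illumina header layout the colon-separated fields are instrument, run number, flowcell ID, lane, tile, x and y, followed after the space by read number, filter flag, control bits and index sequence. A small sketch that groups lanes by run is given below; the function names are illustrative, and the header strings are copied from the head output in this thread.

```python
def parse_illumina_header(header_line):
    """Split an Illumina-style FASTQ header into named fields."""
    name, _, comment = header_line.strip().lstrip("@").partition(" ")
    keys = ("instrument", "run", "flowcell", "lane", "tile", "x", "y")
    fields = dict(zip(keys, name.split(":")))
    if comment:
        comment_keys = ("read", "filtered", "control", "index")
        fields.update(zip(comment_keys, comment.split(":")))
    return fields


def lanes_by_run(header_lines):
    """Map (instrument, run, flowcell) to the set of lanes seen for it."""
    groups = {}
    for line in header_lines:
        fields = parse_illumina_header(line)
        key = (fields["instrument"], fields["run"], fields["flowcell"])
        groups.setdefault(key, set()).add(fields["lane"])
    return groups


# First header of each of the four files posted in the thread.
THREAD_HEADERS = [
    "@NS500468:254:HYMHJBGX2:1:11101:14046:1054 1:N:0:TTACCGAC",
    "@NS500468:254:HYMHJBGX2:2:11101:10808:1046 1:N:0:TTACCGAC",
    "@NS500468:254:HYMHJBGX2:3:11401:17885:1021 1:N:0:TTACCGAC",
    "@NS500468:254:HYMHJBGX2:4:11401:18851:1025 1:N:0:TTACCGAC",
]

for run_key, lanes in lanes_by_run(THREAD_HEADERS).items():
    print(run_key, sorted(lanes))
```

For these headers there is a single (instrument, run, flowcell) key with lanes 1 to 4, which is consistent with one sample sequenced across four lanes of one run.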

If this is paired end data then you need to cat the files in exactly the same order for both R1 and R2 reads.

written 8 months ago by genomax (89k)
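The ordering point can be sketched as follows: sort the R1 lane files once, derive the R2 list from that same order, and append the files byte for byte, which is what cat does. This is a rough sketch, not a tested pipeline step; it assumes the mate files differ only by "_R1" versus "_R2" in the file name (as in the naming shown in this thread), and merge_lanes and concatenate_files are hypothetical helper names.

```python
import os
import shutil


def concatenate_files(paths, merged_path):
    """Append the files in the given order into merged_path (like cat)."""
    with open(merged_path, "wb") as merged:
        for path in paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, merged)


def merge_lanes(r1_files, merged_r1, merged_r2=None):
    """Merge lane files in sorted order; use the same order for R2 if requested."""
    ordered_r1 = sorted(r1_files)
    concatenate_files(ordered_r1, merged_r1)
    if merged_r2 is not None:
        ordered_r2 = []
        for path in ordered_r1:
            folder, name = os.path.split(path)
            # Assumption: the mate file only differs by _R1 -> _R2 in its name.
            ordered_r2.append(os.path.join(folder, name.replace("_R1", "_R2")))
        concatenate_files(ordered_r2, merged_r2)
    return ordered_r1
```

For the single-end data in this thread only the first output is needed, e.g. merge_lanes(list_of_four_R1_files, "P3_merged_R1.fastq"), where the output name is a placeholder.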

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

written 8 months ago by genomax (89k)

If you are a cat person, use cat in *nix; if you are a reptile person, use the merge_fastq library.

modified 7 months ago by RamRS (30k) • written 8 months ago by cpad0112 (14k)

Dear friends, thank you for your suggestions.

I mapped the fastq files of all four lanes to the genome separately, and then used the 4 SAM files generated in this analysis together for htseq-count. htseq-count produced the four gene counts in one txt file, and in Excel I calculated: gene count of lane 1 + gene count of lane 2 + gene count of lane 3 + gene count of lane 4 = total gene count.

Any thoughts on this?

Thank you everyone.

written 7 months ago by harshraje19 (30)
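The per-lane summation described above can also be done without Excel. The sketch below assumes the htseq-count output is a tab-separated file without a header row, with the gene ID in the first column and one count column per SAM file, and it skips the summary rows whose names start with "__" (such as __no_feature). The function name sum_lane_counts is made up for illustration.

```python
import csv


def sum_lane_counts(table_path):
    """Return {gene_id: total count across all lane columns}."""
    totals = {}
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip empty lines and htseq-count summary rows such as __no_feature.
            if not row or row[0].startswith("__"):
                continue
            totals[row[0]] = sum(int(value) for value in row[1:])
    return totals
```

A row such as geneA, 1, 2, 3, 4 (a hypothetical gene) would give a total of 10 for geneA.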

Please don't post unrelated questions as answers in the original thread.

written 7 months ago by genomax (89k)

Please read my question and response carefully.

written 7 months ago by harshraje19 (30)
  1. Your new post is a question that is an offshoot of your original question, and hence needs to be its own post. genomax is right in pointing that out.
  2. You posted that as an answer, which is the wrong place to post anything that is not an answer to the top level question. genomax is right in pointing that out as well.

It looks like your question and response were indeed carefully read.

written 7 months ago by RamRS (30k)

@harshraje19 I am not sure that is the way I would handle it. For a given sample, most aligners allow the user to supply multiple read files and use them in one alignment to produce a single alignment file (SAM/BAM), followed by quantification per sample. But everyone has their own way of doing the analysis.

modified 7 months ago • written 7 months ago by cpad0112 (14k)

Thank you everyone for taking time to answer the question.

written 7 months ago by harshraje19 (30)

Please stop adding answers. This is the third time in this post that you're being asked to stop. If you continue adding answers, your account might be suspended for a period of time.

modified 7 months ago • written 7 months ago by RamRS (30k)
GokalpC (50) wrote:

It is better not to merge/cat fastq data from different lanes. Just process them individually until the duplicate-marking stage, or another stage where you can merge BAM files with different read group names.

written 8 months ago by GokalpC (50)
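If lanes are kept separate until the BAM stage, each lane needs its own read group. A minimal sketch of building one @RG header line per lane is shown below. The tag names (ID, SM, LB, PL, PU) come from the SAM specification; the choice of flowcell.lane as ID, flowcell.lane.barcode as PU, and reusing the sample name as the library name are assumptions made for this example, and read_group_line is a hypothetical helper.

```python
def read_group_line(sample, flowcell, lane, barcode):
    """Build a tab-separated @RG header line for one lane of one sample."""
    fields = [
        "@RG",
        f"ID:{flowcell}.{lane}",            # assumed ID scheme: flowcell.lane
        f"SM:{sample}",
        f"LB:{sample}",                     # assumption: one library per sample
        "PL:ILLUMINA",
        f"PU:{flowcell}.{lane}.{barcode}",  # assumed platform unit scheme
    ]
    return "\t".join(fields)


# Values taken from the file names and headers posted in this thread;
# treating "P3" as the sample name is an assumption.
for lane in ("1", "2", "3", "4"):
    print(read_group_line("P3", "HYMHJBGX2", lane, "TTACCGAC"))
```

With distinct ID values and a shared SM value, the per-lane BAM files can later be merged while downstream tools still treat the reads as one sample.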

Lane replicates typically show very little to no batch variation, from what I have heard from people working in core facility settings who tested this on their machines. I think processing the lanes independently simply increases the workload; it is indeed safer, though unnecessary imho.

written 7 months ago by ATpoint (38k)

I agree. I usually run lane-specific FastQC + adapter trimming, then combine the FASTQs. Our sequencing facility splits samples across lanes, so that counters any lane-specific batch effect as well.

modified 7 months ago • written 7 months ago by RamRS (30k)
ATpoint (38k) wrote:

So it seems these are simply different lanes, therefore please see:

How do concatenate different fasta file

written 8 months ago by ATpoint (38k)