Question: How to concatenate four fastq files of NextSeq in linux?
harshraje1920 wrote:

Hi everyone, I am new to the field of bioinformatics. I have generated RNA-Seq data; the sequencing was run on a NextSeq with single-end reads. I have 4 fastq files per sample and therefore need to combine these 4 files into one. I used the cat command in Linux to combine the four files, but when I use the combined file for mapping it throws an error (saying that the file is not in proper fastq format for mapping). Does anyone have experience combining these files and using them for mapping? Or can I map these files separately?

Thank you.

ADD COMMENTlink modified 23 days ago • written 27 days ago by harshraje1920

Is this from a single run (I mean a single = 1 run, not single-end) or was it four different runs?

Please show the output of head -n 4 file.fastq for every file. Please also say how the files are labelled once you received them.
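
A minimal sketch of how that output could be collected for all lanes in one go. The function name `show_first_record` and the glob in the usage comment are assumptions for illustration, not something from the thread:

```shell
#!/usr/bin/env bash
# Print the first FASTQ record (4 lines) of every file given as an argument,
# preceded by a line naming the file.
show_first_record() {
    local f
    for f in "$@"; do
        echo "== $f"
        head -n 4 "$f"
    done
}

# Hypothetical usage; adjust the glob to your own file names:
# show_first_record *_L00?_R1.fastq
```

Printing the file name before each record makes it easy to see which lane each header belongs to.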

ADD REPLYlink modified 27 days ago • written 27 days ago by ATpoint29k

All four files are from a single run.

The following is the output of head -5 for all four files:

harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L001_R1.fastq 
@NS500468:254:HYMHJBGX2:1:11101:14046:1054 1:N:0:TTACCGAC
GGATCNTGGCAGCAAGGCCACTCTGCCACTTACAATACCCCGTCGCGTAATTAAGTCGTCGGCAAAGGATTCTAA
+
AAAAA#/EEEEEE6/AEEAEEEEE/EEAE<EEEE/EA/EE/EEEEE/A//EEE66</EA<///E/E/A//EA/E/
@NS500468:254:HYMHJBGX2:1:11101:3000:1055 1:N:0:TTACCGAC
harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L002_R1.fastq 
@NS500468:254:HYMHJBGX2:2:11101:10808:1046 1:N:0:TTACCGAC
CCCAANTCTGCATTGTTGATGCTTTTAGCACATGTAACTGCAGCATCATGAGTGTCAAAAGTGACAAAACCAAAA
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAEEE/EEEEEE
@NS500468:254:HYMHJBGX2:2:11101:18260:1048 1:N:0:TTACCGAC
harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L003_R1.fastq 
@NS500468:254:HYMHJBGX2:3:11401:17885:1021 1:N:0:TTACCGAC
NTTAATCGACCAACACCCTTTGTGGGTTCTAGGTTAGCGCGCAGTTGGGCACCGTAACCCGGCTTCCGGTTCCTC
+
#AAAAEEEEEAEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEAEEE//EE
@NS500468:254:HYMHJBGX2:3:11401:18904:1021 1:N:0:TTACCGAC
harshraj@harshraj-XPS-8930:~/Propriety_Carlos_GrapeVineRNASeq/AGRF_CAGRF15892_HYMHJBGX2_20170821$ head -5 P3_HYMHJBGX2_TTACCGAC_L004_R1.fastq 
@NS500468:254:HYMHJBGX2:4:11401:18851:1025 1:N:0:TTACCGAC
NAGACATTGATAGACAAGAAGGCTTGGCCATATGTCCAGATGGATCTCCGTCTCAAGGCAGAATATCTCCGGTAC
+
#AA/6EEEEEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE<EEEEE
ADD REPLYlink modified 25 days ago by genomax78k • written 25 days ago by harshraje1920

If this is paired end data then you need to cat the files in exactly the same order for both R1 and R2 reads.
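
A rough sketch of the ordering point, assuming lane files named `<prefix>_L001_R1.fastq` ... `<prefix>_L004_R1.fastq` with matching R2 files. The function name `merge_lanes` and the R2 files are hypothetical, since the data in this thread are single-end:

```shell
#!/usr/bin/env bash
# Concatenate the lane files of one read direction in lane order.
# $1 = sample prefix, $2 = read tag (R1 or R2), $3 = output file.
merge_lanes() {
    local prefix=$1 read=$2 out=$3
    # Glob expansion is sorted, so L001..L004 come out in the same
    # order for R1 and R2 as long as both sets of lanes are present.
    cat "${prefix}"_L00?_"${read}".fastq > "$out"
}

# Hypothetical usage for a paired-end sample:
# merge_lanes P3_HYMHJBGX2_TTACCGAC R1 P3_R1.fastq
# merge_lanes P3_HYMHJBGX2_TTACCGAC R2 P3_R2.fastq
```

Using the same glob for both read directions is what keeps the mates in step; listing the files by hand works too, provided the lane order is identical in both commands.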

ADD REPLYlink written 25 days ago by genomax78k

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

ADD REPLYlink written 25 days ago by genomax78k

If you are a cat person, use cat in *nix, and if you are a reptile person, use the merge_fastq library.

ADD REPLYlink modified 23 days ago by RamRS25k • written 24 days ago by cpad011212k

Dear friends, thank you for your suggestions.

I mapped the fastq files of all four lanes to the genome separately, and the 4 SAM files generated in this analysis were used together for htseq-count. htseq-count then produced four gene counts in one txt file, and in Excel I calculated gene count of lane 1 + gene count of lane 2 + gene count of lane 3 + gene count of lane 4 = total gene count.

Any thoughts on this?

Thank you everyone.
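
The lane-summing step described above could also be done on the command line instead of in Excel. This is only a sketch, assuming the count table is tab-separated with the gene ID in column 1 and one count column per lane; the function name `sum_lane_counts` and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Sum all count columns (2..NF) per row of a tab-separated table whose
# first column is the gene ID. Prints "gene<TAB>total" for every row.
sum_lane_counts() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             total = 0
             for (i = 2; i <= NF; i++) total += $i
             print $1, total
         }' "$1"
}

# Hypothetical usage:
# sum_lane_counts counts_per_lane.txt > counts_total.txt
```

Every row is summed the same way, so any summary rows in the table are totalled across lanes as well.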

ADD REPLYlink written 23 days ago by harshraje1920

Please don't post unrelated questions as answers in the original thread.

ADD REPLYlink written 23 days ago by genomax78k

Please read my question and response carefully.

ADD REPLYlink written 23 days ago by harshraje1920
  1. Your new post is a question that is an off-shoot of your original question, and hence needs to be its own post. genomax is right in pointing that out
  2. You posted that as an answer, which is the wrong place to post anything that is not an answer to the top level question. genomax is right in pointing that out as well.

It looks like your question and response were indeed carefully read.

ADD REPLYlink written 23 days ago by RamRS25k

@harshraje19 I am not sure that is the way I would handle it. For a given sample, most aligners allow the user to supply multiple read files and align them together to produce a single alignment file (SAM/BAM), followed by quantification per sample. But everyone has their own way of doing the analysis.
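
As an illustration of supplying several read files at once: the thread does not name the aligner, so the sketch below assumes HISAT2, whose `-U` option takes a comma-separated list of unpaired read files. The helper `join_by_comma`, the index name `genome_index` and the output name are hypothetical:

```shell
#!/usr/bin/env bash
# Join all arguments into one comma-separated string.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Hypothetical usage, assuming HISAT2 and an index called "genome_index":
# hisat2 -x genome_index \
#        -U "$(join_by_comma P3_HYMHJBGX2_TTACCGAC_L00?_R1.fastq)" \
#        -S P3.sam
```

Other aligners have their own way of accepting multiple inputs, so check the manual of the one you use before copying the option.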

ADD REPLYlink modified 23 days ago • written 23 days ago by cpad011212k

Thank you everyone for taking time to answer the question.

ADD REPLYlink written 23 days ago by harshraje1920

Please stop adding answers. This is the third time in this post that you're being asked to stop. If you continue adding answers, your account might be suspended for a period of time.

ADD REPLYlink modified 23 days ago • written 23 days ago by RamRS25k
GokalpC50 wrote:

It is better not to merge/cat fastq data from different lanes. Just process them individually until the duplicate-marking stage, or another stage where you can merge BAM files that carry different read group names.
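
A sketch of how a per-lane read group ID might be derived for this approach, assuming Illumina-style read names in which the flowcell and lane are the 3rd and 4th colon-separated fields. The function name `read_group_id`, the sample name and the file names are hypothetical, and how the read group is passed to the aligner depends on the tool:

```shell
#!/usr/bin/env bash
# Build a read-group ID "<flowcell>.<lane>" from the first header line of
# an Illumina FASTQ file (fields 3 and 4 of the colon-separated read name).
read_group_id() {
    head -n 1 "$1" | awk -F: '{ print $3 "." $4 }'
}

# Hypothetical usage: tag each lane's alignments with its own read group
# when mapping, then merge the per-lane BAM files with samtools:
# rg_id=$(read_group_id lane1.fastq)
# samtools merge P3.merged.bam lane1.bam lane2.bam lane3.bam lane4.bam
```

For a header such as `@NS500468:254:HYMHJBGX2:1:...` this yields `HYMHJBGX2.1`, which keeps the lanes distinguishable after merging.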

ADD COMMENTlink written 24 days ago by GokalpC50

Lane replicates typically show very little to no batch variation, from what I have heard from people working in core facility settings who tested this on their machines. I think processing the lanes independently simply increases the workload; it is indeed safer, though unnecessary imho.

ADD REPLYlink written 23 days ago by ATpoint29k

I agree. I usually run lane-specific FASTQC + adapter trimming, then combine the FASTQs. Our sequencing facility splits samples across lanes, so that counters any lane-specific batch effect as well.

ADD REPLYlink modified 23 days ago • written 23 days ago by RamRS25k
ATpoint29k wrote:

So it seems these are simply different lanes, therefore please see:

How do concatenate different fasta file
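
For lanes of a single run, a plain cat is usually enough. The sketch below adds a basic sanity check on the merged file, since the original problem was a mapper rejecting the format. The function name `check_fastq` and the output file name are hypothetical, and the check assumes uncompressed FASTQ with 4 lines per record:

```shell
#!/usr/bin/env bash
# Return 0 if the file looks like 4-line FASTQ: the line count is a
# non-zero multiple of 4, every 1st line starts with "@", every 3rd with
# "+", and sequence and quality strings have equal length.
check_fastq() {
    awk 'NR % 4 == 1 && !/^@/  { bad = 1 }
         NR % 4 == 2           { len = length($0) }
         NR % 4 == 3 && !/^\+/ { bad = 1 }
         NR % 4 == 0 && length($0) != len { bad = 1 }
         END { exit (bad || NR % 4 != 0 || NR == 0) }' "$1"
}

# Hypothetical usage:
# cat P3_HYMHJBGX2_TTACCGAC_L00?_R1.fastq > P3_merged_R1.fastq
# check_fastq P3_merged_R1.fastq && echo "looks OK"
```

One thing this check catches is an input file that lacks a final newline, which makes cat glue the last quality line of one lane onto the first header of the next.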

ADD COMMENTlink written 25 days ago by ATpoint29k