Question: Bwa Run With Fastq File With Different Read Length
Tinu (New York), 3.7 years ago, wrote:

I have two FASTQ files for the same sample, Sample1, but with different read lengths (75 bp and 100 bp).

zcat Sample1_TTAGGC_L001_R1_001.fastq.gz | head -8 # few lines from first file

@HWI-D00330:52:H7TTTADXX:1:1101:1097:2122 1:N:0:TTAGGC
TACAACTGGTGAGTTTTTCCTCCAGCCTCCCTGTGACCCCTCACAACCCACCCCAGACAATGCTTTTCCTTCCCT
+
@@@DFFFFGDFHDFHIJJJEGIGHIGGIGGIIGHIIHGGIIJEHGEIGJIIGIIHIGIGEHECEFCB>>CD;@CC
@HWI-D00330:52:H7TTTADXX:1:1101:1161:2171 1:N:0:TTAGGC
GAGTCCATCTAGCCCAACCCAGACCAAGGGATTCACCTGAATATTCTCTTCTCACCTTTCATCATAGCTAAGATT
+
CCCFFFFFHHHHHJJFHHIJJHIJJJJJIIJJJJJJJJJHIGJJJIIJJJJJJJJJJJJJJJJJJJJJJIJEHHH

zcat Sample1_TTAGGC_L1234_R1_001.fastq.gz | head -8 # few lines from second file

@HWI-ST332R:292:D2GD6ACXX:1:1101:1123:2179 1:N:0:TTAGGC
TTTACTTGGCCTATTAGTGCCCTTCAGAAACTAATGACTCTTTCTTTGACCAAATTTACTCTCTTCATGTCCCTGTCCTTTGCCCCATATCCCCATTCCC
+
BCCDFFFDGHHHHHJIJFIIIEHJJGJJJJJJIJJIGIIJIJJGIJJCDDGIJJIJJGFHIIJJJIJJIIJJJJIJGIJJJHHHHGFFBEFEEDECCE>@
@HWI-ST332R:292:D2GD6ACXX:1:1101:2304:2176 1:N:0:TTAGGC
GANACTCTGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAATTAAGGAAGAGTAAAGGGAAAATAATTTTTTTAAAAGTGTCTTTAAATTTGGAAAGGTTG
+
CC#4ADDFHFFHFIGIIIGIIJJJJJJHDDDDDDDB################################################################**

I took a few reads from each of these two files, concatenated them into one file, and ran bwa on it; it ran without any errors. Is it OK to do this, or would mixing read lengths affect the alignment output (the .sai file)?

zcat Sample1_TTAGGC_L001_R1_001.fastq.gz | head -8 > f1.fastq  
zcat Sample1_TTAGGC_L1234_R1_001.fastq.gz | head -8 > f2.fastq 

cat f1.fastq f2.fastq > Out.fastq
gzip Out.fastq 
bwa aln -t 4 hs37d5.fa Out.fastq.gz >Out.sai

Thanks, Tinu


That's fine. Having reads of different lengths would be totally normal after trimming.

written 3.7 years ago by Devon Ryan

Thank you for the answer

written 3.7 years ago by Tinu

It would be good for posterity if you could add this comment as an answer, then the OP could "accept" it.

written 3.7 years ago by matted

Yeah, I should have done that after ashutoshmits's comments, but I was hoping someone else might produce a highly descriptive answer :oP I'll quickly throw together something a little longer-winded as an answer.

written 3.7 years ago by Devon Ryan

I am closing this question as it has been answered.

written 3.7 years ago by Ashutosh Pandey

?? Well... an answered question is not a reason to close it...

written 3.7 years ago by Manu Prestat

re-opened. Questions should only be closed when they're unanswerable, duplicates of previous questions, or break the guidelines in some other way.

written 3.7 years ago by Chris Miller

Thanks, got it. There should be a way to keep trivial questions that have already been answered from appearing at the top. People may spend time reading the question only to find out at the end that it was a simple one and has already been solved. I am not faulting the question for being simple; I just don't want people to spend time reading something that is easy and already solved.

written 3.7 years ago by Ashutosh Pandey (modified 3.7 years ago)

Sure - If it's been asked and answered previously, then post a comment linking back to the old post that answers it, and close it up as a duplicate.

written 3.7 years ago by Chris Miller

Devon Ryan (Freiburg, Germany), 3.7 years ago, wrote:

This is perfectly fine to do, and a file with mixed read lengths is actually not that unusual.

Consider a standard experimental workflow. First, you receive raw reads that are all the same length, but each shows a varying amount (from complete to none) of adapter contamination and 3' quality decrease. The normal next step is to trim these adapter sequences off the ends of the reads and, since you're already processing the file, to trim low-quality regions from the ends as well (N.B., there's often no need to be very aggressive with quality trimming). As a result, many of your reads are now shorter than they were before. If you had paired-end reads to begin with, you'll often find that read #2 is shorter than read #1. This isn't a problem for bwa or any of the other standard aligners that I can think of.
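If you want to see how mixed the read lengths in a file actually are, a quick tally of the sequence lines is enough. The following is only a minimal sketch: the three reads and the file name mixed.fastq.gz are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequences). It uses gzip -dc rather than zcat because zcat behaves differently on some systems.

```shell
# Build a tiny gzipped FASTQ with made-up reads of two different lengths.
tmp_len=$(mktemp -d)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n' \
    | gzip -c > "$tmp_len/mixed.fastq.gz"

# Line 2 of every 4-line record is the sequence; count reads per length.
length_table=$(gzip -dc "$tmp_len/mixed.fastq.gz" \
    | awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' \
    | sort -n)

# Each output line is "<read length> <number of reads>".
echo "$length_table"

rm -rf "$tmp_len"
```

For a real file you would replace the made-up input with your own gzipped FASTQ; a file built from 75 bp and 100 bp reads should simply show two rows.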

As an aside, instead of doing this:

zcat Sample1_TTAGGC_L001_R1_001.fastq.gz | head -8 > f1.fastq  
zcat Sample1_TTAGGC_L1234_R1_001.fastq.gz | head -8 > f2.fastq 

cat f1.fastq f2.fastq > Out.fastq
gzip Out.fastq

You could simply do this (to combine the complete files rather than just the first 8 lines of each):

cat Sample1_TTAGGC_L001_R1_001.fastq.gz Sample1_TTAGGC_L1234_R1_001.fastq.gz > Out.fastq.gz

The gzip format allows multiple compressed streams (members) to be concatenated directly like that, and decompressing the result yields the concatenated contents. This will work with most tools (though not always with Java, for reasons that I've never looked into, since it's easy to handle with most APIs) and will save you considerable time.
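If you'd like to convince yourself that concatenating gzip files directly is safe, here is a small self-contained check. It is a sketch with made-up reads and hypothetical file names (a.fastq, b.fastq, out.fastq.gz), not your real data: it compresses two tiny FASTQ files separately, concatenates the compressed files with cat, and confirms that the decompressed result matches the plain-text concatenation.

```shell
# Two tiny FASTQ files with made-up reads of different lengths.
tmp=$(mktemp -d)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n' > "$tmp/a.fastq"
printf '@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > "$tmp/b.fastq"

# Compress each file on its own, then concatenate the compressed files.
gzip -c "$tmp/a.fastq" > "$tmp/a.fastq.gz"
gzip -c "$tmp/b.fastq" > "$tmp/b.fastq.gz"
cat "$tmp/a.fastq.gz" "$tmp/b.fastq.gz" > "$tmp/out.fastq.gz"

# The result is still a valid gzip file...
gzip -t "$tmp/out.fastq.gz"

# ...and decompresses to the same bytes as concatenating the plain files.
cat "$tmp/a.fastq" "$tmp/b.fastq" > "$tmp/expected.fastq"
gzip -dc "$tmp/out.fastq.gz" > "$tmp/out.fastq"
if cmp -s "$tmp/expected.fastq" "$tmp/out.fastq"; then same=yes; else same=no; fi
n_lines=$(wc -l < "$tmp/out.fastq" | tr -d ' ')
echo "identical: $same, lines: $n_lines"

rm -rf "$tmp"
```

The one caveat is the one mentioned above: a reader that stops after the first gzip member will only see the first file's reads, so it is worth checking the read count once if a tool you rely on is not gzip itself.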
