BWA Run With a FASTQ File Containing Different Read Lengths
10.3 years ago
ttom

I have two FASTQ files for the same sample, Sample1, each with a different read length (75 bp and 100 bp):

zcat Sample1_TTAGGC_L001_R1_001.fastq.gz | head -8 # first two reads (8 lines) from the first file

@HWI-D00330:52:H7TTTADXX:1:1101:1097:2122 1:N:0:TTAGGC
TACAACTGGTGAGTTTTTCCTCCAGCCTCCCTGTGACCCCTCACAACCCACCCCAGACAATGCTTTTCCTTCCCT
+
@@@DFFFFGDFHDFHIJJJEGIGHIGGIGGIIGHIIHGGIIJEHGEIGJIIGIIHIGIGEHECEFCB>>CD;@CC
@HWI-D00330:52:H7TTTADXX:1:1101:1161:2171 1:N:0:TTAGGC
GAGTCCATCTAGCCCAACCCAGACCAAGGGATTCACCTGAATATTCTCTTCTCACCTTTCATCATAGCTAAGATT
+
CCCFFFFFHHHHHJJFHHIJJHIJJJJJIIJJJJJJJJJHIGJJJIIJJJJJJJJJJJJJJJJJJJJJJIJEHHH

zcat Sample1_TTAGGC_L1234_R1_001.fastq.gz | head -8 # first two reads (8 lines) from the second file

@HWI-ST332R:292:D2GD6ACXX:1:1101:1123:2179 1:N:0:TTAGGC
TTTACTTGGCCTATTAGTGCCCTTCAGAAACTAATGACTCTTTCTTTGACCAAATTTACTCTCTTCATGTCCCTGTCCTTTGCCCCATATCCCCATTCCC
+
BCCDFFFDGHHHHHJIJFIIIEHJJGJJJJJJIJJIGIIJIJJGIJJCDDGIJJIJJGFHIIJJJIJJIIJJJJIJGIJJJHHHHGFFBEFEEDECCE>@
@HWI-ST332R:292:D2GD6ACXX:1:1101:2304:2176 1:N:0:TTAGGC
GANACTCTGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAATTAAGGAAGAGTAAAGGGAAAATAATTTTTTTAAAAGTGTCTTTAAATTTGGAAAGGTTG
+
CC#4ADDFHFFHFIGIIIGIIJJJJJJHDDDDDDDB################################################################**

I took a few reads from each of these two files, combined them, and ran bwa on the resulting file; it ran without any error. Is it OK to do this, or would mixing read lengths affect the alignment output (the .sai file)?

zcat Sample1_TTAGGC_L001_R1_001.fastq.gz | head -8 > f1.fastq  
zcat Sample1_TTAGGC_L1234_R1_001.fastq.gz | head -8 > f2.fastq 

cat f1.fastq f2.fastq > Out.fastq
gzip Out.fastq 
bwa aln -t 4 hs37d5.fa Out.fastq.gz >Out.sai

Thanks, Tinu

That's fine. Having reads of different lengths would be totally normal after trimming.

Thank you for the answer

It would be good for posterity if you could add this comment as an answer, then the OP could "accept" it.

Yeah, I should have done that after ashutoshmit's comment, but I was hoping someone else might produce a highly descriptive answer. :oP I'll throw together something a little longer-winded as an answer real quick.

I am closing this question as it has been answered.

?? Well... an answered question is not a reason to close it...

Re-opened. Questions should only be closed when they're unanswerable, are duplicates of previous questions, or break the guidelines in some other way.

Thanks. Got it. There should be some way to keep trivial questions that have already been answered from appearing at the top. People may spend time reading a question only to find out at the end that it was a simple one and has already been solved. I'm not blaming the question for being simple; I just don't want people to spend time reading something that is easy and already solved.

Sure. If it's been asked and answered previously, post a comment linking back to the old post that answers it, and close the new one as a duplicate.

10.3 years ago

This is a perfectly fine thing to do and actually ends up not being that unusual.

Consider a standard experimental workflow. First, you receive raw reads that are all the same length, but each shows a varying amount of adapter contamination (from complete to none) and declining quality toward the 3' end. The usual next step is to trim the adapter sequences off the ends of reads and, since you're already processing the file, to trim low-quality regions from the ends as well (N.B., there's often no need to be very aggressive with quality trimming). As a result, many of your reads are now shorter than they were before. If you had paired-end reads to begin with, you'll often find that read #2 is shorter than read #1. This isn't a problem for bwa or any of the other standard aligners that I can think of.
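If you want to see how mixed the read lengths in a file actually are, a quick tally of sequence lengths does the job. This is a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines); the demo file and its two reads are hypothetical stand-ins, so for real data point `fq` at your own file (e.g. Out.fastq.gz). It uses `gzip -dc`, which does the same as `zcat` but also behaves that way on macOS.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input: one 75 bp read and one 100 bp read, gzip-compressed
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
fq="$workdir/demo.fastq.gz"
{
  printf '@read1\n%s\n+\n%s\n' "$(printf 'A%.0s' $(seq 75))" "$(printf 'I%.0s' $(seq 75))"
  printf '@read2\n%s\n+\n%s\n' "$(printf 'C%.0s' $(seq 100))" "$(printf 'I%.0s' $(seq 100))"
} | gzip > "$fq"

# The sequence is line 2 of every 4-line record; count reads per length,
# then sort numerically by length
tally=$(gzip -dc "$fq" \
  | awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' \
  | sort -n)

echo "$tally"   # prints "75 1" then "100 1" (length, number of reads)
```

On a trimmed file you would expect many distinct lengths here, and bwa handles all of them the same way.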

As an aside, instead of doing this:

zcat Sample1_TTAGGC_L001_R1_001.fastq.gz | head -8 > f1.fastq  
zcat Sample1_TTAGGC_L1234_R1_001.fastq.gz | head -8 > f2.fastq 

cat f1.fastq f2.fastq > Out.fastq
gzip Out.fastq

You could simply do this:

cat Sample1_TTAGGC_L001_R1_001.fastq.gz Sample1_TTAGGC_L1234_R1_001.fastq.gz > Out.fastq.gz

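The gzip format allows multiple compressed members to be concatenated directly like that, and decompressing the result gives the concatenated contents of the original files. This will work with most tools (though not always with Java, for reasons I've never looked into, since it's easy to handle with most APIs) and will save you considerable time, because nothing has to be decompressed and recompressed.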
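A small self-contained check of that claim: compress two tiny FASTQ files separately, `cat` the .gz files together, and confirm that the result decompresses to both records. The file names and reads are hypothetical placeholders; `gzip -t` tests the integrity of the whole multi-member file.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Two hypothetical single-record FASTQ files, compressed separately
printf '@r1\nACGT\n+\nIIII\n' | gzip > part1.fastq.gz
printf '@r2\nTTGCA\n+\nIIIII\n' | gzip > part2.fastq.gz

# Concatenating the compressed files yields a valid multi-member gzip file
cat part1.fastq.gz part2.fastq.gz > combined.fastq.gz

# Integrity check over all members; fails (and exits, via set -e) if corrupt
gzip -t combined.fastq.gz

# Decompression reads every member in order, so both records come back
n_lines=$(gzip -dc combined.fastq.gz | wc -l)
n_records=$((n_lines / 4))
echo "records in combined file: $n_records"
```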
