Question: How To Generate A Paired End File Suitable For Bwa
6.5 years ago by
GPR310
Mexico
GPR310 wrote:

A question about BWA and paired-end reads. In TopHat, one inputs the paired-end fastq file twice and the tool understands you have paired-end reads. I see that in the case of BWA one must align each pair separately. Can somebody give me a hint on how to generate these two matching files from my paired-end one? Thanks. G.

bwa • 3.3k views
modified 6.5 years ago by Ashutosh Pandey11k • written 6.5 years ago by GPR310

I don't understand your question. Do you have one FASTQ file with mixed pairs that you want to split for BWA, or do you not know how to run BWA with two paired-end FASTQ files?

written 6.5 years ago by JC7.7k

The first: I have one paired-end FASTQ file and want to split the pairs.

written 6.5 years ago by GPR310

A simple script can help you. Can you show us the input (just to be sure of the format)?

written 6.5 years ago by JC7.7k

I have *.dat files.

written 6.5 years ago by GPR310

*.dat? Do you have fastq/fq or sam/bam? In a FASTQ file, mixed pairs can come in two formats (as far as I know):

1) each read is reported independently:

  @read_1
  ACATTCATTCATCTAT
  +
  BBBBBBBBBBBBBBBB
  @read_2
  TGCATGCAGCATGGCC
  +
  BBBBBBBBBBBBBBBB

2) both reads of a pair are fused into one record:

@read_12
ACATTCATTCATCTATTGCATGCAGCATGGCC
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

but 2) can be tricky, depending on whether the pairs are f-f (forward-forward) or f-r (forward-reverse)
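For format 2), a minimal sketch of the split, assuming both mates have exactly the same length so the midpoint of the record is the boundary. The function name and the /1 and /2 header suffixes are illustrative choices, not a standard. It does not touch orientation, so whether the second half needs reverse-complementing still depends on the f-f or f-r question.

```python
def split_fused_record(header, seq, qual):
    """Split one fused FASTQ record into two mate records at the midpoint.

    Assumption: both mates have the same length. Orientation is left as-is.
    """
    if len(seq) != len(qual) or len(seq) % 2 != 0:
        raise ValueError(f"cannot split {header!r} into two equal halves")
    half = len(seq) // 2
    mate1 = (header + "/1", seq[:half], qual[:half])
    mate2 = (header + "/2", seq[half:], qual[half:])
    return mate1, mate2


# The fused example record from this post.
m1, m2 = split_fused_record(
    "@read_12", "ACATTCATTCATCTATTGCATGCAGCATGGCC", "B" * 32
)
print(m1)
print(m2)
```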

modified 6.5 years ago • written 6.5 years ago by JC7.7k

I am not aware of this format. If you are using Unix, can you run something like "head -10 yourdatfile" and paste the output here?

written 6.5 years ago by Ashutosh Pandey11k

OK, sorry. It's FASTQ straight off the Illumina instrument. I said *.dat because that is the file extension; as I said, they are FASTQ.

written 6.5 years ago by GPR310

As JC mentioned above, if the files look something like this:

@read_1
ACATTCATTCATCTAT
+
BBBBBBBBBBBBBBBB

@read_2
TGCATGCAGCATGGCC
+
BBBBBBBBBBBBBBBB

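Then you can use grep in Unix. For example,

grep -A3 --no-group-separator '^@.*_1$' Filewithmixedreads.txt > Filewithreadsfrom_1end.fastq

should work. (--no-group-separator is a GNU grep option; without it, grep prints "--" lines between non-adjacent matches, which would corrupt the FASTQ.)

Similarly, you should be able to get the reads ending with _2 into another file.
Hope this helps; otherwise please print some lines from the file for us to see.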

Thanks.
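If grep turns out to be fragile (a quality line may itself start with "@"), a small script that reads the file four lines at a time is a safer alternative. This is only a sketch: it assumes strict 4-line records and mate names ending in _1/_2 or /1 and /2, and the function names are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples, skipping blank lines."""
    lines = (line.rstrip("\r\n") for line in handle if line.strip())
    for header in lines:
        record = (header, next(lines, None), next(lines, None), next(lines, None))
        if None in record:
            raise ValueError("truncated FASTQ record: " + header)
        yield record


def split_by_suffix(handle, out1, out2):
    """Write records whose name ends in _1 or /1 to out1, _2 or /2 to out2."""
    for record in read_fastq(handle):
        name = record[0].split()[0]
        if name.endswith(("_1", "/1")):
            out1.write("\n".join(record) + "\n")
        elif name.endswith(("_2", "/2")):
            out2.write("\n".join(record) + "\n")
        else:
            raise ValueError("cannot tell which mate this is: " + record[0])


# In-memory demo using the two example reads above; real use would pass
# open file handles instead of StringIO objects.
mixed = io.StringIO(
    "@read_1\nACATTCATTCATCTAT\n+\nBBBBBBBBBBBBBBBB\n\n"
    "@read_2\nTGCATGCAGCATGGCC\n+\nBBBBBBBBBBBBBBBB\n"
)
out1, out2 = io.StringIO(), io.StringIO()
split_by_suffix(mixed, out1, out2)
print(out1.getvalue(), end="")
print(out2.getvalue(), end="")
```

Because the header is identified by its position in the record rather than by pattern, a quality string that begins with "@" cannot be mistaken for a read name.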
modified 6.5 years ago • written 6.5 years ago by Ashutosh Pandey11k

My reads actually look like this:

+
CCCFFFFFHHHHHIHGGHFIJJIIJJIJIEHHJJJFGJIJIIIIJJJIGCGHIIJHEHIJJB=?DFBCCBEEEEDA
@HWI-ST974:67:C0545ACXX:2:1101:1628:32111 1:N:0:
GTTGGAGCAGGCCCGCAAGGCCGAAGAGGTGCAGGCCTGGGCGCAGCGCAAGGAGCGGGAAGTGCTGCAGCTGCAG
+
@BCFFFFFHHHGHJIJJJJJIJJJJJGJJFHIGJJIJJJJIGGHFFEDDDDDDDDDDDDDBBCDDEDDDDDDDDD>
@HWI-ST974:67:C0545ACXX:2:1101:1521:32121 1:N:0:
GTGACTGTCGTGTCCTCGTCGACCTCCTTCTCCTGTCGCTCCAGATCCGCCTCAATCTCCTTGAGCTCTTCCAGCT

Thanks!

written 6.5 years ago by GPR310

That looks like single-end reads, or the first read of each pair in a paired-end run (1:N:0). Are you sure you have mixed paired-end reads? Actually, from your example, your first sequence maps to MRPS26 (chr20:3027308-3027383) and the second to SPC24 (chr19:11258704-11258779) in hg19, without spanning.
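To check this across the whole file rather than by eye, the read number can be pulled out of each header. A minimal sketch, assuming the Illumina header layout seen in the example (a space, then read:filtered:control:index); the function name is hypothetical.

```python
from collections import Counter


def illumina_read_number(header):
    """Return 1 or 2 from a header like '@HWI-...:1628:32111 1:N:0:'."""
    parts = header.split(None, 1)
    if len(parts) < 2:
        raise ValueError("no comment field in header: " + header)
    field = parts[1].split(":")[0]
    if field not in ("1", "2"):
        raise ValueError("unexpected read number in header: " + header)
    return int(field)


# The two headers posted above.
headers = [
    "@HWI-ST974:67:C0545ACXX:2:1101:1628:32111 1:N:0:",
    "@HWI-ST974:67:C0545ACXX:2:1101:1521:32121 1:N:0:",
]
counts = Counter(illumina_read_number(h) for h in headers)
print(counts)
```

If the count for read number 2 stays at zero over the whole file, the file holds only the first mates and the second mates are presumably in a separate file.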

written 6.5 years ago by JC7.7k

I am pretty sure these are paired-end reads; I have actually analysed this data with TopHat-Cufflinks. So I guess in my data each read was reported independently, and using grep is a good option. Thanks so much!

written 6.5 years ago by GPR310
6.5 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

For BWA, you first align all the reads from one end of the pairs (one file containing all the forward, or _1, reads). You then repeat the same step with the reads from the other end (one file containing all the reverse, or _2, reads). Each step produces one .sai file, so two in total. bwa sampe then takes these two files and produces one SAM file.

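In outline (ref.fa and the FASTQ names are placeholders; the reference must first be indexed with bwa index):

1) bwa aln ref.fa reads_1.fastq > 1.sai
2) bwa aln ref.fa reads_2.fastq > 2.sai
3) bwa sampe ref.fa 1.sai 2.sai reads_1.fastq reads_2.fastq > aln.sam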
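The same three steps can be wrapped in a short script. This is a sketch only: the file names and the helper functions are hypothetical, and it assumes bwa is on the PATH and the reference has already been indexed.

```python
import subprocess


def bwa_paired_commands(ref, fq1, fq2, prefix):
    """Return (argv, output_path) pairs for bwa aln, bwa aln, bwa sampe."""
    sai1 = prefix + "_1.sai"
    sai2 = prefix + "_2.sai"
    return [
        (["bwa", "aln", ref, fq1], sai1),
        (["bwa", "aln", ref, fq2], sai2),
        (["bwa", "sampe", ref, sai1, sai2, fq1, fq2], prefix + ".sam"),
    ]


def run_all(steps):
    """Run each step in order, redirecting its stdout to the output file."""
    for argv, out_path in steps:
        with open(out_path, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)


# Placeholder file names; only prints the commands, does not run bwa.
steps = bwa_paired_commands("ref.fa", "reads_1.fastq", "reads_2.fastq", "sample")
for argv, out_path in steps:
    print(" ".join(argv), ">", out_path)
```

Calling run_all(steps) would execute the commands; check=True stops the run at the first step that fails, so sampe never starts on a missing .sai file.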

Hope this helps you.

written 6.5 years ago by Ashutosh Pandey11k

Thanks for the pointers. I actually understand this much. What I am trying to find out is how to split my paired-end file.

written 6.5 years ago by GPR310