7.5 years ago
Bioinfonext
Dear All,
Somehow we got different numbers of R1 and R2 reads in our Illumina paired-end sequencing data. Is there any way to extract an equal number of R1 and R2 reads from these raw files? These are just paired-end libraries, not strand-specific.
[root@psgl data_new]# grep -c '^@' SS_5W_R1.fastq
26623063
[root@psgl data_new]# grep -c '^@' SS_5W_R2.fastq
25803102
[root@psgl data_new]# grep -c '^@' SS_7W_R1.fastq
42474961
[root@psgl data_new]# grep -c '^@' SS_7W_R2.fastq
41089376
Thanks
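If you want to use this method, always include a few characters that follow the @ sign on line 1 of each record (generally the machine serial). A quality line can also start with @, so a bare '^@' pattern may count it as a read. For example:
grep -c "^@M1023" file_name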
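A more robust count, and one possible way to get matched pairs, can be sketched with awk. This is only a sketch under stated assumptions: every record is exactly 4 lines (no wrapped sequences), mate IDs are identical up to the first whitespace (an optional /1 or /2 suffix is stripped), and all read IDs of one file fit in memory. The function names and the output file names in the comments are hypothetical, not from any existing tool.

```shell
#!/usr/bin/env bash
# Sketch only: assumes 4-line FASTQ records and mate IDs that match
# up to the first whitespace (a trailing /1 or /2 is stripped).

# Count records as total lines / 4 instead of grepping for '@',
# so quality lines that start with '@' are not counted as reads.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Print one read ID per record (header line without '@' and without /1 or /2).
read_ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Print the records of FASTQ file $2 whose ID is listed in ID file $1.
# The ID file must not be empty, and all its IDs are held in memory.
keep_listed() {
    awk 'FNR == NR { keep[$1]; next }
         FNR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); ok = (id in keep) }
         ok' "$1" "$2"
}

# Possible usage (output names are placeholders):
#   read_ids SS_5W_R1.fastq > r1.ids
#   read_ids SS_5W_R2.fastq > r2.ids
#   keep_listed r2.ids SS_5W_R1.fastq > SS_5W_R1.paired.fastq
#   keep_listed r1.ids SS_5W_R2.fastq > SS_5W_R2.paired.fastq
#   count_reads SS_5W_R1.paired.fastq
#   count_reads SS_5W_R2.paired.fastq
```

Each output keeps the read order of its input, so the two paired files stay in sync only if R1 and R2 were in the same relative order to begin with. It is also worth checking why the counts differ in the first place (for example a truncated transfer or a trimming step run on one file only), since the raw files from the sequencer normally have matching pairs.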