How to extract an equal number of R1 and R2 reads from raw Illumina data
6.8 years ago
Bioinfonext ▴ 460

Dear All,

Somehow, we got different numbers of R1 and R2 reads in our Illumina paired-end sequencing data. Is there any way to extract an equal number of R1 and R2 reads from these raw files? These are just paired-end libraries, not strand-specific.

[root@psgl data_new]# grep -c '^@'  SS_5W_R1.fastq

26623063

[root@psgl data_new]# grep -c '^@' SS_5W_R2.fastq

25803102

[root@psgl data_new]# grep -c '^@' SS_7W_R1.fastq

42474961

[root@psgl data_new]# grep -c '^@' SS_7W_R2.fastq

41089376

Thanks

RNA-Seq • 2.2k views

If you want to use this method, always include a few of the characters that follow the @ sign in line 1 (generally the machine serial), e.g. grep -c "^@M1023" file_name.

6.8 years ago

Quality score strings can contain or start with "@", so this is not a reliable method. Please use "wc" instead (the line count divided by 4), or use an actual bioinformatics tool to count the reads and test formatting.
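To see why the grep count can be inflated, here is a minimal sketch. It builds a made-up two-read file (demo.fastq is an illustrative name, not one of the files above) in which one quality string starts with "@", then counts the reads both ways.

```shell
# Build a tiny two-read FASTQ file; the second read's quality string starts with "@".
# (demo.fastq and its contents are made up for illustration.)
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@read1' 'ACGT' '+' 'IIII' \
  '@read2' 'TTGA' '+' '@III' > "$tmpdir/demo.fastq"

# Counting lines that start with "@" also counts the quality line.
grep_count=$(grep -c '^@' "$tmpdir/demo.fastq")

# Counting all lines and dividing by 4 gives the number of records,
# assuming a well-formed file with exactly 4 lines per record.
line_count=$(wc -l < "$tmpdir/demo.fastq")
wc_count=$(( line_count / 4 ))

echo "grep: $grep_count  wc: $wc_count"
# prints: grep: 3  wc: 2
```

The grep count is one too high because of the single quality line that begins with "@"; in real data the excess differs between R1 and R2, which can make the two files look unequal.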


Thanks a lot.

With this command, I get the correct number:

awk '{s++} END {print s/4}' fastq_file_name

Thanks again!
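The awk one-liner trusts that the file really has 4 lines per record. A slightly more defensive variant might look like the sketch below; count_reads is a hypothetical helper name, and the checks assume unwrapped 4-line FASTQ records.

```shell
# count_reads is a hypothetical helper, not an existing tool.
# It prints the number of records, or fails if the file does not
# look like 4-line-per-record FASTQ.
count_reads() {
  awk '
    NR % 4 == 1 && !/^@/  { bad = 1 }   # header line must start with "@"
    NR % 4 == 3 && !/^\+/ { bad = 1 }   # separator line must start with "+"
    END {
      if (bad || NR % 4 != 0) {
        print "malformed FASTQ: " FILENAME > "/dev/stderr"
        exit 1
      }
      print NR / 4
    }
  ' "$1"
}
```

Usage would be, for example, count_reads SS_5W_R1.fastq. A non-zero exit status suggests a truncated or otherwise malformed file, which is worth ruling out before comparing R1 and R2.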

6.8 years ago
st.ph.n ★ 2.7k
echo $(( $(wc -l < SS_7W_R1.fastq) / 4 ))
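
The same line-count idea can be applied to both files of a pair at once. This is only a sketch: check_pair is a hypothetical helper name, and it assumes uncompressed FASTQ with 4 lines per record.

```shell
# check_pair is a hypothetical helper, not an existing tool.
# It compares the record counts of an R1 and an R2 FASTQ file.
check_pair() {
  r1_reads=$(( $(wc -l < "$1") / 4 ))
  r2_reads=$(( $(wc -l < "$2") / 4 ))
  if [ "$r1_reads" -eq "$r2_reads" ]; then
    echo "OK: $r1_reads read pairs"
  else
    echo "MISMATCH: R1=$r1_reads R2=$r2_reads"
    return 1
  fi
}
```

For example, check_pair SS_7W_R1.fastq SS_7W_R2.fastq would report whether the two files hold the same number of reads. Equal counts do not prove the reads are in the same order, only that none are obviously missing.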