How to keep only the common reads in paired-end files when read1.fq and read2.fq contain different numbers of reads
4.9 years ago
ak93sharma ▴ 10

Hello folks, I am mapping reads with bowtie2, but it fails with the error "fewer reads in the file specified with -2 than in file specified with -1":

bowtie2 -x indexFile  -1 read1.fq   -2 read2.fq  -S result.sam

The number of reads is no longer the same in read1 and read2 after I filtered read2. I want to keep only the reads that are present in both files, so that both contain the same number of reads. Any help?

RNA-Seq bowtie2 perl sed awk • 2.8k views
4.9 years ago

How to extract paired reads from two paired-end read files?

First, extract the sequence IDs from both files and compute their intersection (an ID that appears in both files shows up twice in the sorted list, so uniq -d keeps it):

$ gzip -d -c read_1.fq.gz read_2.fq.gz | seqkit seq --name --only-id | sort | uniq -d > id.txt

Then retrieve reads using id.txt:

$ gzip -d -c read_1.fq.gz | seqkit grep --pattern-file id.txt  | gzip -c > read_1.f.fq.gz
$ gzip -d -c read_2.fq.gz | seqkit grep --pattern-file id.txt  | gzip -c > read_2.f.fq.gz

Note that this example assumes that the IDs in the two read files are in the same order. If they are not, you can sort both files after the previous steps. The shell's sort can handle large files by spilling to disk, so the temporary directory is set to the current directory (.) with the -T option.

$ gzip -d -c read_1.f.fq.gz | seqkit fx2tab | sort -k1,1 -T . | seqkit tab2fx | gzip -c > read_1.f.sorted.fq.gz
$ gzip -d -c read_2.f.fq.gz | seqkit fx2tab | sort -k1,1 -T . | seqkit tab2fx | gzip -c > read_2.f.sorted.fq.gz
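
Before handing the filtered files to bowtie2, it may be worth confirming that they really are in sync. Below is a minimal sketch of such a check, not part of seqkit or bowtie2. The function name check_paired is made up for illustration, and it assumes uncompressed FASTQ with exactly four lines per record and read IDs that are the first word of the header, optionally ending in /1 or /2.

```shell
# check_paired R1.fq R2.fq   (hypothetical helper, not a seqkit/bowtie2 command)
# Exits 0 if both files hold the same read IDs in the same order, 1 otherwise.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
check_paired() {
    awk -v mate="$2" '
        function read_id(h) {
            sub(/^@/, "", h)        # drop the leading @
            sub(/[ \t].*/, "", h)   # drop any description after the ID
            sub(/\/[12]$/, "", h)   # drop a trailing /1 or /2
            return h
        }
        FNR % 4 == 1 {
            n++
            # Read the matching header line from the R2 file.
            if ((getline other < mate) <= 0) {
                print "record " n ": R2 has fewer reads than R1"
                bad = 1; exit
            }
            # Consume the sequence, separator and quality lines of the R2 record.
            for (i = 0; i < 3; i++) getline rest < mate
            if (read_id($0) != read_id(other)) {
                print "record " n ": " $0 " does not match " other
                bad = 1; exit
            }
        }
        END {
            # R1 is exhausted; anything left in R2 is an orphan.
            if (!bad && (getline other < mate) > 0) {
                print "R2 has more reads than R1"
                bad = 1
            }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

For gzipped files, decompress first (for example with gzip -d -c into temporary files), then run something like: check_paired read_1.f.fq read_2.f.fq && echo "files are in sync"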
4.9 years ago

The real question is: what did you do that created orphan reads? Did you trim or quality-filter your R1 and R2 reads independently? Some trimming tools, like Trimmomatic, have a paired-end mode that avoids this problem.

But if you really need to fix those files, you can try to remove the orphan reads.
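
One way to remove the orphans without extra tools is a small awk filter. This is only a sketch under some assumptions: the function name keep_paired is invented here, the inputs are uncompressed FASTQ with exactly four lines per record, read IDs are the first word of the header (optionally ending in /1 or /2), and the IDs of one file fit in memory. Each output keeps the order of its input, so if the originals were in matching order before filtering, the outputs should be too.

```shell
# keep_paired R1.fq R2.fq OUT1.fq OUT2.fq   (hypothetical helper)
# Writes to OUT1/OUT2 only the records whose read ID occurs in both inputs.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
keep_paired() {
    # awk program: load the IDs of the first file, then print the records
    # of the second file whose ID was seen in the first.
    filter='
        function read_id(h) {
            sub(/^@/, "", h)        # drop the leading @
            sub(/[ \t].*/, "", h)   # drop any description after the ID
            sub(/\/[12]$/, "", h)   # drop a trailing /1 or /2
            return h
        }
        NR == FNR { if (FNR % 4 == 1) keep[read_id($0)] = 1; next }
        FNR % 4 == 1 { wanted = (read_id($0) in keep) }
        wanted
    '
    awk "$filter" "$2" "$1" > "$3"   # R1 records that have a mate in R2
    awk "$filter" "$1" "$2" > "$4"   # R2 records that have a mate in R1
}
```

Usage would look like: keep_paired read1.fq read2.fq read1.paired.fq read2.paired.fq, followed by bowtie2 on the two .paired.fq files.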

