I want to use AMOScmp to analyze Illumina paired-end data. To build the .afg file, AMOScmp requires the two paired files to contain the same number of reads. The original fq files were paired, but after I passed each fq file separately through quality filtering, duplicate removal, and human DNA screening, the two resulting fa files contain different numbers of reads. I want to remove the unpaired reads so that I end up with two fa files with the same number of reads. Does anybody have a script, or know of software, that can do this?
Check the post "removal of unpaired reads", especially comment #8, for a script. If you want to split a paired-end FASTQ file into single-end read files, Galaxy has a tool called pairedendsplitter; for the source code, check the archive.
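If you would rather not depend on an external script, here is a minimal Python sketch of the same idea: keep only the reads whose ID appears in both FASTA files. It is not the script from that post. It assumes read IDs are unique, that mates share an ID up to an optional trailing /1 or /2 (or a space-separated suffix, as in newer Illumina headers), and that both files fit in memory. The function names are made up for illustration.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def pair_key(header):
    """Reduce a header to the part shared by both mates (assumed naming scheme)."""
    fields = header.split()
    token = fields[0] if fields else ""
    return re.sub(r"/[12]$", "", token)


def keep_pairs(records1, records2):
    """Return the records present in both inputs, in the order of the first input."""
    records1, records2 = list(records1), list(records2)
    by_key2 = {pair_key(h): (h, s) for h, s in records2}
    kept1 = [(h, s) for h, s in records1 if pair_key(h) in by_key2]
    kept2 = [by_key2[pair_key(h)] for h, _ in kept1]
    return kept1, kept2


def repair_files(in1, in2, out1, out2):
    """Read two FASTA files and write two files containing only complete pairs."""
    with open(in1) as f1, open(in2) as f2:
        kept1, kept2 = keep_pairs(read_fasta(f1), read_fasta(f2))
    for path, records in ((out1, kept1), (out2, kept2)):
        with open(path, "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
```

You would call it as repair_files("reads_1.fa", "reads_2.fa", "reads_1.paired.fa", "reads_2.paired.fa"), where the file names are placeholders. Both outputs come out in the same order, which is usually what downstream tools expect.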
Also, from the user page of Trimmomatic, another FASTQ utility suite:
For single-ended data, one input and one output file are specified, plus the processing steps. For paired-end data, two input files are specified, and 4 output files, 2 for the 'paired' output where both reads survived the processing, and 2 for corresponding 'unpaired' output where a read survived, but the partner read did not.
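To make the four-output idea concrete, here is a rough Python sketch of the routing logic described in that quote. It is not Trimmomatic's code, only an illustration: the passes argument is a stand-in for whatever filter you apply, and the reads are plain strings. The point is that the two mates are filtered together, so the 'paired' outputs always stay in sync.

```python
def split_pairs(pairs, passes):
    """Route read pairs into paired and unpaired outputs.

    pairs  -- iterable of (read1, read2) tuples, mates in the same position
    passes -- hypothetical filter function returning True if a read survives
    """
    paired1, paired2, unpaired1, unpaired2 = [], [], [], []
    for read1, read2 in pairs:
        ok1, ok2 = passes(read1), passes(read2)
        if ok1 and ok2:
            # Both mates survived: keep them together.
            paired1.append(read1)
            paired2.append(read2)
        elif ok1:
            # Only the forward read survived.
            unpaired1.append(read1)
        elif ok2:
            # Only the reverse read survived.
            unpaired2.append(read2)
        # If neither survived, the pair is dropped entirely.
    return paired1, paired2, unpaired1, unpaired2
```

For example, with a toy filter such as lambda read: len(read) >= 4, a pair in which one mate is too short ends up in the matching 'unpaired' list, while the two 'paired' lists always have equal length.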
EDIT: Sukhdeep beat me to it! I think he's referring you to the same tool.
Peter Cock wrote some tools just for this purpose that I bet would be very helpful.
I think paired-end interlacer followed by de-interlacer is what you want. From there if you want FA files, you can just do the simple conversion. Let me know if I'm not understanding your query correctly.
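For the conversion step, something along these lines should do. It is a minimal Python sketch that assumes plain four-line FASTQ records (no wrapped sequence lines); the function name is my own, and for anything non-trivial a proper parser is safer.

```python
def fastq_to_fasta(lines):
    """Convert four-line FASTQ records to a list of FASTA lines.

    Assumes each record is exactly: @header, sequence, +, quality.
    """
    stripped = (line.rstrip("\n") for line in lines)
    fasta = []
    # zip over the same iterator four times to read one record per step.
    for header, seq, _plus, _qual in zip(stripped, stripped, stripped, stripped):
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        fasta.append(">" + header[1:])
        fasta.append(seq)
    return fasta
```

Since it takes any iterable of lines, you can pass it an open file handle and write the returned lines out joined by newlines.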
You can use http://code.google.com/p/ea-utils/wiki/FastqMcf for the trimming. This tool makes sure that both reads of a pair are removed if one of the reads fails the test, or is trimmed so short that it no longer meets the minimum length criterion. The trimming speed is also quite high.