I have recently started analyzing my scRNA-seq data. The first step is to split the FASTQ files according to my barcode file, which looks like this:
sc1 AACGTGAT
sc2 AAACATCG
sc3 ATGCCTAA
sc4 AGTGGTCA
sc5 ACCACTGT
sc6 ACATTGGC
sc7 CAGATCTG
sc8 CATCAAGT
sc9 CGCTGATC
sc10 ACAAGCTA
sc11 CTGTAGCC
sc12 AACGCTTA
My data is paired-end, and the reads look like this (R2 first, then R1; I truncated the sequences):
@ST-E00493:75:H33JKALXX:1:1101:10987:2206 2:N:0:ATACACAT
AACGCTTAAGGGTAATTTTTTGTGTTATGTATTTTTTTTTTAGGGGAAAAGGCATTTTTGGT...
+
AAFFFFJJ<A7JF<JF----AA--A--7----AAFJ-F<-FF-<<F-<-AFFA-7A7A-A-<...

@ST-E00493:75:H33JKALXX:1:1101:10987:2206 1:N:0:ATACACAT
GTTGTGAAGGGGAGGCTGGAGAGGCTTCGTCTGCTAAGAGCATTGGCCGTTCTTCCACTGTT...
+
AAAFFFJ-<JJJJJJJJFJJJF7JFFJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJFFJJJ...
The barcode is in the first 8 bp of R2 (here, AACGCTTA), so I want to split the FASTQ files by barcode and keep each read 1 paired with its read 2 using the header. I have tried several programs and scripts but could not find a suitable solution:
fastq-multx -B barcode_sequence -b -m 0 R2.fastq.gz R1.fastq.gz -o %_R1.fq -o %_R2.fq
The result is not what I wanted: only 7 lines in the output begin with their barcode.
fastx_barcode_splitter.pl: it does not seem to support paired-end reads.
I also wrote a Python script, but it runs very slowly. Does anybody have a good suggestion? Thanks in advance!
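
To make the goal concrete, the splitting described above could be sketched roughly as follows. This is only an illustration of the intended behavior, not the slow script mentioned above. The function names (`load_barcodes`, `read_records`, `split_pairs`, `main`), the `unmatched` bin and the output file names are made up for the example. It assumes four-line FASTQ records, R1 and R2 listing mates in the same order, a whitespace-separated two-column barcode file, and exact barcode matches only. The barcode is left in place on R2.

```python
import gzip
import itertools
import os


def load_barcodes(path):
    """Read 'sample barcode' lines into a {barcode: sample} dict."""
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                barcodes[fields[1]] = fields[0]
    return barcodes


def read_records(handle):
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    while True:
        record = list(itertools.islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def split_pairs(r1_handle, r2_handle, barcodes, out_dir, bc_len=8):
    """Route each read pair to per-sample files by the first bc_len bases of R2."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}  # sample -> (R1 file, R2 file), opened on first use
    counts = {}   # sample -> number of read pairs written
    try:
        for rec1, rec2 in zip(read_records(r1_handle), read_records(r2_handle)):
            # Mates are assumed to appear in the same order in both files;
            # compare the read IDs (text before the first space) as a check.
            if rec1[0].split()[0] != rec2[0].split()[0]:
                raise ValueError("R1/R2 out of sync at " + rec1[0].strip())
            # Exact match only; anything else goes to the 'unmatched' bin.
            sample = barcodes.get(rec2[1][:bc_len], "unmatched")
            if sample not in outputs:
                outputs[sample] = (
                    open(os.path.join(out_dir, sample + "_R1.fq"), "w"),
                    open(os.path.join(out_dir, sample + "_R2.fq"), "w"),
                )
            out1, out2 = outputs[sample]
            out1.writelines(rec1)
            out2.writelines(rec2)
            counts[sample] = counts.get(sample, 0) + 1
    finally:
        for out1, out2 in outputs.values():
            out1.close()
            out2.close()
    return counts


def main(r1_path, r2_path, barcode_path, out_dir):
    """Open gzipped inputs in text mode and split them in a single pass."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        return split_pairs(r1, r2, load_barcodes(barcode_path), out_dir)
```

Calling `main("R1.fastq.gz", "R2.fastq.gz", "barcode_sequence", "split")` would then write files such as `split/sc12_R1.fq` and `split/sc12_R2.fq` and return the number of pairs per sample. Both files are streamed once with a dictionary lookup per pair, so nothing has to be held in memory or searched by header.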