I have a question regarding my single-cell RNA-seq data. I have the following paired-end data.
Read1 contains a 6 bp UMI, followed by a 6 bp cell barcode; the rest is a polyT stretch:
@J00182:79:HV2WWBBXX:6:1101:11160:38873 1:N:0:ACAGTG
GAGAAGACAGTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTT
+
AAAFFJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJ--A-----7AJJFAF-AJJJJJJJJJJJ<A<----A
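With this layout, both tags sit at fixed offsets in the Read1 sequence. A minimal Python sketch of that split, assuming there is no spacer or variable-length prefix before the UMI and that every Read1 is at least 12 bp long (the function name `parse_read1` is hypothetical):

```python
from typing import Tuple

UMI_LEN = 6      # assumption: Read1 starts with a 6 bp UMI
BARCODE_LEN = 6  # assumption: the UMI is followed directly by a 6 bp cell barcode


def parse_read1(seq: str) -> Tuple[str, str, str]:
    """Split a Read1 sequence into (UMI, cell barcode, remaining polyT tail)."""
    prefix_len = UMI_LEN + BARCODE_LEN
    if len(seq) < prefix_len:
        raise ValueError(f"Read1 shorter than {prefix_len} bp: {seq!r}")
    umi = seq[:UMI_LEN]                # bases 1-6
    barcode = seq[UMI_LEN:prefix_len]  # bases 7-12
    tail = seq[prefix_len:]            # polyT stretch and anything after it
    return umi, barcode, tail


umi, barcode, tail = parse_read1(
    "GAGAAGACAGTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTT"
)
print(umi, barcode)  # GAGAAG ACAGTG
```

Only the first 12 bases matter for tagging; the polyT tail and the Read1 quality string are not needed for the Read2 header.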
Read2 is the normal read that I use for mapping to the reference. The paired-end mate of the above read looks like this:
@J00182:79:HV2WWBBXX:6:1101:11160:38873 2:N:0:ACAGTG
GCATACTTATTTCCAAACTTTTGGAAAAAGCATAATTTGACAAAAAAGAATACAATTTTTTGCTGTTTCAACCAC
+
A<<AFJFJJJJJJFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Now I would like to prepend the cell barcode and UMI from the Read1 sequence to the header of my Read2, in the following format:
@6bpCellbarcode_6bpUMI#Read2header (with an underscore between the cell barcode and the UMI, and a hash between the UMI and the rest of the header). For the pair above, the result should be:
@ACAGTG_GAGAAG#J00182:79:HV2WWBBXX:6:1101:11160:38873 2:N:0:ACAGTG
GCATACTTATTTCCAAACTTTTGGAAAAAGCATAATTTGACAAAAAAGAATACAATTTTTTGCTGTTTCAACCAC
+
A<<AFJFJJJJJJFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Here ACAGTG is the cell barcode and GAGAAG is the UMI. Note that the order is flipped in the output: Read1 contains the UMI first and then the cell barcode, whereas the output I need has the cell barcode first.
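One way to do this is to walk the two FASTQ files in lockstep and rewrite each Read2 header. Below is a minimal, standard-library-only Python sketch, assuming 4-line FASTQ records, R1 and R2 in the same read order, and the fixed 6 bp UMI + 6 bp barcode layout; the script name `tag_read2.py` and all function names are hypothetical. It drops the leading `@` of the original Read2 header before appending it after `#`, matching the desired example, and it discards Read1 entirely once the tags are taken.

```python
#!/usr/bin/env python3
"""Prepend the Read1 cell barcode and UMI to Read2 headers (hypothetical tag_read2.py).

Usage: python tag_read2.py R1.fastq.gz R2.fastq.gz > R2.tagged.fastq
"""
import gzip
import sys
from itertools import islice, zip_longest
from typing import Iterable, Iterator, TextIO, Tuple

UMI_LEN = 6      # assumption: Read1 bases 1-6 are the UMI
BARCODE_LEN = 6  # assumption: Read1 bases 7-12 are the cell barcode

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group an iterable of lines into 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        chunk = list(islice(stripped, 4))
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def tagged_header(read1_seq: str, read2_header: str) -> str:
    """Return @<barcode>_<UMI>#<Read2 header without its leading '@'>."""
    umi = read1_seq[:UMI_LEN]
    barcode = read1_seq[UMI_LEN:UMI_LEN + BARCODE_LEN]
    return f"@{barcode}_{umi}#{read2_header[1:]}"


def tag_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield Read2 records (as 4-line strings) with tagged headers."""
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        # Mates must share the read name (the header up to the first space).
        if rec1[0].split()[0] != rec2[0].split()[0]:
            raise ValueError(f"mates out of sync: {rec1[0]!r} vs {rec2[0]!r}")
        header2, seq2, plus2, qual2 = rec2
        yield f"{tagged_header(rec1[1], header2)}\n{seq2}\n{plus2}\n{qual2}\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    with open_text(sys.argv[1]) as r1, open_text(sys.argv[2]) as r2:
        sys.stdout.writelines(tag_pairs(r1, r2))
```

Checking that the read names match costs little and catches files that were filtered or sorted independently, which would otherwise silently attach the wrong barcode to a read.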
Can someone please tell me how to do that?
As usual, thank you so much!