I have single-cell paired-end data with 140 bp reads in both files (_1 and _2) and no barcode or UMI information. How can I extract the barcodes and UMIs from this data so I can use them as input for STARsolo alignment? Which file (_1 or _2) should I use for the extraction, and which should be given to STARsolo as the raw reads?
The fact that your read names contain no cell barcodes (CBs) or unique molecular identifiers (UMIs) does not necessarily mean the information is missing. For 10x Genomics and similar droplet-based protocols, the CBs and UMIs are usually not in the FASTQ header at all; they are the first bases of the R1 sequence (file _1). For 10x Chromium 3' v3 chemistry, the first 16 bp of R1 are the cell barcode and the next 12 bp are the UMI (v2 uses a 10 bp UMI), followed by a poly(dT) stretch. The actual transcript (cDNA) sequence is in the R2 file (file _2). Many sequencing providers run both reads at the same length, so a 140 bp R1 can still carry the CB and UMI in its first 28 bp, with the remaining bases being poly(T) and other low-information sequence.

STARsolo (the single-cell mode of the STAR RNA-seq aligner) expects the CBs and UMIs to be present in the FASTQ sequences, so you normally do not need to extract them into a separate file: give the cDNA read (_2) first and the barcode read (_1) second to --readFilesIn. By default STARsolo checks that the barcode read is exactly CB + UMI long, so a 140 bp R1 either needs --soloBarcodeReadLength 0 or has to be trimmed to 28 bp first.

If R1 does not show this structure, there may have been an issue with the initial data processing (for example, the barcode read was dropped or overwritten). In that case, check the data carefully and ask your sequencing service which library chemistry was used and how the FASTQ files were generated. STARsolo can only assign reads to cells if the library preparation and sequencing produced the CBs and UMIs in the expected format.
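A quick way to check whether your _1 file follows the barcode-read layout is to look at what comes right after the first 28 bases. The sketch below assumes 10x Chromium 3' v3 chemistry (16 bp CB + 12 bp UMI, then poly(dT)); for v2, set UMI_LEN = 10. It reports the fraction of reads that are T-rich right after CB + UMI, and can optionally write a copy of R1 trimmed to 28 bp. The function names, thresholds and file names are illustrative, not part of STAR or any 10x tool.

```python
import gzip
import sys
from itertools import islice
from typing import Iterable, Iterator, Tuple

CB_LEN = 16   # assumption: 10x Chromium 3' cell-barcode length
UMI_LEN = 12  # assumption: 10x 3' v3 UMI length (use 10 for v2)

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group an iterable of FASTQ lines into 4-line records."""
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times takes consecutive lines in groups of 4
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header, seq, plus, qual


def split_barcode_read(seq: str) -> Tuple[str, str, str]:
    """Split an R1 sequence into (cell barcode, UMI, remainder)."""
    end = CB_LEN + UMI_LEN
    return seq[:CB_LEN], seq[CB_LEN:end], seq[end:]


def polyt_fraction(lines: Iterable[str], n_reads: int = 10000,
                   window: int = 10, min_t: int = 8) -> float:
    """Fraction of the first n_reads with >= min_t T's in the `window` bases after CB+UMI.

    A high fraction suggests the read follows the CB + UMI + poly(dT) layout.
    """
    hits = total = 0
    for _, seq, _, _ in islice(read_fastq(lines), n_reads):
        _, _, rest = split_barcode_read(seq)
        total += 1
        if rest[:window].count("T") >= min_t:
            hits += 1
    return hits / total if total else 0.0


def trim_barcode_read(lines: Iterable[str]) -> Iterator[str]:
    """Yield R1 records cut to CB+UMI length (sequence and quality)."""
    end = CB_LEN + UMI_LEN
    for header, seq, plus, qual in read_fastq(lines):
        yield f"{header}\n{seq[:end]}\n{plus}\n{qual[:end]}\n"


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python check_r1.py sample_1.fastq.gz sample_1.cbumi.fastq
    src_path, dst_path = sys.argv[1], sys.argv[2]
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as fh:
        print(f"T-rich after CB+UMI: {polyt_fraction(fh):.1%}")
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(trim_barcode_read(src))
```

If most reads are T-rich after position 28, the _1 file is very likely the barcode read. You can then pass it to STARsolo as the second file of --readFilesIn, after the _2 cDNA file, either untrimmed together with --soloBarcodeReadLength 0 or as the trimmed 28 bp copy. STARsolo's default UMI length (--soloUMIlen 10) matches v2 chemistry, so for v3 you would also set --soloUMIlen 12 and supply the matching whitelist with --soloCBwhitelist.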