Question

How to map polyA tail lengths to barcodes in paired-end reads for single-cell RNA-seq

4 months ago

Ana

Hi all, I want to obtain polyA tail lengths from single-cell RNA-seq data. I have paired-end reads, and I planned to count the Ts in R1 and then find the corresponding barcodes in R2, so that I could add that information to my Seurat object. This is the code I am using to build a table of barcodes and polyA tail lengths:

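# keep only the sequence line of each FASTQ record (the file is gzipped, so decompress it first)
gzip -dc sp_Hyun_S1_L001_R1_001.fastq.gz | awk 'NR%4==2' > r1_seqs.txt
# number of Ts in each read
awk '{ print gsub(/T/, "", $0) }' r1_seqs.txt > polyA_lengths.txt
# join the two files line by line
paste whitelist_test.txt polyA_lengths.txt > barcode_polyA.tsv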

But the file "barcode_polyA.tsv" doesn't have the structure I expected: not every polyA tail length has an associated barcode. I'd appreciate any insight into what could be happening. Thanks!
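`paste` pairs line i of one file with line i of the other and does nothing else, so the result can only be meaningful if both files list the same reads in the same order. If whitelist_test.txt is a barcode whitelist (one line per barcode) while polyA_lengths.txt has one line per read, the two line counts will differ. A quick check, written as a hypothetical helper `same_line_count`:

```shell
#!/usr/bin/env bash
# same_line_count FILE1 FILE2 (hypothetical helper)
# Prints both line counts and returns success only if they are equal.
same_line_count() {
  local a b
  a=$(( $(wc -l < "$1") ))   # arithmetic expansion strips the padding some wc versions add
  b=$(( $(wc -l < "$2") ))
  echo "$1: $a lines, $2: $b lines"
  [ "$a" -eq "$b" ]
}

# Usage:
#   same_line_count whitelist_test.txt polyA_lengths.txt
```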

Tags: RNA-seq, cell, polyA, single

Updated 4 months ago by GenoMax; written 4 months ago by Ana.

then finding the corresponding barcodes in R2

I don't think that is possible, unless the inserts are short enough that R2 reads across the poly-A tail into the barcode at the other end of the fragment. In 10x 3' libraries the cell barcode and UMI sit at the start of R1, immediately upstream of the poly(dT) stretch, so R1 already carries the barcode for each read. The structure of 10x libraries (in general) should make this clear: https://cdn.10xgenomics.com/image/upload/v1660261286/support-documents/CG000108_AssayConfiguration_SC3v2.pdf
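A minimal sketch of pulling the barcode, UMI and poly(T) run length out of the same R1 record, so each length stays attached to its own barcode. It assumes the v2 Read 1 layout from the linked guide (16 bp barcode, then a 10 bp UMI; v3 chemistry uses a 12 bp UMI) and an R1 sequenced far enough past the UMI to cover the T stretch. Unlike counting every T in the read, it counts only the uninterrupted T run right after the UMI. The function name `parse_r1` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed 10x 3' v2 Read 1 layout: 16 bp cell barcode, 10 bp UMI, then poly(dT).
# Set UMI_LEN=12 for v3 chemistry.
CB_LEN=${CB_LEN:-16}
UMI_LEN=${UMI_LEN:-10}

# parse_r1 (hypothetical helper): reads an R1 FASTQ on stdin and prints
# read_id <TAB> barcode <TAB> UMI <TAB> length of the T run right after the UMI
parse_r1() {
  LC_ALL=C awk -v cb="$CB_LEN" -v umi="$UMI_LEN" '
    NR % 4 == 1 { id = substr($1, 2) }   # header line: drop the leading "@"
    NR % 4 == 2 {                        # sequence line
      bc   = substr($0, 1, cb)
      u    = substr($0, cb + 1, umi)
      rest = substr($0, cb + umi + 1)
      # count only the uninterrupted run of Ts that starts right after the UMI
      n = match(rest, /^T+/) ? RLENGTH : 0
      printf "%s\t%s\t%s\t%d\n", id, bc, u, n
    }
  '
}

# Usage (file names are placeholders):
#   gzip -dc sample_R1_001.fastq.gz | parse_r1 > r1_polyT.tsv
```

Because the read ID, barcode and length come from the same record, nothing depends on two separate files staying in the same order.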

not every polyA tail length has a barcode associated

If there is no valid barcode associated with that R1 read, then that read pair may need to be discarded since that library fragment may not be a valid 10x library construct.
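Discarding reads without a valid barcode can be done with a key-based lookup instead of `paste`. A sketch, assuming the whitelist has one barcode per line (no "-1" suffix) and the per-read table is tab-separated with the barcode in column 2 and the poly(T) length in column 4; that layout and the helper name `filter_and_summarise` are hypothetical.

```shell
#!/usr/bin/env bash
# filter_and_summarise WHITELIST TABLE (hypothetical helper)
# WHITELIST: one barcode per line. TABLE: tab-separated, one read per line,
# barcode in column 2 and poly(T) length in column 4 (assumed layout).
# Reads whose barcode is not in WHITELIST are dropped; for each kept barcode
# prints barcode <TAB> number of reads <TAB> mean poly(T) length.
filter_and_summarise() {
  LC_ALL=C awk -F '\t' '
    FNR == NR { ok[$1] = 1; next }           # first file: load the whitelist
    ($2 in ok) { n[$2]++; sum[$2] += $4 }    # second file: keep valid barcodes only
    END {
      for (bc in n) printf "%s\t%d\t%.2f\n", bc, n[bc], sum[bc] / n[bc]
    }
  ' "$1" "$2" | LC_ALL=C sort
}

# Usage (file names are placeholders):
#   filter_and_summarise whitelist.txt r1_polyT.tsv > barcode_polyT_mean.tsv
```

Cell names in a Seurat object built from Cell Ranger output usually carry a "-1" suffix, so the barcodes may need that suffix added before the per-barcode values are attached as metadata.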

Reply by GenoMax, 4 months ago.