How to map polyA tails to barcodes in paired end reads for single cell RNA-seq
4 months ago
Ana ▴ 10

Hi all, I want to obtain polyA tail lengths for single cell RNA-seq data. I have paired-end reads, and I planned on counting the Ts on R1 and then finding the corresponding barcodes in R2 so I could integrate that info into my Seurat object. This is the code I am using to obtain a table with barcodes and polyA tail lengths:

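# Extract the sequence line (2nd of every 4) from the gzipped R1 FASTQ
gzip -cd sp_Hyun_S1_L001_R1_001.fastq.gz | awk 'NR%4==2' > r1_seqs.txt
# Count the Ts in each read
awk '{ print gsub(/T/, "", $0) }' r1_seqs.txt > polyA_lengths.txt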

paste whitelist_test.txt polyA_lengths.txt > barcode_polyA.tsv

But the file "barcode_polyA.tsv" doesn't have the structure I expected (not every polyA tail length has a barcode associated with it). I'd appreciate any insights into what could be happening. Thanks!


then finding the corresponding barcodes in R2

I don't think that is possible, unless the inserts are short enough that R2 reads through the poly-A tail and into the barcode at the other end of the fragment. Here is the library structure of 10x libraries (in general), which should make it clear: https://cdn.10xgenomics.com/image/upload/v1660261286/support-documents/CG000108_AssayConfiguration_SC3v2.pdf

not every polyA tail length has a barcode associated

If there is no valid barcode associated with that R1 read, then that read pair may need to be discarded since that library fragment may not be a valid 10x library construct.
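
As for the table itself: paste only joins files line by line, so if whitelist_test.txt is a barcode whitelist (one line per allowed barcode) while polyA_lengths.txt has one line per read, the two columns will not correspond to each other. Also, gsub(/T/, ...) counts every T in the read, including Ts inside the barcode and UMI. A rough sketch of taking the barcode and the poly(T) run from the same R1 read is below. It is only a sketch under assumptions that are not in the original post: a 16 nt barcode followed by a 10 nt UMI (as in the linked 3' v2 layout), R1 sequenced past cycle 26, and a whitelist with one bare barcode per line. The function name polyt_per_read is made up for illustration.

```shell
# Sketch: per-read table of barcode, UMI and leading poly(T) run length from R1.
# Assumptions (not from the original post): 16 nt barcode + 10 nt UMI, R1 longer
# than 26 cycles, non-empty whitelist with one barcode per line (no "-1" suffix).
polyt_per_read() {
    r1_fastq_gz=$1      # gzipped R1 FASTQ
    whitelist=$2        # one valid barcode per line
    bc_len=${3:-16}     # barcode length
    umi_len=${4:-10}    # UMI length (assumed 10; some chemistries use 12)

    gzip -cd "$r1_fastq_gz" | awk -v bc_len="$bc_len" -v umi_len="$umi_len" '
        # First input (the whitelist file): remember the valid barcodes.
        NR == FNR { valid[$1] = 1; next }
        # Second input (stdin): FASTQ records, sequence is line 2 of every 4.
        FNR % 4 == 2 {
            bc = substr($0, 1, bc_len)
            if (!(bc in valid)) next      # drop reads without a whitelisted barcode
            umi  = substr($0, bc_len + 1, umi_len)
            rest = substr($0, bc_len + umi_len + 1)
            n = 0
            # Length of the uninterrupted T run directly after the UMI.
            if (match(rest, /^T+/)) n = RLENGTH
            print bc "\t" umi "\t" n
        }
    ' "$whitelist" -
}
```

Something like `polyt_per_read sp_Hyun_S1_L001_R1_001.fastq.gz whitelist_test.txt > barcode_polyA.tsv` would then give one row per read with a valid barcode, which could be summarised per barcode before adding it to the Seurat metadata. Keep in mind that the T run in R1 can be no longer than the number of cycles sequenced past the UMI, so it may not reflect the full tail length.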
