Cell Barcode Identification and Counting
8 months ago

Hi All,

I completed a single cell DNA barcoding experiment and have the .fastq file. The reads in the .fastq file are the 40 bp cell barcodes. Is there a way to count the frequency of the barcodes in the .fastq file de novo? I would like to determine how many different 40 bp barcodes are present in the population, and then count them.

Maybe this needs to be done in two steps: first identify which barcode sequences are present, then use those sequences in a counting step?
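Because the barcodes are not known in advance, both steps can be done in a single pass: every distinct read sequence is a candidate barcode, and tallying them gives the list and the counts at once. A minimal sketch, assuming each read is exactly the 40 bp barcode (as described above) in a standard 4-line FASTQ, plain or gzipped. Dropping reads with N or the wrong length is an assumption of this sketch, and no error correction is done here, so sequencing and synthesis errors will still show up as extra "barcodes".

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record holds the sequence
                yield line.strip()


def count_barcodes(sequences: Iterable[str], length: int = 40) -> Counter:
    """Count exact barcode sequences, skipping reads of the wrong length or with N."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        if len(seq) == length and "N" not in seq:
            counts[seq] += 1
    return counts
```

Usage (the file name is hypothetical): `counts = count_barcodes(read_sequences("reads.fastq.gz"))`, then `len(counts)` is the number of distinct barcodes and `counts.most_common(10)` lists the ten most frequent.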

Is there any advice on how to do this?

Best,
Joe

Cell-Barcoding • 644 views

Try UMI-tools, in particular its whitelist command: https://umi-tools.readthedocs.io/en/latest/reference/whitelist.html

The difficulty is deciding which detected barcodes are real and which are just noise. Read through the docs; they explain this.
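One crude way to separate real barcodes from noise is to rank them by count and look for a sharp drop ("knee") in the rank/count curve. The sketch below is a deliberately simple heuristic, not the method UMI-tools uses (its whitelist docs describe its own, more careful knee detection): it cuts at the largest drop in log10(count) between consecutive ranks. The function name and the choice of log10 are assumptions for illustration.

```python
import math


def knee_threshold(counts: dict) -> int:
    """Return how many top-ranked barcodes to keep, using a crude heuristic:
    cut at the largest drop in log10(count) between consecutive ranks."""
    ranked = sorted(counts.values(), reverse=True)
    if len(ranked) < 2:
        return len(ranked)
    logs = [math.log10(c) for c in ranked]
    drops = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    # Index of the steepest drop; keep every rank up to and including it.
    cut = max(range(len(drops)), key=drops.__getitem__)
    return cut + 1
```

With counts such as 1000, 900, 800, 5, 3, 1 this keeps the top three. Real data rarely has such a clean break, so plot the curve and check the cut by eye.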

8 months ago

I just dealt with a similar problem with semi-random 30 bp barcodes. I found this recent review extremely helpful, as it was my first time dealing with random barcodes. Barcode synthesis is an imperfect process, especially for runs of homopolymers or dinucleotides, where indels are extremely common. Barcode correction is necessary if you want accurate counts.
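Because indels are common, correction has to use edit (Levenshtein) distance rather than Hamming distance, which only counts substitutions. starcode does this properly; the sketch below is a much simpler greedy stand-in, shown only to illustrate the idea. Each barcode, from most to least abundant, is merged into the first already-accepted barcode within `max_dist` edits that has at least `ratio` times its count. Both thresholds are arbitrary assumptions, and the all-pairs comparison is quadratic, so use starcode on real data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def greedy_cluster(counts: dict, max_dist: int = 2, ratio: float = 5.0) -> dict:
    """Merge each barcode into the most abundant accepted barcode within
    max_dist edits, provided that barcode has at least `ratio` times its count."""
    merged = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = None
        for centroid in merged:  # visited from most to least abundant
            if counts[centroid] >= ratio * n and edit_distance(seq, centroid) <= max_dist:
                parent = centroid
                break
        if parent is None:
            merged[seq] = n          # accept as a new barcode
        else:
            merged[parent] += n      # absorb its reads into the parent
    return merged
```

For example, a barcode seen 100 times absorbs a one-substitution variant seen 3 times and a one-deletion variant seen 2 times, while an unrelated barcode seen 50 times stays separate.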

My approach to looking at the engraftment efficiency of a cell line in a xenograft mouse model was basically:

  • Take a look at the reads and trim primer/adapter sequences via cutadapt, leaving only the supposed barcode.
  • Correct barcodes & count with starcode.
  • For each sample, rank barcodes by count and consider the barcodes accounting for the first 90% of cumulative reads as "reasonably expressed". In practice, this results in most barcodes with very few counts being ignored.
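The 90% cumulative-read rule from the last step can be sketched as follows, assuming `counts` maps each corrected barcode to its read count for one sample. Including the barcode whose reads push the running total past the cut-off is a choice made here, not something stated above.

```python
def top_fraction(counts: dict, fraction: float = 0.9) -> list:
    """Return barcodes, ranked by count, that together account for the
    first `fraction` of all reads (the barcode crossing the cut-off is kept)."""
    total = sum(counts.values())
    kept, running = [], 0
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= fraction * total:
            break
        kept.append(seq)
        running += n
    return kept
```

With counts of 50, 30, 15 and 5 reads, the first three barcodes are kept and the 5-read barcode is ignored.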

Adjust as necessary for your application and experimental setup.

