Question: How to extract reads into a new FASTQ file according to their cell barcodes?
2 days ago, ruiyan_hou0 wrote:

Hi everybody, I have a question and hope you can help. Thank you.

I have a 10x scRNA-seq dataset. It consists of two reads. Read 1 contains the cell barcode and UMI, as follows:

@SRR7646180.1 1 length=26
GTCGTAAAGATATACGGCACAACTCT
+SRR7646180.1 1 length=26
CDDDDIIIIIHIIHIIIIIIIIIIII
@SRR7646180.2 2 length=26
GATCGTAGTTGCCTCTCAAAGAACGT
+SRR7646180.2 2 length=26
DDDDDIIIIIIHIIIIIIIIIIIIII
........

Read 2 contains the sequence, which is 98 bp long, as follows:

@SRR7646180.1 1 length=98
CTAGGAAACTGGATATTCACATGTAGAAGACTGAAACTAGATGCTTATCTCTCACCACATTAAGAAAATCAAAATGGATT
+SRR7646180.1 1 length=98
CDDABIIHHHIIHIIHIIIIHIHIIIIIIIHHIIH?FHHIIIIIIIHIHHEHIIIIIIIIIIIIIIIIICHIIIIHIHII
@SRR7646180.2 2 length=98
AAGCAGTGGTATCAACGCAGAGTACATGGGGGTTCACTCCCACTTCATCCTGGCTGAAAGCAGTGCTGTGCTTTGAAATG
+SRR7646180.2 2 length=98
DDDDDIIGHIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIHIIIHIIIIHIIIIIIIIGH

I now have a list of cell barcodes, as follows:

0       AAACCTGAGCCACTAT
1       AAACCTGAGTCTCCTC
2       AAACCTGCACAACTGT
3       AAACCTGTCGAGCCCA
4       AAACCTGTCTCCGGTT
5       AAACGGGAGAAGATTC
6       AAACGGGAGTGACTCT
7       AAACGGGCAAGGGTCA
8       AAACGGGCATGTAAGA
9       AAACGGGGTCAAAGAT

These cell barcodes come from a subset of the read 1 sequences and represent some of the cells. How can I extract the reads that carry these barcodes from the FASTQ files?

Thank you in advance!


May I ask why you want a custom approach rather than simply running CellRanger or another specialized tool for 10x single-cell data, such as STARsolo, Salmon/Alevin or Kallisto/Bustools? What is your final goal? Barcodes can be noisy and contain sequencing errors, so naive exact-match approaches will likely be suboptimal here. The aforementioned software takes care of this.

modified 2 days ago • written 2 days ago by ATpoint41k
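To illustrate the point about noisy barcodes, here is a minimal sketch of one simple way to tolerate a single sequencing error: precompute every sequence that is one substitution away from a listed barcode and map it back. This is a simplification for illustration only and is not how CellRanger, STARsolo, Alevin or Bustools implement their correction. The function name `build_mismatch_index` is hypothetical.

```python
def build_mismatch_index(whitelist):
    """Map each whitelisted barcode, and each sequence exactly one
    substitution away from it, back to the whitelisted barcode.

    Sequences that are one substitution away from two or more
    different barcodes are ambiguous and are left out.
    """
    whitelist = set(whitelist)
    index = {bc: bc for bc in whitelist}
    ambiguous = set()
    for bc in whitelist:
        for i, original in enumerate(bc):
            for base in "ACGTN":
                if base == original:
                    continue
                variant = bc[:i] + base + bc[i + 1:]
                if variant in whitelist:
                    continue  # an exact whitelist entry always wins
                if variant in index and index[variant] != bc:
                    ambiguous.add(variant)
                else:
                    index[variant] = bc
    for variant in ambiguous:
        del index[variant]
    return index


if __name__ == "__main__":
    idx = build_mismatch_index(["AAACCTGAGCCACTAT", "AAACCTGAGTCTCCTC"])
    # A read barcode with an error in the first base is rescued.
    print(idx.get("TAACCTGAGCCACTAT"))
```

A lookup with `idx.get(observed_barcode)` returns the corrected barcode, or `None` when the observed sequence is more than one substitution away from every listed barcode or is ambiguous.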

Thank you. I have a scRNA-seq dataset in which three cell lines were artificially mixed, and I want the FASTQ reads for just one of them. I already have the cell barcodes belonging to that cell line. How can I produce FASTQ files that contain only the reads from this cell line? Thank you!

written 2 days ago by ruiyan_hou0

I see. Maybe the thread "Split fastq according to barcodes" could serve as inspiration?

written 2 days ago by ATpoint41k
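For the exact-match case described in the question, a minimal sketch could look like the following. It assumes the cell barcode is the first 16 bases of read 1 (the listed barcodes are 16 bp and read 1 is 26 bp), that the barcode file has the barcode in its last whitespace-separated column, and that both FASTQ files list the reads in the same order. All function names and file names are hypothetical, and no error correction is attempted.

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, based on the extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_barcodes(path):
    """Read barcodes from lines such as '0       AAACCTGAGCCACTAT'."""
    barcodes = set()
    with open_text(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                barcodes.add(fields[-1])  # assumption: barcode is the last column
    return barcodes


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def filter_pairs(r1_in, r2_in, r1_out, r2_out, barcodes, barcode_length=16):
    """Write the read pairs whose read 1 starts with a listed barcode.

    Returns the number of pairs kept. If one input file is shorter than
    the other, the extra records of the longer file are ignored.
    """
    kept = 0
    with open_text(r1_in) as in1, open_text(r2_in) as in2, \
            open_text(r1_out, "wt") as out1, open_text(r2_out, "wt") as out2:
        for rec1, rec2 in zip(fastq_records(in1), fastq_records(in2)):
            # Guard against files that are not in the same read order.
            if rec1[0].split()[0] != rec2[0].split()[0]:
                raise ValueError("Read names differ: %s vs %s"
                                 % (rec1[0].strip(), rec2[0].strip()))
            if rec1[1][:barcode_length] in barcodes:
                out1.writelines(rec1)
                out2.writelines(rec2)
                kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    wanted = read_barcodes("barcodes.txt")
    n = filter_pairs("SRR7646180_1.fastq", "SRR7646180_2.fastq",
                     "subset_1.fastq", "subset_2.fastq", wanted)
    print(n, "read pairs kept")
```

Because this keeps only perfect barcode matches, reads whose barcode contains a sequencing error are dropped, which is the limitation raised in the first reply.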