How to get a new FASTQ file containing only reads from selected cell barcodes?
3.5 years ago
ruiyan_hou • 0

Hi everybody, I have a question and hope to get your help. Thank you.

I have a set of scRNA-seq data (10x). It includes two reads. Read 1 contains the UMI and cell barcode, as follows:

@SRR7646180.1 1 length=26
GTCGTAAAGATATACGGCACAACTCT
+SRR7646180.1 1 length=26
CDDDDIIIIIHIIHIIIIIIIIIIII
@SRR7646180.2 2 length=26
GATCGTAGTTGCCTCTCAAAGAACGT
+SRR7646180.2 2 length=26
DDDDDIIIIIIHIIIIIIIIIIIIII
........

Read 2 contains the sequence, which is 98 bp long, as follows:

@SRR7646180.1 1 length=98
CTAGGAAACTGGATATTCACATGTAGAAGACTGAAACTAGATGCTTATCTCTCACCACATTAAGAAAATCAAAATGGATT
+SRR7646180.1 1 length=98
CDDABIIHHHIIHIIHIIIIHIHIIIIIIIHHIIH?FHHIIIIIIIHIHHEHIIIIIIIIIIIIIIIIICHIIIIHIHII
@SRR7646180.2 2 length=98
AAGCAGTGGTATCAACGCAGAGTACATGGGGGTTCACTCCCACTTCATCCTGGCTGAAAGCAGTGCTGTGCTTTGAAATG
+SRR7646180.2 2 length=98
DDDDDIIGHIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIHIIIHIIIIHIIIIIIIIGH

Now I have a cell barcode list, as follows:

0       AAACCTGAGCCACTAT
1       AAACCTGAGTCTCCTC
2       AAACCTGCACAACTGT
3       AAACCTGTCGAGCCCA
4       AAACCTGTCTCCGGTT
5       AAACGGGAGAAGATTC
6       AAACGGGAGTGACTCT
7       AAACGGGCAAGGGTCA
8       AAACGGGCATGTAAGA
9       AAACGGGGTCAAAGAT

These cell barcodes come from a subset of the read 1 sequences and represent some of the cells. How can I extract the reads carrying these barcodes from the FASTQ files?

Thank you in advance!

RNA-Seq • 1.9k views

May I ask why you want a custom approach rather than simply running CellRanger or another tool specialized for 10x single-cell data, such as STARsolo, Salmon/Alevin or Kallisto/Bustools? What is your final goal? Barcodes can be noisy and contain sequencing errors, so naive approaches will likely be suboptimal here; the aforementioned software takes care of this.


Thank you. I have an scRNA-seq dataset in which three cell lines were artificially mixed, and I only want the FASTQ reads from one of them. I now have the cell barcodes for that cell line. How can I get a FASTQ file that contains only the reads from this cell line? Thank you!

2.1 years ago
katze99 • 0

Hi, you can use the --whitelist argument in UMI-tools.

umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN --stdin R1.fastq.gz --stdout R1_extracted.fastq.gz --read2-in R2.fastq.gz --read2-out=R2_extracted.fastq.gz --whitelist=whitelist.txt

Remember that your whitelist.txt file must contain your barcodes of interest in tab-separated format, without any extra numbers or characters.
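For comparison, here is a minimal exact-match sketch in plain Python. It is not how UMI-tools works internally and it does no barcode error correction, so reads whose barcode has a sequencing error are dropped. It assumes the cell barcode is the first 16 bases of read 1 (as in the --bc-pattern above), that R1 and R2 list the reads in the same order, and that the barcode is the last whitespace-separated field on each line of the list (so the numbered list from the question can be used directly). The function names and file names are hypothetical.

```python
import gzip
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def load_barcodes(path):
    """Return the set of barcodes listed in a text file.

    Assumption: the barcode is the last whitespace-separated field, so both
    "AAACCTGAGCCACTAT" and "0       AAACCTGAGCCACTAT" are accepted.
    """
    with open_text(path) as handle:
        return {line.split()[-1] for line in handle if line.strip()}


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def filter_pairs(r1_in, r2_in, r1_out, r2_out, barcodes, bc_len=16):
    """Write the read pairs whose R1 starts with a listed barcode.

    Assumption: R1 and R2 contain the same reads in the same order.
    Returns (pairs kept, pairs read).
    """
    kept = total = 0
    with open_text(r1_in) as f1, open_text(r2_in) as f2, \
            open_text(r1_out, "wt") as o1, open_text(r2_out, "wt") as o2:
        for rec1, rec2 in zip(fastq_records(f1), fastq_records(f2)):
            total += 1
            # Exact match only: the first bc_len bases of R1 are the cell barcode.
            if rec1[1][:bc_len] in barcodes:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept, total
```

It could be called as `filter_pairs("R1.fastq.gz", "R2.fastq.gz", "R1_subset.fastq.gz", "R2_subset.fastq.gz", load_barcodes("barcodes.txt"))`. Unlike `umi_tools extract`, this leaves the barcode and UMI in read 1 rather than moving them into the read name.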
