Question: Unrecognized Sequence in NGS reads
12 weeks ago, jer364 wrote:

Hi All,

I downloaded some SRA files from a paper that used single cell sequencing with unique barcodes for each cell. I took a closer look at some of the sequences that didn't map to a reference genome (the sample was a community standard so species were known) and hoped someone could help me interpret what I was looking at.

Example Sequence 1 : (Sorry if this isn't intuitive, I wanted to label known segments; 5' at top, read left to right and top to bottom, length=151)

Assuming this is genomic DNA 5' - GATTATGTCGCACTGTACCCGGAAAAATTAGCGGATATTAAG-

Nextera Adapter (unsure of the T) -T-CTGTCTCTTATACACATCTCCGA-

custom index sequence -GCCCACGAGACGTGTCGGGGCTGGCTTA-

barcode -TTAAACGGACCTAGA-

flow-cell adapter - CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGC - 3'

My understanding is that if the genomic fragment is small enough, sequencing may continue into the adapter, so no problems here. However, I don't know how to explain the next example sequence.
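
The read-through reasoning above can be checked with a short sketch. The segments are copied from Example Sequence 1; the helper name `trim_read_through`, the 19 nt search key, and exact string matching are assumptions made for illustration (dedicated adapter trimmers tolerate mismatches, which this does not).

```python
# Segments copied from Example Sequence 1 (5' to 3').
GENOMIC = "GATTATGTCGCACTGTACCCGGAAAAATTAGCGGATATTAAG"
EXTRA_T = "T"  # the single T whose origin is uncertain
NEXTERA = "CTGTCTCTTATACACATCTCCGA"
CUSTOM_INDEX = "GCCCACGAGACGTGTCGGGGCTGGCTTA"
BARCODE = "TTAAACGGACCTAGA"
FLOWCELL = "CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGC"

read = GENOMIC + EXTRA_T + NEXTERA + CUSTOM_INDEX + BARCODE + FLOWCELL

# Assumption: the first 19 nt of the Nextera segment serve as the search key.
ADAPTER_KEY = NEXTERA[:19]


def trim_read_through(seq, adapter):
    """Return seq up to the first exact adapter match, or all of seq if none."""
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


insert = trim_read_through(read, ADAPTER_KEY)
print(len(read), len(insert))  # read length, then length of the part before the adapter
```

Everything before the adapter match (the genomic segment plus the uncertain T) is what would be passed to the aligner; the remainder is adapter, index, and barcode sequence.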

Example Sequence 2: (unknown sequence in parentheses at the end of the flow cell adapter)

partial Nextera Adapter (first nt should be a C) 5' - ATTATACACATCTCCGA-

custom index sequence - GCCCACGAGAGTGTCGGGCTGGCTTA-

barcode - TAGGGTCGCGGCCAG-

flow-cell adapter CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGCTGATCTCGTATGCCGTCTTCTGCTTG(ACCAAACATACTCTTTTCCTCTTCC) -3'

For the flow-cell adapter, the nts leading up to the portion in parentheses are the reverse complement of the Illumina P7 adapter sequence. If sequencing was carried out to the end of the fragment, wouldn't the read end with the last nts of the adapter, or are the nts in parentheses coming from a flow-cell oligo? Also, how would sequencing begin upstream of the genomic DNA? Or am I completely misunderstanding something? Thank you for any help!
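
A minimal sketch of the P7 check described above. The flow-cell segment and the unknown tail are copied from Example Sequence 2; the P7 sequence is the standard Illumina oligo written 5' to 3', and the function name `reverse_complement` is just an illustrative helper.

```python
P7 = "CAAGCAGAAGACGGCATACGAGAT"  # Illumina P7 flow-cell oligo, 5' to 3'

# Copied from Example Sequence 2, without the part in parentheses.
FLOWCELL_SEGMENT = (
    "CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGC"
    "TGATCTCGTATGCCGTCTTCTGCTTG"
)
UNKNOWN_TAIL = "ACCAAACATACTCTTTTCCTCTTCC"  # the part in parentheses


def reverse_complement(seq):
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


p7_rc = reverse_complement(P7)
print(p7_rc)
print(FLOWCELL_SEGMENT.endswith(p7_rc))  # does the segment end with reverse-complemented P7?
print(p7_rc in UNKNOWN_TAIL)             # is any full copy of it in the unknown tail?
```

This only confirms where the P7-derived sequence stops; it does not say where the 25 nt after it come from.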


If this is 10x data then Read1 consists of cell barcode and UMI only. That read does not contain any usable genomic sequence information. Read 2 contains the actual sequence. See this link for more.

written 12 weeks ago by genomax

Hi @genomax,

These are unpaired reads. The only files are the R1 and Index files. So Read 1 contains what should be the genomic info, and Read 2 has the 15 bp barcode sequence.
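
To make that layout concrete, here is a rough sketch of pairing each R1 read with its barcode from the index file. It assumes both files list the same reads in the same order with matching names; the function names and the toy records below are made up for illustration and are not taken from the actual dataset.

```python
import io
from collections import defaultdict


def fastq_records(handle):
    """Yield (read_name, sequence) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (ignored here)
        yield header[1:].split()[0], seq


def group_by_barcode(r1_handle, index_handle):
    """Group R1 sequences by the barcode read that shares their name."""
    groups = defaultdict(list)
    pairs = zip(fastq_records(r1_handle), fastq_records(index_handle))
    for (name, seq), (index_name, barcode) in pairs:
        if name != index_name:
            raise ValueError(f"read order mismatch: {name} vs {index_name}")
        groups[barcode].append(seq)
    return dict(groups)


# Toy in-memory records (hypothetical names and sequences).
r1 = io.StringIO(
    "@read1\nGATTATGTCG\n+\nIIIIIIIIII\n"
    "@read2\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
idx = io.StringIO(
    "@read1\nTTAAACGGACCTAGA\n+\nIIIIIIIIIIIIIII\n"
    "@read2\nTTAAACGGACCTAGA\n+\nIIIIIIIIIIIIIII\n"
)
groups = group_by_barcode(r1, idx)
print(groups)
```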

written 12 weeks ago by jer364

Can you post an example SRA accession for the dataset you are referring to?

written 12 weeks ago by genomax

Right now I'm working with a single sample, SRR5202186, to get a pipeline established.

written 12 weeks ago by jer364

Sorry, I forgot the index file wasn't included in the SRA upload. The files for that sample can be found on their GitHub https://github.com/AbateLab/SiC-seq linked in an issue posted by jessieren.

written 12 weeks ago by jer364