Inline barcodes in the reverse reads
0
0
6.4 years ago
Picasa ▴ 630

Hi,

I have a sample of PE reads that I want to demultiplex. For this I used fastq-multx.

So for instance, my barcode is XXXX

And my Forward raw reads : XXXXCCTTGGGCATGATGGTGACGCGCTTGGCGTGGATGGCGCACAGGTTGGTGTCCTCGAACAGGCCGACCAGGTAGGCCTCGCTGGCCTCCTGCAG

After fastq-multx, this read has been correctly assigned and trimmed:

CCTTGGGCATGATGGTGACGCGCTTGGCGTGGATGGCGCACAGGTTGGTGTCCTCGAACAGGCCGACCAGGTAGGCCTCGCTGGCCTCCTGCAG

However, the reverse reads vary. I see one of three cases:

• No barcode in the reverse read
• Barcode (reverse complemented) in the 5' part: XXXXATGGCTCGTACCAAGCAGACCGCCCGCAAGT
• Barcode (reverse complemented) within the R reads: ATGGCTCGTACCAAGCAGACCXXXXCGGAGGCAAGGCTCCCCGC
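
The three cases above can be told apart programmatically. The following is a minimal sketch, not the method used by fastq-multx: the barcode `ACGTTGC` is a hypothetical 7 bp placeholder (the real barcode is not given in this thread), and the function names are made up for illustration.

```python
# Hypothetical 7 bp barcode; replace with the real one.
BARCODE = "ACGTTGC"

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_reverse_read(read: str, barcode: str = BARCODE) -> str:
    """Classify a reverse read by where the reverse-complemented barcode occurs.

    Returns "five_prime" if the read starts with it, "internal" if it occurs
    elsewhere in the read, and "none" if it is absent. Only exact matches
    are considered (no mismatches allowed).
    """
    rc = reverse_complement(barcode)
    if read.startswith(rc):
        return "five_prime"
    if rc in read:
        return "internal"
    return "none"
```

Exact matching is a simplification; sequencing errors inside the barcode would be counted as "none" here.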

I'm not sure what to do with these. Should I keep only the read pairs whose reverse read does not contain the barcode?

barcodes • 2.5k views
0

How was the data generated? What could cause the barcode to end up anywhere (or nowhere) in the reverse read?

0

It's an amplicon sequencing with custom barcodes.

I don't know how the barcode can be found in the Reverse read.

One detail I forgot to mention: the barcode in the reverse read is the reverse complement of XXXX.

What is the "normal" situation? Should the barcode be found only in the forward read?

0

That depends on the library prep. How was the library created? When/how were the barcodes attached? Without proper understanding of the experimental procedure we can't get this right.

I assume this is about the same data as in "Confusion about barcodes and removal".

0

Yes, this is the same dataset.

The protocol is based on:

https://www.ncbi.nlm.nih.gov/pubmed/20516186

1

In that protocol I found the following (page 4, figure 1):

Ligation is nondirectional and also produces molecules which have the same adapters attached to both ends (not depicted). Such molecules do not interfere with sequencing and—due to the formation of hairpin structures—amplify very poorly during indexing PCR.

So that explains why you have some fragments with barcodes on both sides. Essentially, you should only have a barcode on one end. The question now is how frequently you see the barcode in the reverse read.

Based on your explanation, your barcode is only 4 characters long, which means it can also be present in the read by chance. You therefore need to look for it in its expected context: the Illumina P7 sequence.
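
To put a rough number on "present by chance", here is a back-of-the-envelope sketch. It assumes uniformly random, independent bases and treats overlapping positions as independent, which real amplicon reads do not satisfy, so the result is only an order-of-magnitude guide. The read length of 100 nt is also an assumption.

```python
def chance_hit_probability(barcode_len: int, read_len: int) -> float:
    """Approximate probability that one specific k-mer occurs at least once
    in a random read, assuming uniform and independent bases."""
    positions = read_len - barcode_len + 1  # possible start positions
    if positions <= 0:
        return 0.0
    per_position = 0.25 ** barcode_len  # chance of an exact match at one position
    return 1.0 - (1.0 - per_position) ** positions


# Compare a 4 bp and a 7 bp barcode in an (assumed) 100 nt read.
for k in (4, 7):
    print(k, round(chance_hit_probability(k, 100), 4))
```

Under these assumptions a 4 bp barcode turns up by chance in roughly three reads out of ten, whereas a 7 bp barcode does so in fewer than one read in a hundred.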

0

The XXXX was just an example to simplify. In fact, the barcode is 7 bp long.

If I grep for the reverse complement of the barcode in the reverse reads, I find it in 75695/118664 reads, which corresponds to 64%.

Maybe I should keep the read pairs with

• No barcode in the reverse read
• Barcode (reverse complemented) in the 5' part

• Barcode (reverse complemented) within the R reads:

?

0

Are the barcodes at the beginning of the read in your grep (if that is where they are supposed to be)? As @Wouter already said you should find the barcode only one time but it can be at either end.

0

So there are 39066/118664 (33%) reverse reads that have the reverse-complemented barcode at their 5' end.

And 36629/118664 (31%) reverse reads that have the reverse-complemented barcode somewhere else in the read.

So, if I understand correctly, I should discard all read pairs whose reverse read contains the reverse-complemented barcode (at the beginning or in the middle)?
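
Counts like these can be reproduced without grep. The sketch below is one possible way to tally them, assuming a plain four-line-per-record FASTQ and exact barcode matches; the function names, the file name `reverse.fastq` and the sequence `GCAACGT` in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def tally_barcode_positions(lines: Iterable[str], rc_barcode: str) -> Counter:
    """Count reads with the reverse-complemented barcode at the 5' end,
    elsewhere in the read, or not at all (exact matches only)."""
    counts: Counter = Counter()
    for seq in fastq_sequences(lines):
        if seq.startswith(rc_barcode):
            counts["five_prime"] += 1
        elif rc_barcode in seq:
            counts["internal"] += 1
        else:
            counts["none"] += 1
    return counts


def as_percentages(counts: Counter) -> Dict[str, float]:
    """Convert raw counts to percentages of all reads."""
    total = sum(counts.values())
    return {key: 100.0 * value / total for key, value in counts.items()}
```

A call could look like `with open("reverse.fastq") as handle: counts = tally_barcode_positions(handle, "GCAACGT")`, followed by `as_percentages(counts)`.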

0

The adapters are ligated using blunt-end ligation, so it's not impossible that fragments end up with two barcodes. However, if I'm not mistaken, these shouldn't get sequenced, since they contain the same adapter on both sides and therefore won't be amplified by bridge amplification. The barcode should always be at the P7 side of the amplicon, so I would suggest that the OP look for that sequence.

0

I just noticed the protocol you shared doesn't use inline barcodes.

0

It was based on that paper but has been modified lightly.

1

Then you might want to @#$%'ing consider telling us what you modified instead of having us guess at what you have been doing. Really, provide this information upfront, because this is a waste of time. For the past hour this thread has only been about the experimental procedure, and we haven't even started on the barcode processing. You made us look through protocols, and now we find out that you modified the protocol, on a vital point apparently. In this topic and the previous one it has been quite a pain in the elbow to get a good understanding of what your question is really about.

1

If it is using inline barcodes then that is not a light modification.

I think you have enough information already to find the right solution.

0

For future reference, please do not post links to sites behind a paywall - not everyone has access. It's better to copy/paste the relevant information in your post.

0

For those who don't have access, here is a dropbox link: https://www.dropbox.com/s/v3xrola70fhzwyk/meyer2010.pdf?dl=0 I ehm perfectly ahum legally obtained that file cough and share this totally anonymously.

0

And you expect us to click on a dropbox link for a file that is anonymously shared :)

0

I do not necessarily "expect" that, I provide the opportunity. It's up to you to gamble on whether it will be safe or not ;) And there is always http://sci-hub.cc/ for those who want to obtain the paper the same way.

1

Thanks @WouterDeCoster, but I know how to access the reference. I was trying to encourage better behavior by the OP.