Identify the number of unique fragments from MiSeq data

7.8 years ago

arash.askary ▴ 10

I have a series of single-end reads in FASTQ format from a RAD library of 48 uniquely tagged individuals. The data come from a small MiSeq run. I want to know the number of unique fragments per individual/barcode, but I'm not sure how to get that number. I'm new to bioinformatics, but I was able to demultiplex the library with the process_radtags program in Stacks.

Could someone help? Thanks!

MiSeq stacks RAD • 2.0k views

ADD COMMENT • link updated 7.8 years ago by natasha.sernova ★ 4.0k • written 7.8 years ago by arash.askary ▴ 10

I thought Stacks was a complete toolbox for RADseq analysis. So basically you want to deduplicate your demultiplexed datasets to get all the unique sequences for each individual?

ADD REPLY • link 7.8 years ago by GenoMax 141k

I'm sure there's a way to use Stacks for my problem. I just want to know the number of unique fragments associated with each barcode. I'm not sure what you mean by deduplicating...

ADD REPLY • link 7.8 years ago by arash.askary ▴ 10

The following may give you what you are looking for.

grep -A 1 "^@MACHINE_ID" your_file.fastq | grep -v "^@" | grep -v "\-\-" | sort | uniq -c

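Replace MACHINE_ID with the first few characters of the instrument ID (e.g. K00045) that begins the header lines in your sequence files. The output lists each distinct read sequence preceded by the number of times it occurs.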
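The same tally can be sketched in Python for anyone who would rather not depend on the instrument ID in the header. This is only a minimal sketch: it assumes plain-text (not gzipped) FASTQ files with exactly four lines per record, and the function names and the `*.fq` file pattern are hypothetical, so adjust them to match whatever your demultiplexing step actually wrote.

```python
from collections import Counter
from pathlib import Path


def count_sequences(lines):
    """Tally each distinct read sequence in 4-line FASTQ records."""
    counts = Counter()
    for i, line in enumerate(lines):
        # In a 4-line FASTQ record the sequence is the 2nd line (index 1).
        if i % 4 == 1:
            counts[line.strip()] += 1
    return counts


def unique_per_sample(directory, pattern="*.fq"):
    """Return {file name: number of distinct sequences} for each FASTQ file.

    `pattern` is an assumption; change it to match your demultiplexed files.
    """
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        with open(path) as handle:
            result[path.name] = len(count_sequences(handle))
    return result
```

Calling `unique_per_sample()` on the directory holding the demultiplexed files returns one count per file, i.e. per barcode. Picking the sequence line by position rather than by header pattern avoids accidentally matching quality lines that happen to start with `@`.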

ADD REPLY • link 7.8 years ago by GenoMax 141k

Thanks! That seems to be exactly what I wanted. Just out of curiosity, is there an easy way, in the same line of code, to discern fragments that are <95% identical to one another? I've read the grep and uniq manuals and can't find a solution there.

ADD REPLY • link 7.8 years ago by arash.askary ▴ 10

For that you would need to use an aligner (e.g. BLAT or an NGS aligner), run an all-vs-all search, and specify your identity constraints.
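To illustrate what an all-vs-all comparison involves, here is a rough Python sketch. It is not an aligner: it swaps in a naive position-by-position (Hamming) identity, so it only compares reads of equal length and ignores insertions and deletions. It also compares every sequence against every cluster representative, which is fine for a small MiSeq run but slow for large datasets. The function names and the greedy grouping strategy are assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(sequences, threshold=0.95):
    """Group sequences by identity to the first member of each cluster.

    A sequence joins the first cluster whose representative has the same
    length and an identity >= threshold; otherwise it starts a new cluster.
    """
    representatives = []
    clusters = {}
    for seq in sequences:
        for rep in representatives:
            if len(rep) == len(seq) and identity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was close enough, so start a new cluster.
            representatives.append(seq)
            clusters[seq] = [seq]
    return clusters
```

The number of keys in the returned dictionary is then the number of fragment groups in which no two representatives reach the threshold. The result depends on input order, since each cluster is defined by whichever sequence arrived first.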

ADD REPLY • link 7.8 years ago by GenoMax 141k
