1
0
22 months ago
Bedirhan • 0

Hi,

I am currently reading this paper (https://www.ncbi.nlm.nih.gov/pubmed/30214446) and am using the same protocol to build a bioinformatics pipeline to look at T cell clonality. I am unsure how they were able to downsample the UMI reads.

I have used umi-tools to extract the UMI information, but I am unsure how to approach this step. My understanding is that they performed the downsampling on the FASTQ files, not on the mapped reads.

Any help or suggestions are appreciated.

Thank you

RNA-Seq UMI next-gen • 1.0k views
0

So what is the actual question? Do you need a tool for downsampling fastq?

0

Yes, I need to downsample fastq files based on UMI. I couldn't find any tools out there to do it.

1

I do not think they used a dedicated tool. They probably counted how many reads there were on average per UMI in the full dataset and then downsampled the total reads to roughly match the expected number. See: Downsampling dataset with more than 60 million reads
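
To make the "average reads per UMI" idea concrete, here is a minimal sketch in Python. It is not the paper's method. It assumes the reads went through `umi_tools extract` with its default naming, so each FASTQ header looks like `@<read_id>_<UMI>`, and it treats identical UMI sequences as one molecule (no error correction, no grouping by mapping position). The function names and the toy reads are made up for illustration.

```python
from collections import Counter
from statistics import mean


def umi_from_header(header):
    """Return the UMI from a FASTQ header.

    Assumption: default umi_tools extract naming, '@<read_id>_<UMI> [description]'.
    """
    read_name = header.split()[0]
    return read_name.rsplit("_", 1)[1]


def reads_per_umi(fastq_lines):
    """Count reads per UMI from an iterable of FASTQ lines (4 lines per read)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            counts[umi_from_header(line.strip())] += 1
    return counts


def sampling_fraction(counts, target_mean):
    """Fraction of reads to keep so the mean reads per UMI roughly hits target_mean."""
    observed_mean = mean(counts.values())
    return min(1.0, target_mean / observed_mean)


# Toy input (hypothetical): 4 reads with UMI AAAA, 2 reads with UMI CCCC.
toy_fastq = []
for n, umi in enumerate(["AAAA"] * 4 + ["CCCC"] * 2):
    toy_fastq += [f"@read{n}_{umi}", "ACGT", "+", "IIII"]

counts = reads_per_umi(toy_fastq)
fraction = sampling_fraction(counts, target_mean=1.5)
print(dict(counts), fraction)
```

The resulting fraction is only a first approximation: UMIs that lose all their reads drop out of the average, so the mean per surviving UMI after sampling ends up somewhat higher than the target. The fraction could then be passed to a read-level sampler such as seqtk or reformat.sh.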

0

Thank you for the explanation, I will try out seqtk mentioned in the link.

2
22 months ago
GenoMax 110k

reformat.sh from BBMap suite also has downsampling options.

Sampling parameters:

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
Important: srt/sbt flags should not be used with stdin, samplerate, qtrim, minlength, or minavgquality.
upsample=f              Allow srt/sbt to upsample (duplicate reads) when the target is greater than input.
prioritizelength=f      If true, calculate a length threshold to reach the target, and retain all reads of at least that length (must set srt or sbt).


I doubt there is a tool that can downsample taking into account the UMIs.
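
A small custom script is one way around that. The sketch below is only an illustration, not a reimplementation of reformat.sh: it keeps each read with a fixed probability using a seeded generator (the same idea as samplerate plus sampleseed) and then reports which UMIs survive. It assumes headers of the form `@<read_id>_<UMI>`, as written by `umi_tools extract` with default settings. All function names and the toy reads are hypothetical.

```python
import random


def fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records (tuples)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def downsample(records, rate, seed):
    """Keep each record with probability `rate`; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < rate:
            yield record


def distinct_umis(records):
    """Set of UMIs in the records (assumes '@<read_id>_<UMI>' headers)."""
    return {rec[0].split()[0].rsplit("_", 1)[1] for rec in records}


# Toy input (hypothetical): 4 reads with UMI AAAA, 2 reads with UMI CCCC.
toy_lines = []
for n, umi in enumerate(["AAAA"] * 4 + ["CCCC"] * 2):
    toy_lines += [f"@read{n}_{umi}\n", "ACGT\n", "+\n", "IIII\n"]

all_records = list(fastq_records(toy_lines))
kept = list(downsample(all_records, rate=0.5, seed=42))
print(len(all_records), len(kept), distinct_umis(kept))
```

Because one random number is drawn per record, running the same rate and seed over R1 and R2 keeps the pairs in sync, provided both files list the reads in the same order.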

1

Agreed, it would have to be some custom script in conjunction with UMI-tools and/or BBMap. The giveaway word in the methods is 'about':

So, there's nothing exact about what they are doing.