Question

How to select the reads with a certain adapter in a fastq file in R

3.1 years ago

Youyy • 0

I am currently working with paired-end SMART-seq data (a low-input RNA-seq method). I am interested in selecting the reads that carry the SMART-seq-specific adapter (the 5' end adapter).

I am trying to do this in R. I can use the Bioconductor package ShortRead to read the fastq file, then extract the read sequences and convert them to a character vector. I can then detect the reads with the adapter and pick them out with stringr. However, I can't convert the vector back into a fastq file, so I can't do any downstream analysis.

Does anyone know how to select reads with a certain adapter in a fastq file? Is there an R tool that can do this? Thank you.
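The sticking point described above (a bare sequence vector no longer carries the headers and qualities) goes away if whole records are filtered instead of extracted sequences. In ShortRead the analogous idea would be to subset the read object itself rather than the sequences pulled out of it. Below is a minimal sketch of that idea in plain Python rather than R, assuming four-line FASTQ records (no wrapped sequences) and an exact match at the 5' end. The function names are illustrative and `ADAPTER` is a made-up placeholder, not the real SMART-seq oligo.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:  # end of input (or a truncated final record)
            return
        yield tuple(record)


def has_5prime_adapter(sequence, adapter):
    """True if the read starts with the adapter (exact match, no mismatches allowed)."""
    return sequence.startswith(adapter)


def select_reads(lines, adapter):
    """Keep whole FASTQ records whose sequence starts with the adapter."""
    return [rec for rec in read_fastq(lines) if has_5prime_adapter(rec[1], adapter)]


# Toy input. ADAPTER is a hypothetical placeholder sequence for illustration only.
ADAPTER = "ACGTAC"
toy = [
    "@read1", "ACGTACGGGTTT", "+", "IIIIIIIIIIII",
    "@read2", "TTTTGGGGCCCC", "+", "IIIIIIIIIIII",
]

selected = select_reads(toy, ADAPTER)
for rec in selected:
    # Each kept record is still a complete FASTQ entry and can be written straight back out.
    print("\n".join(rec))
```

On real data the `lines` argument would be an open file handle (or `gzip.open(path, "rt")` for compressed files), and the kept records would be written to a new fastq file line by line.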

SmartSeq R Bioconductor Shortre RNAseq • 1.3k views

score 0 · Answer 1 · 2021-04-03

3.1 years ago

ATpoint 82k

You have already posted several questions about this, and I can tell you that R is definitely the wrong choice here. Why don't you simply do what everyone does: run the files through FastQC, which will tell you whether an adapter is overrepresented and which one. If there is contamination, trim it with Cutadapt, which you already seem to be using based on your previous question. There is always only one type of adapter; there cannot be both SMART-seq and Illumina Universal Adapter contamination, because that is not how these libraries are prepared. Trim if necessary, then proceed with downstream analysis. I cannot imagine a use case (please prove me wrong if necessary) where you would need to load fastq reads into R. Just trim with standard tools and then align; don't overthink this.

Sorry. My PI is interested in how many reads carry the SMART-seq-specific adapter, and he only wants the reads with the adapter for downstream analysis. That's why I posted this question.

Reply • 3.1 years ago by Youyy

You probably want bbduk (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/), which has an option to output the reads that contain certain adapters (k-mers). Still, I would think carefully about whether this analysis makes any sense: which reads contain adapters and which don't is stochastic and depends on the fragment length of the cDNA; it is not a directed process.
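To make concrete what "output the reads that contain the adapter" means for paired-end data, here is a rough Python sketch. It is not bbduk and does not reproduce its k-mer matching; it simply compares the first bases of each mate against the adapter, allows a small number of mismatches, and keeps both mates together so the pairs stay in sync. Checking both mates, the one-mismatch default, the function names, and the `ADAPTER` placeholder are all assumptions made for illustration.

```python
from itertools import islice


def fastq_records(lines):
    """Yield 4-line FASTQ records as lists [header, sequence, plus, quality]."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if len(rec) < 4:
            return
        yield rec


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def starts_with_adapter(seq, adapter, max_mismatches=1):
    """True if the 5' end of seq matches the adapter within max_mismatches."""
    if len(seq) < len(adapter):
        return False
    return mismatches(seq[:len(adapter)], adapter) <= max_mismatches


def select_pairs(r1_lines, r2_lines, adapter, max_mismatches=1):
    """Return (kept_pairs, total_pairs); a pair is kept if either mate starts with the adapter."""
    kept, total = [], 0
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        total += 1
        if (starts_with_adapter(rec1[1], adapter, max_mismatches)
                or starts_with_adapter(rec2[1], adapter, max_mismatches)):
            kept.append((rec1, rec2))
    return kept, total


# Toy paired input. ADAPTER is a hypothetical placeholder, not the real SMART-seq sequence.
ADAPTER = "ACGTAC"
r1 = [
    "@pair1/1", "ACGTACGGG", "+", "IIIIIIIII",  # exact adapter at the 5' end
    "@pair2/1", "ACGAACTTT", "+", "IIIIIIIII",  # one mismatch to the adapter
    "@pair3/1", "GGGGGGGGG", "+", "IIIIIIIII",  # no adapter
]
r2 = [
    "@pair1/2", "TTTTTTTTT", "+", "IIIIIIIII",
    "@pair2/2", "TTTTTTTTT", "+", "IIIIIIIII",
    "@pair3/2", "TTTTTTTTT", "+", "IIIIIIIII",
]

kept, total = select_pairs(r1, r2, ADAPTER)
print(f"{len(kept)} of {total} pairs carry the adapter")
```

The count printed at the end is the number the PI asked for, and the kept pairs are the reads for downstream analysis. The caveat above still applies: the kept fraction reflects fragment length as much as biology.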

Reply • 3.1 years ago by ATpoint 82k

Thank you, I will read the link you sent me. I don't understand this well and don't have much experience, so I have to follow my PI's instructions.

Reply • 3.1 years ago by Youyy