Question: Visualize nucleotides for every position in R
2
gravatar for marija
9 months ago by
marija30
Croatia
marija30 wrote:

Hi, please, how can I visualize nucelotides for every position. I have a percentage of bases but now I don't know how to create a plot.

fastq <- readDNAStringSet("https://d28rh4a8wq0iu5.cloudfront.net/ads1/data/ERR037900_1.first1000.fastq","fastq")
fastq
freq <- alphabetFrequency(fastq, as.prob = T,baseOnly=T)

I would like something like this (sorry for quality):

R • 492 views
ADD COMMENTlink modified 9 months ago by cpad01129.1k • written 9 months ago by marija30
1

or use fastqc https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ?

ADD REPLYlink written 9 months ago by Pierre Lindenbaum112k
4
gravatar for cpad0112
9 months ago by
cpad01129.1k
India
cpad01129.1k wrote:

You need to change your code. It is not alphabet frequency. It is consensus matrix. Example code and image below:

library(Biostrings)
fastq <- readDNAStringSet("sample.fastq","fastq")
## At each position, base frequency
afmc=consensusMatrix(fastq, baseOnly=T,as.prob = T)
tafmc=t(afmc)
matplot(tafmc[,-5], type="l", lwd=2, xlab="Read Length", ylab= "Base frequency at each position")
legend(legend = colnames(tafmc)[-5],"topright",col=1:4, lty=1:4, lwd=2)

test

5th column is others (other than A,T, G and C).

ADD COMMENTlink modified 9 months ago • written 9 months ago by cpad01129.1k

Awesome, thank you very much.

ADD REPLYlink written 9 months ago by marija30

much simpler is using SeqTools in R:

library(seqTools)
# Reads fastq file
fq=fastqq("sample.fastq")
# Plots nucleotide frequency
plotNucFreq(fq,1)

Rplot

ADD REPLYlink modified 9 months ago • written 9 months ago by cpad01129.1k
2
gravatar for Sean Davis
9 months ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:
library(Biostrings)
fastq <- readDNAStringSet("https://d28rh4a8wq0iu5.cloudfront.net/ads1/data/ERR037900_1.first1000.fastq","fastq")
freq <- alphabetFrequency(fastq, as.prob = T,baseOnly=T)
matplot(freq,type='l')
legend(legend = colnames(freq),x=700,y=0.7,lty=1:5,col=1:5)

Note answer about consensusMatrix. That code is the best answer.

ADD COMMENTlink modified 9 months ago • written 9 months ago by Sean Davis25k
0
gravatar for chen
9 months ago by
chen1.7k
OpenGene
chen1.7k wrote:

This is per-cycle base content. R is not good at, and is not designed for such task.

Why complicate life? Just use fastp to do the easiest and fastest way for such tasks, including QC and filtering. It will output the figures like what you want.

ADD COMMENTlink written 9 months ago by chen1.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1430 users visited in the last hour