Question

Visualize nucleotides for every position in R

4

Entering edit mode

6.4 years ago

marija ▴ 70

Hi, please, how can I visualize nucelotides for every position. I have a percentage of bases but now I don't know how to create a plot.

fastq <- readDNAStringSet("https://d28rh4a8wq0iu5.cloudfront.net/ads1/data/ERR037900_1.first1000.fastq","fastq")
fastq
freq <- alphabetFrequency(fastq, as.prob = T,baseOnly=T)

I would like something like this (sorry for quality):

R • 4.3k views

ADD COMMENT • link updated 2.9 years ago by xiaoleiusc ▴ 140 • written 6.4 years ago by marija ▴ 70

1

Entering edit mode

or use fastqc https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ?

ADD REPLY • link 6.4 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

6.4 years ago

chen ★ 2.5k

This is per-cycle base content. R is not good at, and is not designed for such task.

Why complicate life? Just use fastp to do the easiest and fastest way for such tasks, including QC and filtering. It will output the figures like what you want.

ADD COMMENT • link 6.4 years ago by chen ★ 2.5k

score 7 · Accepted Answer · 2017-12-01

7

Entering edit mode

6.4 years ago

cpad0112 21k

You need to change your code. It is not alphabet frequency. It is consensus matrix. Example code and image below:

library(Biostrings)
fastq <- readDNAStringSet("sample.fastq","fastq")
## At each position, base frequency
afmc=consensusMatrix(fastq, baseOnly=T,as.prob = T)
tafmc=t(afmc)
matplot(tafmc[,-5], type="l", lwd=2, xlab="Read Length", ylab= "Base frequency at each position")
legend(legend = colnames(tafmc)[-5],"topright",col=1:4, lty=1:4, lwd=2)

5th column is others (other than A,T, G and C).

ADD COMMENT • link 6.4 years ago by cpad0112 21k

0

Entering edit mode

Awesome, thank you very much.

ADD REPLY • link 6.4 years ago by marija ▴ 70

2

Entering edit mode

much simpler is using SeqTools in R:

library(seqTools)
# Reads fastq file
fq=fastqq("sample.fastq")
# Plots nucleotide frequency
plotNucFreq(fq,1)

ADD REPLY • link 6.4 years ago by cpad0112 21k

0

Entering edit mode

Thanks, does this apply to fasta file containing sequence of one gene? How to set sliding window size in Seqtools?

ADD REPLY • link 2.9 years ago by xiaoleiusc ▴ 140

score 4 · Accepted Answer · 2017-11-30

library(Biostrings)
fastq <- readDNAStringSet("https://d28rh4a8wq0iu5.cloudfront.net/ads1/data/ERR037900_1.first1000.fastq","fastq")
freq <- alphabetFrequency(fastq, as.prob = T,baseOnly=T)
matplot(freq,type='l')
legend(legend = colnames(freq),x=700,y=0.7,lty=1:5,col=1:5)

Note answer about consensusMatrix. That code is the best answer.