Comparing overrepresented sequences from fastqc.txt file
23 months ago
margo ▴ 40

I am looking to compare the overrepresented sequences across a series of FastQC reports. All 16 of my reports failed the overrepresented sequences module, and I am looking for a way to extract the lines between >>Overrepresented sequences and >>END_MODULE from each .txt file, then visualise and compare them to see whether the data are contaminated. Is there any way to do this using Python or R?

I would like to find the top overrepresented sequences across all files, see whether any are shared between files, and if so BLAST them.

python R fastqc • 660 views

FastQC usually also produces an .html report whose "Overrepresented sequences" section lists the top overrepresented sequences with their frequency of occurrence in your file. You can simply copy them and paste them into BLAST.


Unless you post an example file, it is not clear what you want to extract. Please post example input and expected output. Alternatively, you could use MultiQC to collate the FastQC reports and extract the information from the MultiQC output.


I have sequences from two different datasets. I am looking to compare the overrepresented sequences in both to see whether there has been any cross-contamination.

23 months ago
Trivas ★ 1.7k

I've used the fastqcr package in R to do this. You can use this package to pull out specific information from the fastqc report:

library(fastqcr)
qc <- qc_read(file_path, modules = "Overrepresented sequences", verbose = FALSE)

A fair warning: this is a very noisy package, and I had to look up a way to silence the read_delim messages that appear. If you're interested, I used trace(qc_read, edit=TRUE) and then added show_col_types = FALSE to the read_tsv call.
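If you would rather do this in Python, here is a minimal sketch of the extraction the question describes. It assumes the usual fastqc_data.txt layout: tab-separated lines, with the module starting at a line beginning with >>Overrepresented sequences and ending at the next >>END_MODULE, and columns in the order sequence, count, percentage, possible source. The function names parse_overrepresented and shared_sequences are made up for this example and are not part of FastQC or fastqcr.

```python
from collections import defaultdict


def parse_overrepresented(text):
    """Return (sequence, count, percentage, source) tuples from the text of
    one fastqc_data.txt file. Assumes tab-separated module rows."""
    rows = []
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            if in_module:
                break  # reached the end of the module we care about
        elif in_module and line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) < 4:
                continue  # skip anything that is not a full data row
            seq, count, pct, source = fields[:4]
            rows.append((seq, int(count), float(pct), source))
    return rows


def shared_sequences(reports):
    """Given {report_name: file_text}, return (sequence, [report names])
    pairs, with the most widely shared sequences first."""
    seen = defaultdict(set)
    for name, text in reports.items():
        for seq, _count, _pct, _source in parse_overrepresented(text):
            seen[seq].add(name)
    return sorted(
        ((seq, sorted(names)) for seq, names in seen.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
```

To use it, read each report's fastqc_data.txt into a string (for example with pathlib.Path(...).read_text()), build a dict keyed by sample name, and pass it to shared_sequences. Sequences that show up in reports from both datasets are the ones worth BLASTing first. Note that FastQC often reports adapter or rRNA sequences in every sample, so a shared sequence is a lead to check, not proof of cross-contamination.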
