Deconseq to remove human sequences
8.4 years ago

Hello,

I am using DeconSeq to remove human sequences from FASTQ files generated on an Illumina MiSeq. I created the database of human sequences with:

bwa64 index -p hs_ref_GRCh38_p2 -a bwtsw hs_ref_GRCh38_p2_split_PS.fa.fasta > bwa.log 2>&1

Now I don't know how to remove these sequences from my actual paired files; let's call them myfile_1.fq and myfile_2.fq.

Could you give me some hints?

Thank you.

deconseq cleaning • 3.3k views
8.4 years ago
satshil.r ▴ 50
perl deconseq.pl -f myfile_1.fq -dbs hs_ref_GRCh38_p2 -i 90 -c 90 -out_dir <directory>

The -i 90 option sets the identity threshold:

Alignment identity threshold in percentage. The identity is calculated for the part of the query sequence that is aligned to a reference sequence. For example, a query sequence of 100 bp that aligns to a reference sequence over the first 50 bp with 40 matching positions has an identity value of 80%.

The -c 90 refers to the coverage threshold:

Alignment coverage threshold in percent. The coverage is calculated for the part of the query sequence that is aligned to a reference sequence. For example, a query sequence of 100 bp that aligns to a reference sequence over the first 50 bp with 40 matching positions has a coverage value of 50%.
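
To make the two definitions concrete, here is the arithmetic for the example alignment quoted above, as a small shell sketch. The variable names are made up for illustration and are not DeconSeq options.

```shell
# Worked example from the text: 100 bp query, first 50 bp aligned, 40 matches.
query_len=100
aligned_len=50
matches=40

# Identity: matching positions as a percentage of the aligned part of the query.
identity=$(( 100 * matches / aligned_len ))

# Coverage: aligned part as a percentage of the full query length.
coverage=$(( 100 * aligned_len / query_len ))

echo "identity=${identity}% coverage=${coverage}%"
```

This prints identity=80% coverage=50%, so this particular alignment falls below both thresholds when -i 90 -c 90 is used.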

You have to make sure you define your deconseq databases in the configuration file.

hs_ref_GRCh38_p2 => {name => 'hs_ref_GRCh38_p2',
                        db => 'hs_ref_GRCh38_p2'},

and make sure you define the database location:

use constant DB_DIR => "<DIR_WITH_BWA_DB_OUTPUT>";
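
Before running DeconSeq, it may help to confirm that the directory you put in DB_DIR really contains the BWA index files for the db prefix. A minimal sketch, assuming the index lives in ./refChr as in this thread; the function name is hypothetical and the extension list is simply the set of index files listed in the follow-up comment (other bwa versions may write a different set). As far as I know, the config file in the standalone distribution is DeconSeqConfig.pm, but check your own installation.

```shell
# Report any BWA index files missing for a given prefix in a directory.
# check_bwa_index is a hypothetical helper, not part of DeconSeq or bwa.
check_bwa_index() {
    dir=$1
    prefix=$2
    missing=0
    for ext in amb ann bwt pac sa rbwt rpac rsa; do
        if [ ! -f "$dir/$prefix.$ext" ]; then
            echo "missing: $dir/$prefix.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed location and prefix; adjust to your own DB_DIR and db values.
check_bwa_index ./refChr hs_ref_GRCh38_p2 || echo "index incomplete"
```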

Of course, you have to adjust the settings, specifically the -c and -i thresholds, to what you see fit.
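
For the two paired files from the question, one possible approach is to run the same command once per file. The sketch below only prints the commands so you can inspect them first; the file names and output directory are the ones used in this thread, and deconseq_cmd is a hypothetical helper. Note that -dbs takes the database name defined in the config file, not a path to the index files.

```shell
# Assumed names for illustration; adjust to your setup.
DECONSEQ=deconseq.pl
DB_NAME=hs_ref_GRCh38_p2   # key defined in the DeconSeq config file, not a path
OUT_DIR=DECONSEQ

# Print the DeconSeq command for one FASTQ file (dry run, nothing is executed).
deconseq_cmd() {
    printf 'perl %s -f %s -dbs %s -i 90 -c 90 -out_dir %s\n' \
        "$DECONSEQ" "$1" "$DB_NAME" "$OUT_DIR"
}

for fq in myfile_1.fq myfile_2.fq; do
    deconseq_cmd "$fq"
done
```

Filtering each file independently can leave reads whose mate was removed from the other file, so the two cleaned files would still need to be re-synchronised before any paired-end analysis.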


Thank you very much, but it is still a bit beyond me. First of all, if I have two paired files, why is there only one in the command? Secondly, which configuration file should I modify? Thirdly, should the database location go in the same config file? Should these modifications be made verbatim? Cheers


I created the database with the human sequences using:

bwa64 index -p hs_ref_GRCh38_p2 -a bwtsw hs_ref_GRCh38_p2_split_PS.fa.fasta > bwa.log 2>&1

This created a series of files that I placed in a subfolder named refChr. The files are:

hs_ref_GRCh38_p2.amb hs_ref_GRCh38_p2.pac hs_ref_GRCh38_p2.sa
hs_ref_GRCh38_p2.ann hs_ref_GRCh38_p2.rbwt hs_ref_GRCh38_p2_split.fa
hs_ref_GRCh38_p2.bwt hs_ref_GRCh38_p2.rpac hs_ref_GRCh38_p2_split.fa.log
hs_ref_GRCh38_p2.fa hs_ref_GRCh38_p2.rsa hs_ref_GRCh38_p2_split_PS.fa.fasta

I then ran the following command to use Deconseq:

~$ perl /usr/bin/deconseq.pl -f fu_1.fq -dbs ./refChr/hs_ref_GRCh38_p2 -i 90 -c 90 -out_dir DECONSEQ
But I got the following error:
ERROR: database "./refChr/hs_ref_GRCh38_p2" does not exist in config file.

Try 'deconseq -h' for more information.
Exit program.

I tried with '/refChr/...' and 'refChr/...', and also with '...hs_ref_GRCh38_p2.fa' and '...hs_ref_GRCh38_p2.sa', but got the same error.

What would be the correct way to use DeconSeq with the human database to remove the human contaminants?

Thank you
