Question: CUTADAPT in windows
0
gravatar for F
3.5 years ago by
F3.1k
Iran
F3.1k wrote:

Hello all

I have two fastq files, first set are the footprints, while the second sets are data coming from a common RNA-seq experiment performed in parallel, I want to adapter sequences, using the software CUTADAPT

The adapter sequences are:

GTTCAGAGTTCTACAGTCCGACGATC # 5 prime adapter sequence

ATCTCGTATGCCGTCTTCTGCTTG # 3 prime adapter sequence

may you please tell whether and how i can do this project with windows seven 64 bit?? because I am not familiar enough with linux 

help me please

sequencing • 2.8k views
ADD COMMENTlink modified 3.5 years ago by Brian Bushnell16k • written 3.5 years ago by F3.1k
2

Read this documentation properly. https://cutadapt.readthedocs.org/en/stable/

ADD REPLYlink written 3.5 years ago by Deepak Tanwar3.9k
6
gravatar for Brian Bushnell
3.5 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

You can do that in Windows with BBDuk, if you install Java.  Assuming you extract to C:\bbmap\, and you have paired reads named read1.fq and read2.fq:

java -ea -Xmx1g -cp C:\bbmap\current\ jgi.BBDukF in=read#.fq out=trimmed#.fq ktrim=r k=23 mink=12 hdist=1 tbo tpe literal=GTTCAGAGTTCTACAGTCCGACGATC,ATCTCGTATGCCGTCTTCTGCTTG

 

ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by Brian Bushnell16k

thank you

Brian, I downloaded BBMap_34.92.tar.gz and  java 8 and installed java,,

where i should type " java -ea -Xmx1g -cp C:\bbmap\current\ jgi.BBDukF in=read#.fq out=trimmed#.fq ktrim=r k=23 mink=12 hdist=1 tbo tpe literal=GTTCAGAGTTCTACAGTCCGACGATC,ATCTCGTATGCCGTCTTCTGCTTG"??? in command prompt??

you mean first i should extract  BBMap_34.92.tar.gz then what is the later step?

please help me more if possible

 : when i typed your command in cmd

told cant find or load main class of jgi BBDukf

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
3

Hi Sarah,

First you need to extract BBMap.  You can do that with 7-zip (a great program); just right-click and choose 7-zip then "extract here".  You need to do this once for the .gz file and then once for the tar file.That will create a directory called bbmap, which you can put in C:.

Then (for simplicity) move your sequence files to a new directory called "C:\sequence\".

Then open a command prompt (typically by typing "cmd" at the bottom of the start menu).  Type "cd C:\sequence".

Then you type the java command, which should work.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thanks a lot Brian

ADD REPLYlink written 3.5 years ago by F3.1k

sorry Brian

may you tell me please, how I select the minimal length of the remaining fragments and to trim also by the quality of the reads???

ADD REPLYlink written 3.5 years ago by F3.1k
1

If you want to quality trim to (for example) Q15, add these flags:

qtrim=r trimq=15

To throw away reads shorter than 20bp after trimming:

minlength=20

These and all other flags are described in C:\bbmap\bbduk.sh, which can be opened with Wordpad (not notepad).

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Brian Bushnell16k
3

thank you Brian for your patience and kindness

...I am just a beginner and I have to work on NGS because of my thesis

ADD REPLYlink written 3.5 years ago by F3.1k
1

Bachelor or Masters?

ADD REPLYlink written 3.5 years ago by Deepak Tanwar3.9k
1

I am shy to tell but I am a phd student but i have not already work in bioinformatics field at all and recently i had to ...

ADD REPLYlink written 3.5 years ago by F3.1k
2

Nothing to shy about :-)

ADD REPLYlink written 3.5 years ago by Deepak Tanwar3.9k

Brian,

In cmd, I entered:

cd C:\sequencec

( which sequence file was contain of  gruseq.fa and raw.qual)

then I typed: java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\bbmap\current\ jgi.BBDukF in=raw.qual out=clean.fq ref=gruseq.fa ktrim=r k=28 mink=12 hdist=1

but it says that: BBDUK VERSION 34.92

maskmiddle was disabled because useshortkmers=true

my Java is: jre-8u45-windows-x64_2.exe

sorry Brian, I tried with another option you suggested but the same error

may you please tell me the reason if possible..

thank you

 

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
2

That's not an error, just a notification.  When doing adapter-trimming, "maskmiddle" is disabled.  It's intended for filtering operations, so that's fine.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thank you

I mean why I can not get some result like which i copied from your thread

Total output: 966786 reads 113303242 bases
Perfectly Correct (% of output): 901541 reads (93.251%) 103689866 bases (91.515%)
Incorrect (% of output): 65245 reads (6.749%) 9613376 bases (8.485%)
Adapters Remaining (% of adapters): 65243 reads (13.973%) 1229480 bases (1.085%)
Non-Adapter Removed (% of valid): 2 reads (0.000%) 27 bases (0.000%) 

or anything else to present to my supervisor?

for example he asked me: "A quick check you might do would be to control the initial number of entries (reads) included in the "raw data" fastq file and compare it with the number of entries  (reads) which are in the final fastq file. The number of entries in the "final" file should be less.

but i don't know where i should search for read counts

in cmd just there is something about java and run time spent

excuse me...

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

Those specific results are from analyzing the output of trimming operations on synthetic data, in which the answer is known.  For real data, you don't know the correct answer, so the output will look more like this:

 

C:\temp\ecoli>java -ea -Xmx1g jgi.BBDukF in=reads_B2_100x100bp_0S_0I_0D_0U_0N.fq ktrim=r k=23 hdist=1 ref=D:\temp\BBTools_public\bbmap\resources\truseq.fa.gz
Executing jgi.BBDukF [in=reads_B2_100x100bp_0S_0I_0D_0U_0N.fq, ktrim=r, k=23, hdist=1, ref=D:\temp\BBTools_public\bbmap\resources\truseq.fa.gz]

BBDuk version 34.93
No output stream specified.  To write to stdout, please specify 'out=stdout.fq' or similar.
Initial:
Memory: free=494m, used=19m

Added 42113 kmers; time:        0.155 seconds.
Memory: free=469m, used=44m

Input is being processed as unpaired
Processing time:                0.067 seconds.

Input:                          100 reads               10000 bases.
KTrimmed:                       0 reads (0.00%)         0 bases (0.00%)
Result:                         100 reads (100.00%)     10000 bases (100.00%)

Time:                           0.243 seconds.
Reads Processed:         100    0.41k reads/sec
Bases Processed:       10000    0.04m bases/sec

That tells you the number of reads and bases in (input) and the number of reads and bases out (result).

But, your command is wrong:

java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\bbmap\current\ jgi.BBDukF in=raw.qual out=clean.fq ref=gruseq.fa ktrim=r k=28 mink=12 hdist=1

should be something like

java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\bbmap\current\ jgi.BBDukF in=raw.fastq out=clean.fq ktrim=r k=28 mink=12 hdist=1 literal=GTTCAGAGTTCTACAGTCCGACGATC,ATCTCGTATGCCGTCTTCTGCTTG

The input has to be a fastq or fasta file, not a qual file.  And "ref=gruseq.fa" was just used as an example for the synthetic test; in real life you should specify the actual adapter sequences (in your case, with "literal=").

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Brian Bushnell16k

sorry Brian for disturbance,

i should read your reply many times exactly perhaps i can do whichever my supervisor asked me

thank you

ADD REPLYlink written 3.5 years ago by F3.1k

good morning Brian,

 my supervisor just has given me the accession number GSE35641 and asked me trim and  keep reads having more than 15 bp..

i don't have adapter sequence, how i know what is adapter sequence in my accession number?

thank you

ADD REPLYlink written 3.5 years ago by F3.1k
2

If your reads are paired, you can find the adapter sequence with BBMerge:

java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\bbmap\current\ in=read#.fq outa=adapter.fa

Otherwise, you can try the different adapter sequences in bbmap\resources; it's most likely to be truseq.fa.gz but it could be nextera.fa.gz.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

Thanks a lot

I will try your tips now

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k

Brian,

I need your help as already

from where i can download BBMerge? does BBMerge embed in BBMap and no need to be downloaded?

using this command in=reads.fq out1=clean1.fq out2=clean2.fq minlen=25 qtrim=r trimq=10

i have two clean.fq files now, if i want to align the trimmed reads against a reference genome, which of those files (clean.fq 1 and 2) i should use?

 

 

 

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

BBMerge, BBDuk, and BBMap are all together in the BBMap download.

When aligning paired reads, you should use both files at the same time.  Please note that if you run BBDuk with one input and 2 outputs, it will assume the input is interleaved split the input into two files.  You can, alternately, just specify one output file to keep it interleaved.

To map these to a reference, you can do so like this with BBMap:

java -Xmx1g -ea -cp C:\Users\yang\Documents\Downloads\bbmap\current\ align2.BBMap in1=clean1.fq in2=clean2.fq out=mapped.sam ref=reference.fasta

That will map the files to the specified reference.  However, the amount of memory required depends on the size of the reference file.  For the human genome, for example, you would need to change "-Xmx1g" to "-Xmx22g".  BBMap generally requires approximately 7 bytes per base pair of reference sequence.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Brian Bushnell16k

thanks Brian,

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k

Brian,

my supervisor told

 

For the rRNA filtering part (i.e. mapping the fastq reads on the rRNA sequences and keeping only the reads which do not map): i should type
 
 bowtie2 -x [name of the bowtie2-build indicized file containing the rRNA sequence] --un [name of the fastq file which will contain the UNMAPPED reads] -U [name of the fastq file containing the reads] -S [name of the .sam file that will contain the MAPPED and UNMAPPED reads]
I could not understand his mean about --un option because i don't know which i should type instead of [name of the fastq file which will contain the UNMAPPED reads]
may you please help me although this irrelevant to sequence trimming.
excuse me 

 

 

ADD REPLYlink written 3.5 years ago by F3.1k
1

Did the command I suggested not work?

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

sorry Brian,

i have just one read file then i thought it is not paired read, i trimmed my reads based on your previous tips, in the other hand my supervisor newly asked me to align my clean.fq file against reference genome, but i dont know which i should type in [name of the fastq file which will contain the UNMAPPED reads]. in bowtie2 manual i read that 

--un <path> write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 or--un-lz4 is specified, output will be bzip2 or lz4 compressed.

but i didnt get which i should type after --un

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k

Brian, 

i have a sam file containing unmapped reads, i am going to convert my sam file to fasta or fastq to be a input in bowtie2 to align on genome.

may you please tell how i can convert sam to fasta or fastq by reformat in BBDuk??

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

Pretty straightforward:

reformat.sh in=unmapped.sam out=unmapped.fastq

In Windows that would be:

java -ea -Xmx1g -cp path\to\BBMap\current jgi.ReformatReads in=unmapped.sam out=unmapped.fastq
ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Brian Bushnell16k
2

That is it Brian,

you clever boy deserve to be appreciated, your code worked well.

thank you

ADD REPLYlink written 3.5 years ago by F3.1k

Brian,

can i use BBDuk to see genome coverage?? i tried bedtools to produce bed graph and bed histogram but dont work on windows, also i tried samtools but i just found some weird command...i have a bam file that i want to see genome coverage of.. 

ADD REPLYlink written 3.5 years ago by F3.1k
1

You can do that with Pileup, not with BBDuk.

pileup.sh in=mapped.bam covstats=stats.txt hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1000

"basecov" will give you the exact coverage at every base location, so that file will be huge if you have a large genome; so you may want to skip that one.  "bincov" gives the coverage binned by every 1000bp (by default) so is a lot smaller and easier to plot in, say, Excel.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thank you very much Brian,

i will try and will ask question as already...

thanks again

ADD REPLYlink written 3.5 years ago by F3.1k

Brian,

i typed (im in windows)

java -ea -Xmx1g -cp path\to\BBMap\current jgi.pileup in=eg1.bam covstats=stats.txt hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1000

but it says that fail to find main class pileup

i typed the code properly?? even jgi.pileup ??

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

Ah, sorry, in Windows it would be "jgi.CoveragePileup".

And -Xmx1g may or may not be enough depending on the reference.  That's fine for a small genome but for a human-sized you will need at least -Xmx8g.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thank you Brian,

i typed: 

java -ea -Xmx1g -cp C:\Users\yang\Documents\Downloads\BBMap\current jgi.CoveragePileup in=sim_reads_aligned.bam covstats=stats.txt hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1000

i am working with ecoli

it says that:  set USE_COURAGE_ARRAYS to true

set USE_BITSETS to false

exception in thread "main" java.lang .ASSERTAIONERROR: -128 = ?

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

Can you please copy and paste the full error, include all the information above it?

Oh - I should mention, that command will not work unless samtools is installed and in your path.  If samtools is not installed, Pileup can only read the file if you first convert it to sam format.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Brian Bushnell16k

Sorry Brian for delay,

i installed samtools already...

my bam file is in samtools folder,

Microsoft Windows [Version 6.1.7600]

Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\yang>cd C:\Users\yang\Documents\Downloads\samtools

C:\Users\yang\Documents\Downloads\samtools>java -ea -Xmx1g -cp C:\Users\yang\Doc

uments\Downloads\BBMap\current jgi.CoveragePileup in=eg1.bam covstats=stats.txt

hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1000

Executing jgi.CoveragePileup [in=eg1.bam, covstats=stats.txt, hist=histogram.txt

, basecov=basecov.txt, bincov=bincov.txt, binsize=1000]

 

Set USE_COVERAGE_ARRAYS to true

Set USE_BITSETS to false

Exception in thread "main" java.lang.AssertionError: Missing field 1: ▼♦     ÿ♠

BC☻ ? }?»

        at stream.SamLine.<init>(SamLine.java:472)

        at jgi.CoveragePileup.processSamLine(CoveragePileup.java:675)

        at jgi.CoveragePileup.process(CoveragePileup.java:351)

        at jgi.CoveragePileup.main(CoveragePileup.java:49)

C:\Users\yang\Documents\Downloads\samtools>

and for my another bam file:

 

 

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

 

C:\Users\yang>cd C:\Users\yang\Documents\Downloads\samtools

 

C:\Users\yang\Documents\Downloads\samtools>java -ea -Xmx1g -cp C:\Users\yang\Doc
uments\Downloads\BBMap\current jgi.CoveragePileup in=sim_reads_aligned.bam covst
ats=stats.txt hist=histogram.txt basecov=basecov.txt bincov=bincov.txt binsize=1
000
Executing jgi.CoveragePileup [in=sim_reads_aligned.bam, covstats=stats.txt, hist
=histogram.txt, basecov=basecov.txt, bincov=bincov.txt, binsize=1000]

 

 

Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Exception in thread "main" java.lang.AssertionError: -128 = ?
array=▼♦     ÿ♠ BC☻ ? srôe?b``p?pá♀ó?2?3à♀ö·*?+?/*IMá♫ä♀ö?J?¬144031024?)JM«ñs?70
°025Ö3¬áôñ?2±4?°42àpçôt±J?//ÉL5â♀?ƒ3??←é☺?çb♦?&Äx?dX◄??  B?Bú?   ♦     ÿ♠ BC☻ g%
?}[?-?uÖ?ó?☻$▲Ü?Fô♥²èN+n♀??Kß?xè?è♫æw@?♫??Ü?9g&?C$?↔#¶ƒ??▬4►@ ?☻?♦?P$Ä♂?Ä♥◄y@AÜ?
¶       °►, start=247, stop=249
        at align2.Tools.parseInt(Tools.java:1508)
        at stream.SamLine.<init>(SamLine.java:473)
        at jgi.CoveragePileup.processSamLine(CoveragePileup.java:675)
        at jgi.CoveragePileup.process(CoveragePileup.java:351)
        at jgi.CoveragePileup.main(CoveragePileup.java:49)

 

C:\Users\yang\Documents\Downloads\samtools>

 

 

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

Well...  it looks like, for whatever reason, Pileup can't find Samtools.  I suggest you first convert the bam to sam and then run Pileup on the sam file.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thank you,

then i will try with sam..

ADD REPLYlink written 3.5 years ago by F3.1k

yes Brian,

i tried with sam file and worked well.

thank you

how can i show the result in a graph??

ADD REPLYlink written 3.5 years ago by F3.1k
1

Paste bincov.txt into Excel, and make a scatter plot...

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

Thank you very much

sorry Brian,

can i use  BBDuk to convert sra, wig or txt to fasta or fastq?? even though i could not find them in reformat's accepted formats in your thread in SEQanswer..

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by F3.1k
1

Nope, the only thing I know of that will convert sra files is the sra toolkit.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thank you

ADD REPLYlink written 3.5 years ago by F3.1k
2

Sarah,

Look at these: Cutadapt On Windows

Adapter trimming tool for Windows?

ADD REPLYlink written 3.5 years ago by Deepak Tanwar3.9k
1

thanks Deepak 

ADD REPLYlink written 3.5 years ago by F3.1k

sorry Brian,

can i use reformat to convert txt.gz or  wig.gz files to fasta or fastq??

ADD REPLYlink written 3.5 years ago by F3.1k
1

Reformat will only convert reads; the only formats it will accept are fasta, fastq, fasta+qual, scarf, sam, and bam (if samtools is in the path).  It will accept gzipped or not gzipped.  But there's nothing that can convert a wig file to reads, because the data is no longer there.  As for .txt, that's a generic extension that could mean anything, so it depends on the contents, but probably not.

ADD REPLYlink written 3.5 years ago by Brian Bushnell16k

thank you

ADD REPLYlink written 3.5 years ago by F3.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1748 users visited in the last hour