Question: Determining read count
0
gravatar for nikelle.petrillo
3.5 years ago by
Providence College, Providence, RI
nikelle.petrillo100 wrote:

Hello everyone,

I have run my reads through trimmomatic. I would like to know the number of reads in my raw reads and then the number of reads that were left after quality trimming/filtering in trimmomatic. I have generated "trimlogs" but these seem to be of no help as I do not quite understand them. Does anybody know how (or the code/command i can use) to go about counting the number of raw reads and then counting the number of reads left after filtering?

Thanks! Nikelle

ADD COMMENTlink written 3.5 years ago by nikelle.petrillo100
3

If you have FastQC installed then run it on pre-/post- set of files.
There would be grep (and other ways of doing this) but we would need to know what machine identifier your read headers have to make it fool proof.
Or this: How to count fastq reads

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by genomax74k

Thanks genomax,

I have the fast X toolkit installed, could I use this to count reads?

ADD REPLYlink written 3.5 years ago by nikelle.petrillo100
3

Trimmomatic should be anything like that of the cutadpat where you are using a custom tool to remove the adapter sequences. When you received your raw reads you must have some Fastq.gz raw files with FastQC report files. You can check there for the number of reads for the raw sequences. Once you have the output of timmomatic you can run again the FASTQC , genomax2 is correct in saying that, however for pre/post fastq or fastq.gz you can always do this

cat file.fastq | echo $((`wc -l`/4))

if compressed

zcat file.fastq.gz | echo $((`wc -l`/4))
ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by ivivek_ngs4.8k

Thank you!

Is it correct if I run this command:

[npetrill@trogdor AMMA_transcripts]$ awk '{s++}END{print s/4}' R1V29_B1F9St_CGATGT_L001_R1_001.fastq
31118843

On a fastq file that looks like this?

@DF8F08P1:347:C5YT5ACXX:1:1101:1385:2060 1:N:0:CGATGT
CACCTCNTTTCAATCCTCCTGGACCAGGTTCGCCGGATACTCCTCTTGTTCCTGTGAAACCTCTTAATCCTGATGGGCCGCTCTTGCCTGAAATTCCAGA
+
CCCFFF#2AFHHHJJJJJJJJIGIJJJJIHIJJJJIDGHJJJJJJJJJJJIJHGHGFHGFFFFFE@DEECEDDDDA?BBDDDDDDDDDDD?CDDDCCDDD
@DF8F08P1:347:C5YT5ACXX:1:1101:1498:2065 1:N:0:CGATGT
AAAGAAAGTGGCCTTCAGTGTACGTCCCATGATCAGCTGGAACGGGTCGACGGGGATACCCCGGTCGTCTAATGACACGCCAACAATTGCACACGTTCCC
+
CCCFFFFFFFHHHJJJIJJJJJJJJJJJJJJIJGIIIHJJIJJJHIHEIDIJHHF>BECDABBD??B8@DBDDCC>ACBDDDDDDDDDDDDDDDBABDDD
@DF8F08P1:347:C5YT5ACXX:1:1101:1399:2123 1:N:0:CGATGT
TTGCACTTTGGCTAACCAACAAGCAGTATTGCCCAACGCCATCCCACAGAAAAACATTAAAAACACCCAACAATTATCAATCATGTCACCGTATGTCAAA
+
@@CDDDDFHHHDHGIJI@EFAFFGGGHIGIJJGEEGCFHHJDCAABHGE@FD3CFE@EE@;AAEED;BAC=;;>@>C:>;@CCCDDCCDC@DBACCDDDA
@DF8F08P1:347:C5YT5ACXX:1:1101:1609:2103 1:N:0:CGATGT
CTGCAATGTTCCATTCACAAGTATGGTTCATCCTTCGACAGCTCCCATTTTCGGGGAGGTACTCCTGGTAAGACGGGTTTTGTTTCAATAGGTGCAACCT
+
CCCFFFFFHHHHHJJJJJJJHCHIIJHIJJJIIIJJJJJIJIJIJJIJIJJIJJJJHIJ?AEHHFEFFBDCCCED>B,8<@BBDDCCCDCDDCCCCDCDD
@DF8F08P1:347:C5YT5ACXX:1:1101:1726:2231 1:N:0:CGATGT
TGCAAAACTACTTCCTCCTCCCATGCTGTAGGAGCCACCAGCGCCGTATCCAGCACCACCACCACCTCCAGCTCCAGCTCCAGCTCCACCTCCACCACCG
+

Will that command give me the correct amount of reads?

ADD REPLYlink modified 3.5 years ago by genomax74k • written 3.5 years ago by nikelle.petrillo100

Looks like it did give you the number of reads: 31118843
Confirm with the command @vchris_ngs gave in the post above yours.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by genomax74k

Great, that works. Thank you both!

ADD REPLYlink written 3.5 years ago by nikelle.petrillo100

do both of the command gives you same read output? 31118843

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by ivivek_ngs4.8k

They do, thanks so so much!

ADD REPLYlink written 3.5 years ago by nikelle.petrillo100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1832 users visited in the last hour