7.0 years ago

Hello! I am new to RNA-seq. I quality-trimmed my FASTQ reads with the FASTX-Toolkit using a Phred score threshold of 30 (bases with a Phred score < 30 were removed). I would like to figure out the read lengths after the quality trim. Any help would be much appreciated.

-Nikelle

RNA-Seq phred-quality read-length • 2.5k views

This kind of trimming is very severe and likely unnecessary. I assume you are aligning to a reference genome. If you lost a lot of bases after trimming, you are probably throwing away a lot of good data.

See this thread, which references a paper on quality-based trimming: Which Phred value to use in trimming


You might want to consult www.rnaseq.wiki for a really nice description of the steps involved in processing RNA-seq data. IIRC, they cover read trimming in the appropriate section.

7.0 years ago

Run FastQC before and after trimming.


Thank you very much! One other question. Before trimming, I had sequence length = 100. After trimming, I had a sequence length of 3-100. Can you make any sense of this? I am having a hard time figuring out what length it is measuring.


That means some reads were trimmed down to a length of 3 (eliminating 97 bases), and the read lengths that remain now range from 3 to 100. See my comment on your original question.
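To see where a range like 3-100 comes from, here is a minimal sketch that reports the shortest and longest read in a FASTQ file. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the sample reads and the `length_range` function name are made up for illustration.

```shell
# Tiny made-up FASTQ file with three reads of length 10, 5 and 10 (illustrative data only).
fq=$(mktemp)
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'ACGTA' '+' 'IIIII' \
  '@read3' 'ACGTACGTAC' '+' 'IIIIIIIIII' > "$fq"

# In a 4-line FASTQ record the sequence is line 2, so keep lines where NR % 4 == 2
# and track the shortest and longest sequence seen.
length_range() {
  awk 'NR % 4 == 2 {
         len = length($0)
         if (n == 0 || len < min) min = len
         if (len > max) max = len
         n++
       }
       END { if (n > 0) print min "-" max }' "$1"
}

length_range "$fq"   # prints the read length range as "min-max"
```

On the sample above this prints 5-10. The range alone does not say how many reads are short, which is why a full length distribution is more informative.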


Thanks!

7.0 years ago

sed -n '1~4p' filename.fastq | perl -ne 'chomp;print length($_) . "\n"' | sort -n | uniq -c >length.dist

Hi, Thanks Chris. If I were to input this, what would this be generating?

Let's break it down:

sed -n '1~4p' filename.fastq

Gives you every 4th line of the file (the sequence line)

perl -ne 'chomp;print length($_) . "\n"'


outputs the length of that line

sort -n | uniq -c


condenses it into a table of counts like this:

  3  98
123  99
 22 100


Thanks very much, that was helpful. What are the two different columns in the table?


Count and read length (see man uniq)
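Since each row is a count followed by a read length, the table can be condensed further. A rough sketch, assuming the two-column layout shown above; the `summarize_dist` function name is hypothetical, and the numbers are just the example rows from this thread.

```shell
# Recreate the example table from the thread: column 1 = count, column 2 = read length.
dist=$(mktemp)
printf '%s\n' '  3  98' '123  99' ' 22 100' > "$dist"

# Each row stands for `count` reads of the given length, so the mean read
# length is the count-weighted average of column 2.
summarize_dist() {
  awk '{ reads += $1; bases += $1 * $2 }
       END { if (reads > 0) printf "reads=%d mean_length=%.2f\n", reads, bases / reads }' "$1"
}

summarize_dist "$dist"
```

For the example rows this gives 148 reads with a mean length of 99.13. Pointing the function at a real length.dist file should work the same way, as long as the file has only those two columns.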


Chris, the fourth line of FASTQ is the quality score, not sequence (but it should be trimmed to the same length as the sequence string, so results should be the same).

The command gives every 4th line, starting with the 1st. Come to think of it, that should be 2~4p, right, because of the header?
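For what it's worth, here is a sketch of the distribution command that starts at line 2 of each record. It assumes 4-line FASTQ records and uses awk rather than sed, because the `first~step` address form is a GNU sed extension; with GNU sed, `sed -n '2~4p'` should select the same lines. The sample data and the `read_length_dist` name are made up for illustration.

```shell
# Tiny made-up FASTQ file: two reads of length 10 and one of length 5.
sample=$(mktemp)
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'ACGTA' '+' 'IIIII' \
  '@read3' 'ACGTACGTAC' '+' 'IIIIIIIIII' > "$sample"

# Line 2 of every 4-line record is the sequence (line 1 is the @header),
# so print the length of lines where NR % 4 == 2, then count each length.
read_length_dist() {
  awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -n | uniq -c
}

read_length_dist "$sample"   # one row per length: count, then read length
```

On the sample this reports one read of length 5 and two reads of length 10. Selecting line 4 (the quality string) would normally give the same lengths, but selecting line 1 measures the header instead.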