Contents of the fastq file
6.4 years ago
Dayna ▴ 50

Hello

I am very new to this and have several points of confusion about my analysis. Two related questions:

  1. I have a fastq file with 151-base reads, and its pair also has 151-base reads. Should I assume that the 151 bases include the adapters, so that the actual read length is less than 151? I ask because of this link: What is the difference between a Read and a Fragment in RNA-seq? Is it correct that the fastq file contains read+adapters?
  2. The reads in each pair overlap by about 100 bases. Based on my understanding, such a long overlap is not good. Am I correct?

Thanks

fastq RNA-Seq • 1.8k views
6.4 years ago
  1. You should run fastqc and see if it indicates that there's adapter contamination. If the fragment sizes were >151 bases (this is quite likely) then you'll have no adapters on the 3' end.
  2. Whether the reads overlap or not is irrelevant in most cases. If you want to do assembly then people often like to merge overlapping reads (this helps with error correction). If you're doing differential expression then you just map without any merging because the aligners can handle that without problems. If your original fragments were 100 bases long and your reads are 151 bases, then you're going to want to remove the adapter sequence from the 3' end of each read (or use local alignment and just be done with it).
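
For a quick sanity check before (not instead of) running fastqc, something like the following rough Python sketch could count how many reads contain an adapter. It assumes plain 4-line FASTQ records and assumes the library uses the common Illumina adapter starting with `AGATCGGAAGAGC`; the helper names and the example reads are made up for illustration.

```python
from typing import Iterable, Iterator

# Assumption: the first 13 bases of the common Illumina adapter.
# Other library kits use different adapter sequences.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(lines: Iterable[str], adapter: str = ADAPTER_PREFIX) -> float:
    """Fraction of reads containing an exact copy of the adapter prefix."""
    total = 0
    hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


# Two made-up reads: the first runs into the adapter, the second does not.
example = [
    "@read1", "ACGTACGTAGATCGGAAGAGCACACG", "+", "I" * 26,
    "@read2", "ACGTACGTACGTACGTACGTACGTAC", "+", "I" * 26,
]
print(adapter_fraction(example))  # prints 0.5
```

For a real file you would pass an open file handle instead of the list (for gzipped files, `gzip.open(path, "rt")`). Exact matching misses adapters with sequencing errors and partial adapters at the very end of a read, so treat the number as a lower bound.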

Thanks so much. I am OK with your first point. I mostly follow the second point, but I am confused by "if your original fragments were 100 bases long". I meant that the reads overlap by 100 bases, so I don't get what you mean.


In short, do you end up with this:

==========> read1
        <========== read2

or this:

==========> read1
<========== read2

In the second case, if the original fragment of DNA loaded onto the sequencer is shorter than the read length then you get:

   ==========>### read1
###<==========    read2

where # is adapter sequence. You can have 100 base overlap in either case, but in one you have adapter contamination due to short fragment lengths and in the other you don't.
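
The arithmetic behind these pictures can be sketched in a few lines of Python. This is only an idealized model: it assumes both reads have the same length and that "insert" means the fragment without adapters, and the function name `pair_geometry` is made up for illustration.

```python
from typing import Dict


def pair_geometry(insert_len: int, read_len: int) -> Dict[str, int]:
    """Idealized overlap and adapter read-through for one read pair.

    insert_len: fragment length without adapters
    read_len:   length of each read (assumed equal for read1 and read2)
    """
    # Reads start at opposite ends of the insert and point inwards.
    # If the insert is shorter than a read, both reads cover all of it.
    overlap = max(0, min(insert_len, 2 * read_len - insert_len))
    # Any read bases beyond the end of the insert are adapter sequence.
    adapter_per_read = max(0, read_len - insert_len)
    return {"overlap": overlap, "adapter_bases_per_read": adapter_per_read}


for insert_len in (100, 180, 200, 202, 400):
    print(insert_len, pair_geometry(insert_len, read_len=151))
```

With 151-base reads, an insert of 202 gives a 100-base overlap and no adapter, while an insert of 100 also gives a 100-base overlap but with 51 adapter bases at the 3' end of each read.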


Thanks so much, Devon. So if I have an insert size of ~200 and 2 x 151 reads, then I am in the second case, right?


If your inserts (i.e., the fragments before ligating adapters) are ~200 then you won't have much if any adapter contamination.


I am confused. If the insert size (not the fragment size) is 200, then am I in the second case? And if the insert size is 180, then we are in case 2?


If your insert is >= your read length then you are in the first case. If the insert is smaller than the read length then you are in the second case.

In either case, you can always just run things through an adapter trimmer; it's not going to hurt anything.
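
To show what a trimmer does conceptually, here is a deliberately naive Python sketch. It assumes the adapter begins with `AGATCGGAAGAGC` (an assumption about the library kit) and only handles exact, complete matches; real trimmers also deal with mismatches and partial adapters at the read end, so use one of those for actual data. The function name `trim_adapter` is hypothetical.

```python
from typing import Tuple

# Assumption: the first 13 bases of the common Illumina adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER_PREFIX) -> Tuple[str, str]:
    """Cut the read (and its quality string) at the first exact adapter match."""
    pos = seq.find(adapter)
    if pos == -1:
        # No adapter found: the read is returned unchanged.
        return seq, qual
    # Everything from the adapter start onwards is not part of the insert.
    return seq[:pos], qual[:pos]


read = "ACGTACGT" + ADAPTER_PREFIX + "TTTT"
print(trim_adapter(read, "I" * len(read)))   # ('ACGTACGT', 'IIIIIIII')
print(trim_adapter("ACGTACGT", "IIIIIIII"))  # unchanged, no adapter present
```

Reads without adapter pass through untouched, which is why trimming is harmless when the inserts are long.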

6.4 years ago
BioinfGuru ★ 1.7k

1) No, you cannot assume this. Adapters may or may not be present on each individual read. Normally you would use another program to check the quality of the fastq file, and its output tells you whether adapter sequence is present in the reads. I recommend a program called fastqc.

2) I am assuming when you say the "reads overlap" that you mean the read pairs. Please define A) overlapping, B) how you identified how much the reads overlap and C) why overlapping is important to you


I mean that when the insert size is small, the two reads of a pair overlap, i.e., they share bases. I am not asking about a specific problem in this post; I am still learning the basics and trying to make sure I understand correctly. So when the overlap is long, the insert size is small, and the pair covers less of the genome. I want to know what people consider normal for small insert sizes and long overlaps. Thanks so much.


Don't worry about it. Really. Just get on with fastqc, trimming and assembly. Those steps will identify if you have any problems.
