Odd Fastq sequences; 50% Aligned more than once
3.7 years ago
pjferrandi • 0

Hello all. I'm REALLY trying to get DEG analysis done for my PI with essentially no experience and no guidance. I'm using Galaxy and attempting to assemble the fastq files received via sequencing and am getting very strange results. FastQC shows a very odd GC%: [images: gcpercentage, sequencepercnt]

The raw read file itself looks odd to me; please see: [image: sequences]

The major thing I notice is the same small sequence followed by a + and then the unique sequence directly after. I do not see this in other sets of files that seem to be working very well. I've searched Google with every combination of search terms that I can think of to understand what this is and how to solve it. I will be GREATLY appreciative to anyone who can help me, as I have a lot of pressure to get this done without any help. Thank you very much.

RNA-Seq assembly alignment sequence • 1.1k views

If you're struggling to add images, see this Biostars post: How to add images to a Biostars post


Indeed, these were small RNA-seq data. My PI is in the process of getting the correct sequence files, which will hopefully pan out nicely for us. Thanks to everyone for your help!


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

3.7 years ago
h.mon 35k

same small sequence followed by a + and then the unique sequence directly after.

There is nothing wrong with the sequences. The line starting with "@" is the header; it ends with the Illumina barcodes (or indexes). The next line is the actual read, the "+" line is a separator, and the line after it holds the base qualities, one character per base of the read.
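To make the four-line layout concrete, here is a minimal Python sketch of a FASTQ reader. It assumes every record is exactly four lines (no wrapped sequences) and that qualities use the common Phred+33 encoding; the header and read in the toy record are made up, and `read_fastq` and `phred_scores` are just illustrative names.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # text after the leading "@"
    sequence: str  # the read itself
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near header: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


# Toy record: header, read, "+", qualities (all values are made up).
toy_read = "ACGTACGTACGTACGTACGT"
toy_fastq = (
    "@TOY:1:FLOWCELL:1:1101:1000:2000 1:N:0:ATCACG\n"
    f"{toy_read}\n"
    "+\n"
    f"{'I' * len(toy_read)}\n"
)
records = list(read_fastq(io.StringIO(toy_fastq)))
for rec in records:
    print(rec.header, rec.sequence, phred_scores(rec.quality)[:5])
```

The reader counts lines in groups of four instead of searching for "@", because "@" is also a valid quality character and can appear at the start of a quality line.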

This is probably miRNA, not RNAseq. For example, googling one of the sequences from the picture returned the paper "Effect of microRNA-1 on hepatocellular carcinoma tumor endothelial cells". This also explains the strange GC content plots: you probably have two different miRNAs that are very highly expressed, and they have different %GC.
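A rough sketch of why two dominant sequences would produce that kind of GC plot: if most reads are copies of just two sequences, the per-read GC% histogram collapses into two spikes. The two 20-mers and their proportions below are invented purely for illustration; they are not real miRNAs or values from the poster's data.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in one read."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Two made-up 20-mers standing in for two highly expressed small RNAs.
low_gc_read = "AT" * 9 + "GC"        # 2 of 20 bases are G/C
high_gc_read = "GC" * 8 + "AT" * 2   # 16 of 20 bases are G/C

# Hypothetical library: 600 copies of one, 400 copies of the other.
reads = [low_gc_read] * 600 + [high_gc_read] * 400

# Per-read GC% histogram, rounded to whole percent.
gc_histogram = Counter(round(gc_percent(r)) for r in reads)
for gc, n in sorted(gc_histogram.items()):
    print(f"{gc:3d}% GC: {n} reads")
```

A typical mRNA library would instead spread over many GC values in a roughly bell-shaped curve, which is the shape FastQC compares against.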

edit:

I'm using Galaxy and attempting to assemble the fastq files received via sequencing

These reads are too short to assemble; you want to map them to a reference genome instead.


My mistake! I am attempting to map to a reference via HISAT2 and/or STAR...for now I've just attempted HISAT2 to the reference mouse genome. Following this, I used htseq-count to get the read counts for each identified gene/transcript. This worked perfectly fine on the dataset I pulled from SRA (not our dataset, which was given to me by my PI).

I think you're correct that these are miRNA reads and not RNAseq. It makes a lot of sense, because I followed through with the downstream steps (mapping, counts, annotating) and when I searched most of the Ensembl IDs, they were miRNA. There were some protein coding genes found, though. I'll ask him to be sure he sent me the correct files. Thanks for your help.
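Instead of searching Ensembl IDs one at a time, the check described above could be sketched in Python by tallying htseq-count output per gene biotype. This assumes an Ensembl-style GTF whose gene lines carry `gene_id` and `gene_biotype` attributes (other annotation sources may name the attribute differently) and the usual two-column htseq-count output with `__` summary rows; the gene IDs and counts below are made up.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_id -> gene_biotype using the 'gene' lines of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs:
            biotypes[attrs["gene_id"]] = attrs.get("gene_biotype", "unknown")
    return biotypes


def counts_by_biotype(count_lines: Iterable[str], biotypes: Dict[str, str]) -> Counter:
    """Sum read counts per biotype, skipping the '__' summary rows."""
    totals = Counter()
    for line in count_lines:
        line = line.strip()
        if not line:
            continue
        gene_id, count = line.split("\t")
        if gene_id.startswith("__"):  # e.g. __no_feature, __ambiguous
            continue
        totals[biotypes.get(gene_id, "unknown")] += int(count)
    return totals


# Toy inputs (hypothetical IDs and counts, not real annotation).
toy_gtf = [
    '1\tensembl\tgene\t100\t180\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "miRNA";',
    '1\tensembl\tgene\t500\t900\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "protein_coding";',
    '1\tensembl\texon\t500\t600\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "protein_coding";',
]
toy_counts = ["GENE_A\t950", "GENE_B\t50", "__no_feature\t10"]

totals = counts_by_biotype(toy_counts, gene_biotypes(toy_gtf))
for biotype, n in totals.most_common():
    print(biotype, n)
```

If nearly all counted reads land in the miRNA biotype, that supports the small RNA explanation.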

3.7 years ago

That fastq file looks fine if you only wanted reads of 20 bases. Those QC images look normal if what you have is a whole lot of a few sequences over and over again.

I'm pretty sure it's not RNA sequence. I'd say it's index sequence, except that it doesn't match the index indicated in the name line of the reads.
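Both observations can be checked with a short Python sketch: tally read lengths, list the most repeated sequences, and compare each read with the index at the end of its name line. It assumes Illumina-style headers where the index is the text after the last ":" (dual indexes joined by "+"), which may not hold for every file; `summarize_reads` is an illustrative name and the toy records are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_reads(records: Iterable[Tuple[str, str]], top_n: int = 5) -> Dict[str, object]:
    """Summarize (header, sequence) pairs: lengths, repeats, index hits."""
    lengths = Counter()
    sequences = Counter()
    index_matches = 0
    total = 0
    for header, seq in records:
        total += 1
        lengths[len(seq)] += 1
        sequences[seq] += 1
        # Assumed layout: index is whatever follows the last ":" in the header.
        index_field = header.rsplit(":", 1)[-1]
        indexes = [part for part in index_field.split("+") if part]
        if any(idx in seq for idx in indexes):
            index_matches += 1
    return {
        "total": total,
        "lengths": dict(lengths),
        "top_sequences": sequences.most_common(top_n),
        "index_matches": index_matches,
    }


# Toy records (made up): three copies of one read, one copy of another.
toy_header = "TOY:1:FLOWCELL:1:1101:1000:2000 1:N:0:ATCACG"
toy_records = [
    (toy_header, "ACGTACGTACGTACGTACGT"),
    (toy_header, "ACGTACGTACGTACGTACGT"),
    (toy_header, "ACGTACGTACGTACGTACGT"),
    (toy_header, "TTTTGGGGCCCCAAAATTTT"),
]

summary = summarize_reads(toy_records)
print(summary["lengths"])        # how many reads of each length
print(summary["top_sequences"])  # most repeated sequences first
print(summary["index_matches"])  # reads that contain their own index
```

A narrow length distribution around 20 bases with a handful of sequences dominating, and no reads containing their own index, would fit short small RNA reads better than stray index reads.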


Lol, I really wish I understood what you mean. Should I contact the sequencing company for clarification on what these files actually are? The other set of files I'm comparing these to, I downloaded from SRA and they work perfectly when I run them through my Galaxy workflow...I'm not exactly sure why our files are so odd.


You should contact your PI and ask him for the experiment details; the sequencing seems fine.
