How To Get Full Information About RNA-Seq Files?
10.0 years ago
Y Tb ▴ 230

Dear All,

I am new to bioinformatics, and I have some RNA-seq files for human tissues (liver, lung, and spleen) that I need to analyze. I want to know how I can get full information about these files, for example: which technology was used to generate the sequences (Illumina, Solexa, etc.), whether they are single-end or paired-end, and what first steps I should take to start the analysis. I also want to know whether the data are strand-specific. (Note that I don't have any information about these data.)

rna-seq • 4.1k views

Which type of data do you have? FASTQ? BAM?


I am sorry for posting my comment here.


The data format is FASTQ.


I would recommend attending an RNA-Seq data analysis workshop. Several are available, and you will learn what the data formats look like and how to analyze your data. You have to pay for these, but in the end you save a lot of money and can start your analysis directly.


Some better tags might be useful above. "How", "to", "get" etc. are not useful.

10.0 years ago
Michael 54k

There is a lot of information you cannot extract from the raw data files. If you want all the information relevant for analyzing the data, you need either accompanying documentation or to ask the source directly; they should give you documentation if you paid for the sequencing.

From looking at the data you can see:

  • Base technology: Illumina vs. SOLiD vs. 454
  • Paired end or not (if paired end there are normally two files with R1, R2 for Illumina)
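
As a rough illustration of what the read header alone can tell you, here is a minimal Python sketch that recognises the two common Illumina read-name layouts. The function name `describe_header` and the example header in the usage note are made up for illustration. The patterns follow the usual Illumina/Casava read-name conventions but are not exhaustive; SOLiD and 454 names are simply reported as unknown.

```python
import re

# Casava 1.8+ layout:
#   @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
CASAVA18 = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>[12]):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S*)$"
)

# Older Illumina layout:
#   @instrument:lane:tile:x:y#index/read
OLD_ILLUMINA = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>-?\d+):(?P<y>-?\d+)"
    r"(?:#(?P<index>[^/\s]+))?(?:/(?P<read>[12]))?$"
)


def describe_header(header_line):
    """Return the fields found in a FASTQ header line, plus a 'format' label."""
    line = header_line.strip()
    match = CASAVA18.match(line)
    if match:
        return {"format": "illumina-casava-1.8+", **match.groupdict()}
    match = OLD_ILLUMINA.match(line)
    if match:
        return {"format": "illumina-pre-1.8", **match.groupdict()}
    return {"format": "unknown"}
```

For example, `describe_header("@HWI-ST1234:100:C0ABCACXX:3:1101:1234:2045 1:N:0:ATCACG")` (a hypothetical read name) reports the Casava 1.8+ layout with lane `3` and read number `1`. For paired-end data you would expect a second file whose headers carry read number `2` and otherwise match.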

Important facts you can't know for sure:

  • The type of original sequence: RNA or DNA
  • The protocols used, e.g. TruSeq1,2,3, polyA-enrichment, total RNA, etc.
  • Whether the protocol was strand-specific (you might try to guess from analyzing the coverage)
  • adapter sequences used (mostly important for adapter trimming)
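
To make the "guess from the coverage" idea a bit more concrete, here is a small Python sketch. It assumes you have already aligned a sample of reads and, for each read overlapping exactly one annotated gene, recorded the read's strand, the gene's strand, and whether it is read 1. That input format and the function name `strandedness_fractions` are assumptions for illustration, not the output of any particular tool.

```python
def strandedness_fractions(observations):
    """Estimate how often read 1 agrees with the annotated gene strand.

    observations: iterable of (read_strand, gene_strand, is_read1),
    where strands are '+' or '-'.
    Returns None if there are no observations.
    """
    same = 0
    opposite = 0
    for read_strand, gene_strand, is_read1 in observations:
        # Read 2 of a pair aligns to the opposite strand from read 1,
        # so flip it to express every observation in terms of read 1.
        if not is_read1:
            read_strand = "-" if read_strand == "+" else "+"
        if read_strand == gene_strand:
            same += 1
        else:
            opposite += 1
    total = same + opposite
    if total == 0:
        return None
    return {"sense": same / total, "antisense": opposite / total}
```

Roughly equal fractions suggest an unstranded library. A strong excess of "antisense" is what a dUTP-style protocol would be expected to give, since read 1 then comes from the strand opposite to the transcript. A strong excess of "sense" suggests a forward-stranded protocol.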

Dear Michael, thanks for helping me. I just contacted the person who sent me the data. He told me the technology used was Illumina, and he sent me the adapter sequence and the following information:

RNAseq libraries were prepared with the NEXTflex™ Directional RNA-Seq Kit (dUTP method), and each library was quantitated by qPCR, sequenced on one lane for 101 cycles on a HiSeq2000 using a TruSeq SBS sequencing kit version 3, and analyzed with Casava1.8.2.

I know that some details of how they processed the data are not disclosed, but my question now is how to get further information from the FASTQ file itself.


This concise sentence contains all the information that is practically relevant, so the person who gave it to you did a good job. You can now use the adapter sequences with e.g. Trimmomatic, and you know that the coverage is strand-specific and can be used to e.g. identify overlapping antisense transcripts. What else do you want to know about the FASTQ?
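
Before trimming, it can be useful to check how many reads actually contain the adapter. The following Python sketch is only a sanity check, not a replacement for Trimmomatic: it counts reads that contain the first few bases of the adapter exactly, with no mismatches allowed. The helper names and the `seed_length` and `limit` defaults are illustrative assumptions, and the reader assumes plain four-line FASTQ records.

```python
import gzip
import itertools


def iter_fastq_sequences(path, limit=100000):
    """Yield the sequence line of up to `limit` four-line FASTQ records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(itertools.islice(handle, 4 * limit)):
            if i % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def adapter_fraction(sequences, adapter, seed_length=12):
    """Fraction of reads containing the first `seed_length` adapter bases."""
    seed = adapter[:seed_length].upper()
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if seed in seq.upper():
            hits += 1
    return hits / total if total else 0.0
```

You would call it as `adapter_fraction(iter_fastq_sequences("sample_R1.fastq.gz"), adapter)`, where the file name is a placeholder and `adapter` is the sequence your provider sent you.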


Thanks again, Michael. How do you know the data is strand-specific?

Do you have any suggestions on how to identify overlapping antisense transcripts? In addition, can I extract any other information from the FASTQ file before starting my analysis?


One idea is to run FastQC to check the quality of the data.


Thanks Nicolas, I know that, and I have already used FastQC to check the quality of the sequences. But my question is: if you look at the raw RNA-seq FASTQ file itself (I mean the first line of each read and the last line, which is the quality score), can you get any other information?
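
On the quality line specifically, two things can be read off directly: the read length and, heuristically, the quality encoding. The Python sketch below is one way to do that. The function names and the thresholds (ASCII codes 59 and 74) are rule-of-thumb assumptions that only make sense over many reads, not a formal specification. Data processed with Casava 1.8 or later would be expected to come out as Phred+33.

```python
from collections import Counter


def guess_quality_offset(quality_lines):
    """Heuristically guess the quality encoding from FASTQ quality strings."""
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return "unknown"
    if min(codes) < 59:
        # Characters below ';' cannot occur in the +64 encodings.
        return "phred+33"
    if max(codes) > 74:
        # Characters above 'J' are unusual for Illumina Phred+33 output.
        return "phred+64"
    return "ambiguous"


def read_length_counts(sequences):
    """Count how many reads have each length."""
    return Counter(len(seq.strip()) for seq in sequences)
```

A single dominant read length (for example 101 after a 101-cycle run) suggests the reads have not been trimmed yet, while a spread of shorter lengths suggests some trimming has already been applied.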
