Question: (Closed) How to extract data from RNA-seq data
Learner wrote (2.2 years ago):

I have several datasets generated on an Illumina HiSeq 2000 and saved as bedgraph.gz files. I would like to know the best way to extract the information from them, for example, the data for a specific gene.

I would really appreciate it if you could explain how, or even give me an example.

I know one can import the data into IGV or IGB, but I have no idea how to find a gene and extract its data, or whether there is a better way to do it.

My main goal is to correlate it with methylation data.

Many thanks

Tag: rna-seq

It would be very helpful for those waking up on Monday morning (or those already awake in Oceania and East Asia) if you let us know how you produced the bedgraph file.

People generally start with FASTQ or bcl files straight from the sequencer, not bedgraphs.

Thanks, Kevin

Reply by Kevin Blighe, 2.2 years ago

@Kevin Blighe OK, please bear with me while I explain the situation. RNA-seq reads were aligned to the human genome (hg19) using TopHat2 (version 2.0.12). Gene-level transcript abundances were estimated with Cufflinks. Gene expression values from primary samples showing variation greater than zero were corrected for potential batch effects using ComBat. I also have a question: may I have your email address so that I can contact you directly?

Reply by Learner, 2.2 years ago

Learner: Please keep discussions about questions on Biostars. Otherwise, others miss out on the solutions.

It sounds like you have alignment data. You should extract the reads for your gene of interest from the alignment files using the region option of samtools view (search here for threads). Forget about the bedgraph for this purpose.
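samtools view with a region argument (for example `samtools view sample.bam chr7:500-600`, which needs a coordinate-sorted BAM indexed with samtools index) is the tool to use. To show what the region filter does conceptually, here is a minimal Python sketch that works on plain-text SAM records; the function names `reference_end` and `reads_in_region`, the region and the reads are hypothetical. A read is kept when the reference span implied by its POS and CIGAR overlaps the 1-based, inclusive region.

```python
import re
from typing import Iterable, Iterator

# CIGAR operations; M, D, N, = and X consume reference bases
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def reference_end(pos: int, cigar: str) -> int:
    """Last reference base (1-based, inclusive) covered by an alignment starting at pos."""
    span = sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def reads_in_region(sam_lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield names of reads overlapping chrom:start-end (1-based, inclusive, like a samtools region)."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, rname, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*":  # other chromosome or unmapped
            continue
        # spliced reads (N in the CIGAR) count if their span, introns included, overlaps
        if pos <= end and reference_end(pos, cigar) >= start:
            yield qname


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "read1\t0\tchr7\t100\t60\t50M\t*\t0\t0\t*\t*",
        "read2\t0\tchr7\t140\t60\t10M1000N40M\t*\t0\t0\t*\t*",
        "read3\t0\tchr7\t2000\t60\t50M\t*\t0\t0\t*\t*",
        "read4\t0\tchr1\t120\t60\t50M\t*\t0\t0\t*\t*",
    ]
    print(list(reads_in_region(sam, "chr7", 500, 600)))  # ['read2']
```

Only read2 is reported: its intron (the 1000N) stretches its alignment span across the region even though none of its aligned bases fall inside it.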

Reply by genomax, 2.2 years ago

Yes, as genomax says, it's better to keep discussions in the open, here. I will not respond if you write to my email.

Reply by Kevin Blighe, 2.2 years ago

@genomax OK, can you tell me what is unclear in my question so that I can answer it? I don't have the FASTQ files; I only have those bedgraph.gz files. Is it not possible to do it?

Reply by Learner, 2.2 years ago

Pasting a sample of your bedgraph would help.

Reply by Kevin Blighe, 2.2 years ago

An example of the file was posted in the original question of this thread: what is the best way to extract specific region of a RNA seq

Reply by genomax, 2.2 years ago

Thanks for finding that genomax!

Learner, I agree with what Igor said in the other thread:

If you would like to correlate with methylation, you need to extract the expression values. In other words, do a proper RNA-seq analysis. Essentially, that means calculating the raw counts for the genes and then normalizing them across your samples.
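As a rough illustration of "normalizing across samples", here is a minimal Python sketch of median-of-ratios size factors (the idea behind DESeq2's default normalisation) followed by a log2 transform. It is not DESeq2 itself: the gene names, the counts and the pseudocount of 1 are assumptions, and genes with a zero count in any sample are skipped when the factors are estimated.

```python
import math
from statistics import median
from typing import Dict, List

Counts = Dict[str, List[int]]  # gene -> raw counts, one value per sample


def size_factors(counts: Counts) -> List[float]:
    """Median-of-ratios size factor for each sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:  # log geometric mean is undefined; skip the gene
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # raises StatisticsError if every gene has a zero in some sample
    return [math.exp(median(r)) for r in log_ratios]


def normalise_log2(counts: Counts, pseudocount: float = 1.0) -> Dict[str, List[float]]:
    """Divide raw counts by the size factors, then take log2(x + pseudocount)."""
    sf = size_factors(counts)
    return {gene: [math.log2(c / s + pseudocount) for c, s in zip(row, sf)]
            for gene, row in counts.items()}


if __name__ == "__main__":
    # hypothetical counts: sample 2 was simply sequenced twice as deeply
    raw = {"GENE_A": [10, 20], "GENE_B": [100, 200], "GENE_C": [5, 10], "GENE_D": [0, 3]}
    print(size_factors(raw))              # roughly [0.707, 1.414]
    print(normalise_log2(raw)["GENE_A"])  # two (near-)equal values
```

Taking the median of the per-gene ratios makes the factor robust to a few very highly expressed or differentially expressed genes, which a simple total-count scaling is not.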

I suggest that you:

  1. Re-analyse your FASTQ files to get gene-level raw counts using featureCounts or kallisto
  2. Normalise these using DESeq2 or edgeR (and log transform)
  3. Correlate your normalised gene counts to your methylation data
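Step 3 could be as simple as a per-gene correlation across samples. A minimal sketch, assuming you already have one normalised log-expression value and one methylation value (for example a promoter beta value) per sample for the same gene, in the same sample order. Spearman rank correlation is used here as one reasonable choice, not something prescribed in this thread, and the example values are made up.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("expression and methylation must cover the same samples")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    expression = [5.1, 6.3, 7.8, 9.0]       # hypothetical log2 normalised counts
    methylation = [0.82, 0.64, 0.41, 0.12]  # hypothetical beta values
    print(spearman(expression, methylation))  # -1.0: perfectly inverse ranking
```

With only a handful of samples any single correlation is noisy, so this is best run across many genes and interpreted with a multiple-testing correction.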

I have doubts about the use of both Cufflinks and ComBat in your current approach.

Reply by Kevin Blighe, 2.2 years ago

From post above:

I don't have the FASTQ files.

Learner: Are you not able to get them? You won't be able to recreate them from the data you have.

Reply by genomax, 2.2 years ago

@genomax Unfortunately, I don't have them and I am unable to regenerate them. Do you think it is possible to get them from a given GEO accession? https://www.ncbi.nlm.nih.gov/geo/

Reply by Learner, 2.2 years ago

If the data has been submitted and made public, then yes, it is possible to download it from GEO (expression counts), or SRA and ENA (fastq files). You would need to know the accession.

Reply by h.mon, 2.2 years ago

@h.mon Is SRA the same as FASTQ? How would you re-analyse it to get gene-level raw counts using featureCounts or kallisto?

Reply by Learner, 2.2 years ago

SRA is the same as FASTQ?

What do you mean?

SRA is a repository for publicly available NGS sequence data, normally in FASTQ format. You would download the data, scan and trim it, align it to a reference genome using an aligner, and then count the reads mapping to genes using featureCounts.
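featureCounts (from the Subread package) does the counting step for real, on BAM files against a GTF annotation. To show the idea, here is a simplified Python sketch: each aligned read is treated as a single interval and is counted for a gene only when it overlaps exactly one gene, which loosely mirrors featureCounts' default of not counting reads that overlap several features. The gene coordinates, read intervals and function name are hypothetical, and real tools work on exons and use interval indexes instead of this linear scan.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int]  # (gene_id, chrom, start, end), 0-based half-open
Read = Tuple[str, int, int]       # (chrom, start, end), 0-based half-open


def count_reads_per_gene(genes: List[Gene], reads: List[Read]) -> Tuple[Dict[str, int], int, int]:
    """Return (counts per gene, ambiguous reads, reads overlapping no gene)."""
    counts = {gene_id: 0 for gene_id, _, _, _ in genes}
    ambiguous = no_feature = 0
    for chrom, r_start, r_end in reads:
        hits = {gene_id for gene_id, g_chrom, g_start, g_end in genes
                if g_chrom == chrom and r_start < g_end and g_start < r_end}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            ambiguous += 1  # overlaps several genes: left unassigned
        else:
            no_feature += 1
    return counts, ambiguous, no_feature


if __name__ == "__main__":
    genes = [("GENE_A", "chr1", 100, 200), ("GENE_B", "chr1", 150, 300)]
    reads = [("chr1", 105, 140), ("chr1", 110, 160), ("chr1", 250, 290), ("chr2", 0, 50)]
    print(count_reads_per_gene(genes, reads))  # ({'GENE_A': 1, 'GENE_B': 1}, 1, 1)
```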

Reply by genomax, 2.2 years ago

@genomax Would it be possible to show me how to download the files? I have been trying, but with no success. For example, when I go to https://www.ncbi.nlm.nih.gov/Traces/study/ I cannot download any file, and if I go to ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP045%2FSRP045534 I cannot get any file either. It would be great if you could give me an example, please.

Reply by Learner, 2.2 years ago

Get the FASTQ files directly from ENA. Here is the page with the samples. Use the FTP link for each sample, or bulk download the files (Bulk download button).
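If you prefer to script this rather than click through the ENA page, ENA's Portal API has a filereport endpoint that lists FASTQ FTP paths per run. The sketch below only builds the query URL and parses a TSV response; the endpoint and the names `read_run`, `run_accession` and `fastq_ftp` are my assumption of the current API, so check them against ENA's documentation. The accession SRP045534 comes from the FTP link above; the sample response and run IDs are made up. An empty `fastq_ftp` field means no FASTQ file is listed for that run.

```python
from typing import Dict, List
from urllib.parse import urlencode

# assumed ENA Portal API endpoint; verify against ENA's documentation
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """URL listing run accessions and FASTQ FTP paths for a study or sample accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def fastq_urls(tsv_text: str) -> Dict[str, List[str]]:
    """Map each run to its FASTQ URLs (paired-end runs list two, separated by ';')."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    run_col, ftp_col = header.index("run_accession"), header.index("fastq_ftp")
    urls: Dict[str, List[str]] = {}
    for line in lines[1:]:
        cols = line.split("\t")
        ftp_field = cols[ftp_col] if ftp_col < len(cols) else ""
        urls[cols[run_col]] = ["ftp://" + path for path in ftp_field.split(";") if path]
    return urls


if __name__ == "__main__":
    print(filereport_url("SRP045534"))
    # hypothetical response: the second run has no FASTQ listed
    example = (
        "run_accession\tfastq_ftp\n"
        "SRR0000001\tftp.sra.ebi.ac.uk/vol1/fastq/example/SRR0000001_1.fastq.gz;"
        "ftp.sra.ebi.ac.uk/vol1/fastq/example/SRR0000001_2.fastq.gz\n"
        "SRR0000002\t\n"
    )
    print(fastq_urls(example))
```

The URLs it returns can then be fetched with any FTP client or downloader.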

Reply by genomax, 2.2 years ago

@genomax For another dataset, I am trying to get the FASTQ data, but some files have links and others do not, not even an SRA link. What should I understand from this? Does it mean that the authors did not upload the files?

Reply by Learner, 2.2 years ago

Sometimes the data is not yet public (pending publication), or it is controlled-access and you will need to apply for access. If you include the SRA accession, I can take a look.

Reply by genomax, 2.2 years ago

@genomax OK, thanks

Reply by Learner, 2.2 years ago

It does look like some samples are missing FASTQ files. It is possible that the submitters created a manifest but the samples later failed, so no data could be uploaded. Sometimes the errors are on SRA's side, and they can fix them when you bring them to their attention. Email NCBI SRA support to ask.

Reply by genomax, 2.2 years ago

@Kevin Blighe Can you tell me which method, package, algorithm or software is used to get counts with featureCounts or kallisto from FASTQ files?

Reply by Learner, 2.2 years ago

It will be hard to provide an example because your question lacks a lot of information and is poorly explained.

data obtained by Illumina HiSeq 2000 and saved as bedgraph.gz

Illumina HiSeq 2000 outputs fastq files, not bedgraph. You (or someone else) probably performed some analyses to get from fastq to bedgraph, but we really don't know which ones. Did you do any of this analysis yourself? Please describe it in more detail.

extracting the data for a specific gene

What is "the data"? Read counts? SNPs and / or indels? The gene sequence and / or upstream regions?

correlate it with a methylation data

What kind of methylation data? Maybe a bed or bedgraph of genomic methylated regions?

I have the fuzzy feeling you will want to look at bedops and bedtools.
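One typical bedtools/BEDOPS task here is summarising the RNA-seq bedgraph signal over each methylated region (bedtools map, or bedmap in BEDOPS, does this on whole files). A minimal Python sketch of that idea for a single region, assuming 0-based half-open bedgraph coordinates and treating bases absent from the bedgraph as zero signal. It computes a base-pair-weighted mean, which is not necessarily what those tools report with their default options; the function name and values are hypothetical.

```python
from typing import Iterable, Tuple

Row = Tuple[str, int, int, float]  # bedgraph row: chrom, start, end, value (0-based, half-open)


def mean_signal(rows: Iterable[Row], chrom: str, start: int, end: int) -> float:
    """Base-pair-weighted mean bedgraph value over [start, end); missing bases count as 0."""
    if end <= start:
        raise ValueError("empty region")
    total = 0.0
    for r_chrom, r_start, r_end, value in rows:
        if r_chrom != chrom:
            continue
        overlap = min(r_end, end) - max(r_start, start)
        if overlap > 0:
            total += overlap * value
    return total / (end - start)


if __name__ == "__main__":
    signal = [("chr1", 0, 100, 2.0), ("chr1", 100, 150, 4.0), ("chr2", 0, 1000, 9.0)]
    methylated_region = ("chr1", 50, 150)  # hypothetical region from the methylation data
    print(mean_signal(signal, *methylated_region))  # 3.0
```

Repeating this for every methylated region gives one expression-like value per region that can be set against the methylation level of the same region.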

Reply by h.mon, 2.2 years ago

extract the data

Perhaps we should have asked you this a long time ago. What exactly do you mean by that? Do you just want to extract the rows from your bedgraph files that overlap the genomic coordinates of a region of interest (e.g. a gene)? If not, describe what you need.
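If that is all you need, no alignments are required: a bedgraph is plain text, so you can stream it and keep the rows overlapping the gene's coordinates (bedtools intersect, or tabix on a bgzipped and indexed file, does the same on whole files). A minimal Python sketch, assuming a gzip-compressed four-column bedgraph with 0-based half-open coordinates and a region written samtools-style as `chrom:start-end` (1-based, inclusive); the gene's coordinates must come from an annotation for the same genome build (hg19 here). The function names and the demo file are hypothetical.

```python
import gzip
import os
import tempfile
from typing import Iterator, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """'chr7:1,000-2,000' (1-based, inclusive) -> ('chr7', 999, 2000) (0-based, half-open)."""
    chrom, coords = region.rsplit(":", 1)
    start, end = (int(x.replace(",", "")) for x in coords.split("-"))
    return chrom, start - 1, end


def rows_overlapping(path: str, region: str) -> Iterator[Tuple[str, int, int, float]]:
    """Yield bedgraph rows (chrom, start, end, value) that overlap the region."""
    chrom, start, end = parse_region(region)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue  # skip blank, track/browser and comment lines
            c, s, e, v = line.split()[:4]
            if c == chrom and int(s) < end and int(e) > start:
                yield c, int(s), int(e), float(v)


if __name__ == "__main__":
    # write a tiny hypothetical bedgraph.gz just for the demonstration
    fd, demo_path = tempfile.mkstemp(suffix=".bedgraph.gz")
    os.close(fd)
    with gzip.open(demo_path, "wt") as out:
        out.write("track type=bedGraph\n"
                  "chr1\t0\t100\t1.5\nchr1\t100\t200\t3\n"
                  "chr1\t200\t300\t0.5\nchr2\t150\t250\t7\n")
    print(list(rows_overlapping(demo_path, "chr1:150-210")))
    # [('chr1', 100, 200, 3.0), ('chr1', 200, 300, 0.5)]
    os.remove(demo_path)
```

For real data, pass the path to one of your bedgraph.gz files and the gene's coordinates; streaming keeps memory use flat even for genome-wide files.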

Reply by genomax, 2.2 years ago

Hello Learner!

We believe that this post does not fit the main topic of this site.

No longer relevant, and the comments are nested too deeply.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree, please tell us why in a reply below; we'll be happy to talk about it.

Cheers!

Reply by Michael Dondrup, 2.2 years ago