Question: Is there a way to split a FastQ file by chromosome?
0
6 weeks ago by
calebe_campos260 wrote:

Hi folks,

I'm new to this field and have a big issue to resolve. I need the FastQ file of each bovine (Bos taurus taurus) chromosome. At NCBI I can only download FastA files split by chromosome, and I don't know of another database that has this specific file. Two options come to mind:

1 - Convert the FastA to FastQ, but I read that some of the data in a FastQ file doesn't exist in a FastA file, and the websites and tools I tried can't convert my files because they are too large (about 100Mb).

2 - Download the SRA file from NCBI, convert it to FastQ and then split (?) it by chromosome, but how could I do that?

Regards from someone lost. =]

chromosome fastq sra fasta • 170 views
modified 6 weeks ago by swbarnes27.2k • written 6 weeks ago by calebe_campos260
2

1 - You can't convert FASTA to FASTQ. They don't contain the same information.

2 - If you download from SRA, the data should be in FASTQ format already, but it is likely not segregated by chromosome (eukaryotes are not an area of expertise for me, though).

One option would be to download the reference genome, and the reads from SRA, then map the reads to the reference yourself.

Why do you need FASTQ specifically?

written 6 weeks ago by Joe16k
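
A minimal sketch of the last step of the suggestion above, assuming the reads have already been mapped to the reference by an aligner that writes SAM (the mapping itself is not shown). The function name `sam_to_fastq_by_chromosome` and the example records are hypothetical. It only illustrates the idea of binning reads by the chromosome they aligned to; reads from repetitive regions can align equally well to several chromosomes, so such a split is approximate.

```python
from collections import defaultdict

# Translation table used to reverse-complement reads aligned to the reverse strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq_by_chromosome(sam_lines):
    """Group primary alignments by reference name; return FASTQ text per chromosome."""
    per_chrom = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        seq, qual = fields[9], fields[10]
        if flag & 0x4 or rname == "*" or seq == "*":
            continue  # unmapped read, or no sequence stored
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented;
            # restore the orientation the sequencer reported.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & 0x40:
            qname += "/1"  # first mate of a pair
        elif flag & 0x80:
            qname += "/2"  # second mate of a pair
        per_chrom[rname].append(f"@{qname}\n{seq}\n+\n{qual}\n")
    return {chrom: "".join(records) for chrom, records in per_chrom.items()}


# Made-up records: one forward read, one reverse-strand read, one unmapped read.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t16\tchr2\t20\t60\t4M\t*\t0\t0\tAACC\tABCD\n",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII\n",
]
split = sam_to_fastq_by_chromosome(example_sam)
for chrom in sorted(split):
    print(chrom, split[chrom], sep="\n")
```

Each value in the returned dictionary could be written to its own file, one per chromosome. The unmapped read is dropped because it cannot be assigned to any chromosome.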

At first I was sure that the file type from SRA was FastQ, but I read in a tutorial that I'd have to convert an SRA file to FastQ. I got confused about it, so let's ignore this part.

written 6 weeks ago by calebe_campos260
2

I need the FastQ file of each bovine (Bos taurus taurus) chromosome.

Can you tell us why? As it stands this request does not make logical sense.

written 6 weeks ago by genomax76k

I'm using a tool (a website, to be more specific) called RepeatExplorer, and the required input is a FastQ file.

modified 6 weeks ago • written 6 weeks ago by calebe_campos260
2

Are you looking for specific repeats or just a file with masked repeats? You can download masked genome files for the cow genome from UCSC.

modified 6 weeks ago • written 6 weeks ago by genomax76k

I'm designing a probe to paint the whole chromosome. The software I'm using (RepeatExplorer) will give me some sequences to use as my "primer", and the input it requires is a FastQ file. Since I want to make probes for each chromosome, I need the FastQ file for each chromosome.

Feel free to suggest another way to get it. =)

Before I forget, thank you all for the help.

written 6 weeks ago by calebe_campos260
1

While one could create a fake fastq file easily, I don't think the tool you are looking at will accept that file. It is meant to be used with short next generation sequencing reads and not entire chromosomes.

written 6 weeks ago by genomax76k

But could I create a fake long FastQ file? If so, I could try.

modified 6 weeks ago • written 6 weeks ago by calebe_campos260
1

You can, but you shouldn't.

It wouldn't surprise me if the tool rejects that anyway, since it expects short reads.

written 6 weeks ago by Joe16k
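
For reference, "faking" a FASTQ here would mean attaching invented quality scores to a FASTA sequence. The sketch below is only an illustration under that assumption: the helper names `read_fasta` and `fasta_to_placeholder_fastq` are hypothetical, and the constant quality character `I` (Q40 in the common Phred+33 encoding) is made up, not measured. As noted in the replies above, a tool built for short reads may well reject a chromosome-length record produced this way.

```python
def read_fasta(fasta_text):
    """Yield (name, sequence) pairs from FASTA text, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_placeholder_fastq(fasta_text, quality_char="I"):
    """Build FASTQ text whose quality line is a constant placeholder, not real data."""
    return "".join(
        f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n"
        for name, seq in read_fasta(fasta_text)
    )


# Tiny made-up record with the sequence wrapped over two lines.
demo_fasta = ">chr_demo\nACGT\nAC\n"
demo_fastq = fasta_to_placeholder_fastq(demo_fasta)
print(demo_fastq)
```

Every base gets the same score, so the quality line carries no information about the sequence; that is exactly why this is not a real conversion.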
1

Can you provide a link to the resource and point out where it specifically asks for FASTQ? That's a pretty unlikely filetype requirement for a tool which is in theory just looking for repeats.

written 6 weeks ago by Joe16k
1

The tool the OP is referring to is for NGS data (not genomes). It can be found here.

written 6 weeks ago by genomax76k
1

Hmm.. Strange. I would personally have thought that short reads are not ideal for finding these types of features without first doing some assembly.

written 6 weeks ago by Joe16k
1
6 weeks ago by
swbarnes27.2k
United States
swbarnes27.2k wrote:

Convert the FastA to FastQ,

A fastq contains more information than a fasta. You can't "convert" your way into more information than you started with!

In general, no one stores giant long nucleotide sequences with individual base quality scores. What you are asking for almost certainly does not exist.

You need to stop and think about what you are asking, and what you really need to do, because I bet you don't really need a giant fastq.

written 6 weeks ago by swbarnes27.2k
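
To make the point about information content concrete, here is a small sketch of the one direction that does work, FASTQ to FASTA. It assumes simple four-line FASTQ records with Phred+33 quality encoding; the function names `parse_fastq` and `fastq_to_fasta` and the example read are hypothetical. The per-base quality scores are decoded and then simply discarded, and nothing in the resulting FASTA allows them to be recovered.

```python
def parse_fastq(fastq_text):
    """Yield (name, sequence, phred_scores) from four-line FASTQ records."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Phred+33 (an assumption here): score = ASCII code of the character - 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def fastq_to_fasta(fastq_text):
    """Keep names and bases, drop the quality scores."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in parse_fastq(fastq_text))


# Made-up read: four bases with four different-looking quality characters.
demo_read = "@read1\nACGT\n+\nII5#\n"
name, seq, scores = next(parse_fastq(demo_read))
print(scores)                     # one score per base
print(fastq_to_fasta(demo_read))  # the scores are gone
```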