Hi,
I am interested in loading the single cell fastq files deposited in
https://www.ncbi.nlm.nih.gov/sra?LinkName=biosample_sra&from_uid=14485867
For one sample, "young_A_HSPC", there are 4 SRA runs.
I downloaded these SRAs with prefetch and then I ran:
fastq-dump --outdir fastq --gzip --skip-technical --readids --read-filter pass --dumpbase --split-3 --clip SRA*
I got 4 fastq files of equal size. However, I need the following 4 fastq files to run cellranger:
I1: Sample index read (optional)
I2: Sample index read (optional)
R1: Read 1
R2: Read 2
I do not know what the .fastq files are that I have now. Since they are all 2.2GB I do not think that two of them are the I1 and I2.
How do I proceed?
OK, I repeated the fastq-dump step for one of the 4 runs with the command "fastq-dump --split-files". I am now getting 3 fastq files. When I check the head, I see that one fastq file has 8 nt reads (I guess this is the index), the second has 26 nt reads and the third has 57 nt reads. I assume I have several runs because the samples were run on different lanes?
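For anyone repeating the "check the head" step, here is a minimal sketch that prints the length of the first read in each split file. The function name first_read_length and the fastq/ output directory are assumptions for illustration, and it only looks at the first record, on the assumption that all reads in a file have the same length.

```shell
# Print the length of the first read in a FASTQ file (plain or gzipped).
# The sequence is always line 2 of the first 4-line FASTQ record.
first_read_length() {
    case "$1" in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk 'NR == 2 { print length($0); exit }'
}

# Report the first read length for every split file in fastq/
# (directory name is an assumption; adjust to your --outdir).
for f in fastq/*.fastq fastq/*.fastq.gz; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    printf '%s\t%s\n' "$f" "$(first_read_length "$f")"
done
```

Each output line is a file name followed by its first read length, which is enough to tell an 8 nt index file from the 26 nt and 57 nt read files.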
THANKS A LOT!!! How do you know which is R1 and which is R2? Is R1 always longer? It seems I have only one index ...
R1 = cell barcode + UMI
R2 = actual cDNA read
So R2 will normally be longer than R1. In your case the 26 nt file is R1 and the 57 nt file is R2.
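A rough sketch of how that rule could be turned into cellranger-style file names. The helper names guess_read_type and cellranger_name are made up for illustration, and the length cut-offs are a heuristic that assumes a 10x 3' library (8 to 10 nt sample index, 26 nt or 28 nt barcode + UMI read); treating each SRA run as its own lane is also an assumption. cellranger expects names of the form Sample_S1_L00X_R1_001.fastq.gz.

```shell
# Heuristic guess of the 10x read type from the read length.
# Assumption: 10 nt or shorter is a sample index, 26 nt (v2 chemistry)
# or 28 nt (v3 chemistry) is R1, anything else is the cDNA read R2.
guess_read_type() {
    if [ "$1" -le 10 ]; then
        echo I1
    elif [ "$1" -eq 26 ] || [ "$1" -eq 28 ]; then
        echo R1
    else
        echo R2
    fi
}

# Build the file name cellranger looks for:
#   <sample>_S1_L<3-digit lane>_<read type>_001.fastq.gz
# Arguments: sample name, lane number (plain integer), read type.
cellranger_name() {
    printf '%s_S1_L%03d_%s_001.fastq.gz\n' "$1" "$2" "$3"
}

# Example: name for the 26 nt file of the first run, treated as lane 1.
cellranger_name young_A_HSPC 1 "$(guess_read_type 26)"
```

The printed name can then be used as the target of a rename or symlink for the matching gzipped fastq-dump output, with the lane number increased for each additional run of the same sample.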
You may have only one index because the samples may have been independently submitted under separate SRR#.
If you look under the Data Access tab for each sample (one example) you will see the original data submitted. Looks like this submission had fastq files. Sometimes people submit BAM files from cellranger. It is a bit of a wild west as far as 10x data submissions to SRA go.