Just browse sra-explorer.info for datasets. I doubt you can meaningfully query by file size, since that depends on coverage and read length. Download any dataset from the species you need and then downsample it. You can simply use head on the FASTQ files to take, say, the first 4,000,000 lines, which equals 1 million reads (each read occupies four lines). Reads in FASTQ files are not sorted by anything other than their position on the flow cell, so you do not strictly need a dedicated random-sampling approach, especially if this is just for a practice task.
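If you would rather do this in a script than with head, here is a minimal Python sketch of the same idea. It assumes standard four-line FASTQ records (no line-wrapped sequences); the function name `head_fastq` and the file names in the usage comment are made up for illustration.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_reads=1_000_000):
    """Copy the first n_reads records from src to dst.

    Assumes each record is exactly 4 lines (header, sequence, '+', qualities).
    Files ending in .gz are read/written as gzip, everything else as plain text.
    Returns the number of complete records written.
    """
    open_in = gzip.open if str(src).endswith(".gz") else open
    open_out = gzip.open if str(dst).endswith(".gz") else open
    n_lines = 0
    with open_in(src, "rt") as fin, open_out(dst, "wt") as fout:
        # islice stops after 4 * n_reads lines, or earlier if the file is shorter
        for line in islice(fin, 4 * n_reads):
            fout.write(line)
            n_lines += 1
    return n_lines // 4


# Hypothetical usage (file names are placeholders):
# head_fastq("sample_R1.fastq.gz", "sample_R1.sub.fastq.gz", n_reads=1_000_000)
```

For paired-end data, run it with the same `n_reads` on both files so the mates stay in sync.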