This document outlines the requirements and instructions for setting up a StorReduce server in an on-premises setting.
Contact us if you have questions
If you have any problems or questions setting up your StorReduce server, please contact us - we’ll be happy to help out!
The StorReduce server software requires a relatively modern x86 server, physical or virtual, with the following specifications:
| Component | Specification | Notes |
|-----------|---------------|-------|
| CPU | 64-bit, 2+ GHz, 4 cores | Additional cores will usually increase throughput, up to 32 or even 64 cores |
| RAM | 16 GB | Additional RAM is used to cache index data |
| Storage | 1 boot disk, 1 magnetic disk, 1 or more SSDs | Additional SSDs should be combined using RAID 0 to increase index performance - see the 'SSD and Disk Sizing' section below |
| Network | 1 Gb Ethernet | 10 Gb Ethernet is required for higher throughputs |
A server with 4 cores and a reasonably fast SSD will process data at up to 83 MB/s.
A server with 32 cores, 60 GB of RAM, 10 Gb Ethernet and 2 SSDs will process data at up to 900 MB/s (tested on Amazon EC2 using a c3.8xlarge instance).
Actual speeds will vary depending on network speed, SSD speed, virtual infrastructure and the deduplication ratio achieved. To achieve maximum throughput it is necessary to use multiple S3 clients uploading data simultaneously, or a heavily multithreaded S3 client.
See the StorReduce FAQ for more information on expected throughput for different numbers of CPU cores and RAM.
SSD and Disk Sizing
StorReduce requires two disks to store the deduplication index:

- A high-IOPS disk, which requires fast random read access and therefore should be placed on SSD storage (unless the entire index fits in RAM - see the 'Using StorReduce without SSD' section below).
- A standard disk, which does not require fast random read access, so magnetic disk can be used.
The amount of un-deduplicated data that a StorReduce instance can store is determined by the space available on these disks and the deduplication ratio.
If more than one SSD is provided then they should be combined into a single volume using RAID 0 (see details on how to do this below).
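As a sketch, two extra SSDs can be combined into a single RAID 0 array with mdadm. The device names /dev/sdb and /dev/sdc, the ext4 filesystem, and the mount point are assumptions for illustration - check your actual device names with `lsblk` before running anything like this:

```
# Combine two SSDs into a single RAID 0 array for the high-IOPS index disk.
# /dev/sdb and /dev/sdc are example device names - verify with `lsblk` first.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

# Create a filesystem on the array and mount it (ext4 is an arbitrary choice).
mkfs.ext4 /dev/md0
mkdir -p /mnt/storreduce-index
mount /dev/md0 /mnt/storreduce-index

# Persist the array configuration across reboots.
mdadm --detail --scan >> /etc/mdadm.conf
```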
The high-IOPS disk (SSD) size is dependent on the expected deduplication ratio:
| Deduplication Ratio | SSD to Data Ratio | SSD Size (10 TB data) | SSD Size (100 TB data) | SSD Size (1 PB data) |
|---|---|---|---|---|
| 97% | 0.014% | 1.4 GB | 14 GB | 140 GB |
| 95% | 0.025% | 2.5 GB | 25 GB | 250 GB |
| 90% | 0.045% | 4.5 GB | 45 GB | 450 GB |
| 80% | 0.06% | 6 GB | 60 GB | 600 GB |
The standard disk size is not dependent on the expected deduplication ratio and is determined purely by the amount of data to be stored (before deduplication):
| Std Disk to Data Ratio | Std Disk Size (10 TB data) | Std Disk Size (100 TB data) | Std Disk Size (1 PB data) |
|---|---|---|---|
| 0.035% | 3.5 GB | 35 GB | 350 GB |
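Both disk sizes are simply the amount of un-deduplicated data multiplied by the relevant ratio from the tables above. A minimal sketch for an example workload of 100 TB at an expected 95% deduplication ratio:

```shell
# Index disk sizing sketch. The 100 TB data size and 95% deduplication
# ratio (0.025% SSD ratio) are example inputs only.
data_tb=100
ssd_ratio=0.00025    # 0.025% - SSD ratio for a 95% deduplication ratio
std_ratio=0.00035    # 0.035% - standard disk ratio (independent of dedup)

ssd_gb=$(awk -v d="$data_tb" -v r="$ssd_ratio" 'BEGIN { printf "%g", d * 1000 * r }')
std_gb=$(awk -v d="$data_tb" -v r="$std_ratio" 'BEGIN { printf "%g", d * 1000 * r }')
echo "High-IOPS (SSD) disk: ${ssd_gb} GB"
echo "Standard disk:        ${std_gb} GB"
```

For 100 TB this gives 25 GB of SSD and 35 GB of magnetic disk, matching the table rows above.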
Using StorReduce without SSD
For small amounts of data, magnetic disk can be used for both disks, since the index information will be cached in memory, providing the fast random read access required. In this case the VM should have enough RAM to cache both index disks.
The following table shows the amount of un-deduplicated data that can be managed with high throughput without an SSD:
| Deduplication Ratio | RAM to Data Ratio | Data Capacity (16 GB RAM) | Data Capacity (32 GB RAM) | Data Capacity (64 GB RAM) |
|---|---|---|---|---|
| 97% | 0.049% | 30 TB | 60 TB | 120 TB |
| 95% | 0.060% | 25 TB | 50 TB | 100 TB |
| 90% | 0.080% | 19 TB | 38 TB | 76 TB |
| 80% | 0.095% | 16 TB | 32 TB | 64 TB |
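The RAM needed to cache the index is the amount of un-deduplicated data multiplied by the RAM-to-data ratio. As a sanity check on the table, a sketch for the 30 TB / 97% row:

```shell
# RAM needed to cache the index for a given amount of un-deduplicated data.
# 30 TB at a 97% deduplication ratio (0.049% RAM-to-data ratio) needs
# roughly 15 GB of index cache, which fits in a 16 GB server.
data_tb=30
ram_ratio=0.00049    # 0.049% - RAM-to-data ratio for 97% dedup

ram_gb=$(awk -v d="$data_tb" -v r="$ram_ratio" 'BEGIN { printf "%.1f", d * 1000 * r }')
echo "Index cache required: ${ram_gb} GB"
```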
The StorReduce server is available as a Virtual Appliance (OVA file) which can be run on VMware or other compatible virtual infrastructure.
For most users we recommend running the StorReduce virtual appliance.
StorReduce is also available as a set of RPMs to install directly onto Linux. This can be used to set up a StorReduce instance on a physical server or on a particular Linux distribution. Please contact StorReduce for more information about how to manually install and configure StorReduce on Linux.
Data is stored in and retrieved from StorReduce via the S3 API, so you will need client software that supports the S3 API. We recommend S3Browser on Windows and AWS CLI or S3cmd on OS X and Unix, but there are many other tools.
Any machines running S3 client applications talking to the StorReduce server must be able to resolve the hostname of the server, either through DNS or by adding an entry to /etc/hosts on the client machine. This hostname must match one of the ‘hostname’ values configured in the Settings tab of the StorReduce dashboard.
Although you can use the IP address for the VM to browse to the Web console, this will not work for clients connecting via the S3 API. (S3 API digital signatures do not work when the endpoint is specified as an IP address.)
In addition, if your S3 client application uses virtual host style bucket addressing (see http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html) then each bucket created in the StorReduce server must also have a DNS or /etc/hosts entry. The hostname for the bucket is the bucket name followed by a dot and the server hostname. For example, if -hostname=example.com and you create a bucket called my-bucket then my-bucket.example.com must be mapped to the server’s IP address in DNS or /etc/hosts.
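For example, on a client machine without DNS, the /etc/hosts entries might look like the following (the 10.0.0.50 address is a placeholder, and example.com and my-bucket follow the example above):

```
# /etc/hosts on the S3 client machine - address and names are examples only
10.0.0.50    example.com              # the StorReduce server itself
10.0.0.50    my-bucket.example.com    # needed for virtual host style bucket addressing
```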
Admin Web Interface and S3 interface ports
Once the server is running you will be able to access the Admin interface by pointing your browser at https://YOUR_HOSTNAME:8080/ and the S3 API will be accessible on ports 80 and 443. The default StorReduce username is root and the default password is storreduce.
The admin interface is a Web GUI that allows you to browse the data stored by the server, manage users, perform limited configuration, and perform maintenance tasks. See http://storreduce.com/docs/quick-start/try/ for more details.
The admin interface is also available on ports 80 and 443, under https://YOUR_HOSTNAME/storreduce-dashboard.
Object Storage Requirements
If you are testing purely to determine the deduplication ratio of your data then StorReduce can be run without an object store; this mode is known as the 'StorReduce estimator'. In this mode, no data is stored to the cloud as it is uploaded.
For normal operation you will need an object store. At this time we require that the object store have either an Amazon S3 API (e.g., Amazon S3, S3 IA, Google Cloud Storage, Google Nearline, Ceph with S3 API, OpenStack Swift with the S3 API option, Cloudian) or an Azure Blob Storage interface.
To install a StorReduce server for production, please ensure you have read the rest of this document, then follow these steps:
Ensure you have downloaded the latest StorReduce OVA file, and that you have a license from StorReduce for your new server. Please contact firstname.lastname@example.org for links and license information.
Import the OVA file into VMware ESXi, Workstation or Fusion.
Set the sizes for the local disks to match expected data volume - see the ‘SSD and Disk Sizing’ section above. For detailed instructions please see the Local Disk Configuration and Resizing guide.
We recommend that you select ‘thin provisioning’ for disks.
Ensure that you have network connectivity to the server on ports 22, 80 and 443; optionally, port 8080 can be used for the dashboard (admin) interface.
Set up a DNS name or hosts file entry to refer to the StorReduce server VM (see the DNS Configuration section above).
Log in to the Virtual Machine and resize the file systems on the local disks if non-default sizes were used. See the Local Disk Configuration and Resizing guide for instructions. The initial root password for the VM will be ‘storreduce’.
Configure the StorReduce server as specified in the On-Premises Server Quick Start Guide.
Important Note: Upgrading your server to the latest version
When evaluating StorReduce we recommend upgrading to the latest version by running 'yum upgrade' before performing the evaluation. For production servers we recommend running 'yum upgrade' once a week. See In-place Upgrades using Yum for details.
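For production servers, the weekly upgrade can be scheduled with cron; one possible sketch (the Sunday 3 a.m. timing and file name are arbitrary choices, not a StorReduce requirement):

```
# /etc/cron.d/storreduce-upgrade - run yum upgrade weekly (example schedule)
0 3 * * 0  root  yum -y upgrade
```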
See the StorReduce Try It Out page for suggestions on how to test StorReduce.
Important Note: Encryption and Compression
Encrypted or compressed data contains no duplicates and therefore will not deduplicate. When picking test data for evaluation there must be a reasonable expectation that the data contains duplication, e.g. for backups try testing with 10 sequential backups of the same server or set of servers, not backups of 10 different servers.