Faculty Disk-Based Storage System

Summary of Main Features

  • Cost - relatively cheap: about £60 per TB for 5 years of read/write access, followed by indefinite read-only archiving. The actual price depends on the current cost of disks, so it can go up or down.
  • Large volume - large-volume capacity on a managed system. Individual research groups use anything from single TBs to hundreds of TBs. Individual requests of up to about 50TB are fine; contact IT support to discuss anything larger.
  • Backups - a single mirror with at least 30 days of changes (see below for details).
  • Performance - the storage is on direct-attached SATA disks, with an SSD read cache, served from 10Gbps networked servers.
  • Disaster recovery - in the event of a major incident, it could take some time to restore all data from backups (possibly several weeks).
  • Security - the servers are housed in secure Faculty or University data centers and are rack-based. The system is behind the institutional firewall and has normal password security applied.
  • Network - accessible as a network drive from all Faculty Windows and Linux desktops, and from Desktop Anywhere.

A Managed Storage System

The Faculty offers a managed storage system for large-volume research data. The system is classed as managed because Faculty IT carries out all the background activities associated with data storage:

  • purchasing the raw hardware
  • setting up filesystems and making them available on the network
  • backing up the filesystems
  • monitoring and replacing faulty hard drives
  • resizing filesystems if more space is required in the future
  • maintaining the servers
  • etc.

Enterprise vs Non-Enterprise

There are different classes of managed storage system: scratch, Enterprise, non-Enterprise, etc. The University Policy on safeguarding data should be applied when deciding which type of system to use for storing data. Enterprise systems are defined by criteria such as performance, resilience, high availability and comprehensive backups. The Faculty system is a managed, non-Enterprise system. Enterprise storage for research data is available on the University N Drive, but only in relatively small volumes (gigabytes rather than terabytes).

Data Security

The University Policy on safeguarding data should be applied when making decisions on data security. The Faculty servers are housed in secure Faculty or University data centers and are rack-based. The data centers have features such as intruder alarms, fire suppression, water detection, etc. The system is behind the institutional firewall and has normal password security applied. Data encryption is not applied by default.

Funding and Data Lifetime

The funding for the Faculty storage system comes from individual research grants. The Faculty buys and maintains large servers as part of the central Faculty infrastructure, and research groups purchase the disks for these servers (including mirrors, backups, redundancy, etc) from their grants. The cost to research groups changes with disk prices, but is currently around £60 per TB for 5 years (depending on the backup policy - see below). After 5 years, data on Faculty storage systems will move to read-only data repositories which are maintained as part of the overall Faculty system. No time limit has been set on the lifetime of the repositories - that may be influenced by future University systems. Research groups should consider the long-term archive requirements of their data.

Storage space can be purchased by contacting Faculty IT staff (foe-support@leeds.ac.uk). Note that requests for storage up to about 50TB can follow the funding model above, but anything larger will require individual discussion with the Faculty IT team.
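
As a rough illustration of the funding model, the sketch below estimates the 5-year cost of a request from the per-TB prices quoted under Backup Policy on this page. The level names "scratch" and "mirrored", the function name and the assumption that the prices are in pounds sterling are all illustrative; actual prices move with disk costs, so treat the result as an estimate rather than a quote.

```python
# Prices per TB for a 5-year term, as quoted on this page at the time of writing.
# Assumption: the figures are in pounds sterling; the level names are illustrative.
PRICE_PER_TB_5_YEARS = {"scratch": 46, "mirrored": 60}


def estimate_cost(size_tb: float, level: str = "mirrored") -> float:
    """Return the approximate 5-year cost of size_tb terabytes at a given level."""
    if size_tb <= 0:
        raise ValueError("size_tb must be positive")
    if level not in PRICE_PER_TB_5_YEARS:
        raise ValueError(f"unknown backup level: {level!r}")
    return size_tb * PRICE_PER_TB_5_YEARS[level]
```

For example, `estimate_cost(10, "mirrored")` gives 600 and `estimate_cost(10, "scratch")` gives 460, so mirroring a 10TB filesystem adds about 140 over the 5-year term at these prices.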

Partition Size

Each server in the system has a RAID data array which is split into group quotas. The minimum preferred size which the Faculty sells to projects is 2TB - although smaller partitions may be possible (contact IT support). The maximum size of a single partition is currently around 290TB - this is set by the maximum size of the data array on a single server (so it could increase in the future).
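
The size limits mentioned on this page can be summarised as a small check, sketched below. The thresholds are the approximate figures given here (2TB preferred minimum, about 50TB for the standard funding model, about 290TB for a single partition); the function name and the returned messages are hypothetical, and Faculty IT makes the actual decision.

```python
# Approximate thresholds taken from this page; all may change over time.
MIN_PREFERRED_TB = 2     # minimum preferred partition size
STANDARD_LIMIT_TB = 50   # rough upper limit for the standard funding model
MAX_PARTITION_TB = 290   # rough current maximum for a single partition


def classify_request(size_tb: float) -> str:
    """Return a hint on how a storage request of size_tb would be handled."""
    if size_tb <= 0:
        raise ValueError("size_tb must be positive")
    if size_tb > MAX_PARTITION_TB:
        return "larger than a single partition: discuss with Faculty IT"
    if size_tb > STANDARD_LIMIT_TB:
        return "large request: discuss with Faculty IT"
    if size_tb < MIN_PREFERRED_TB:
        return "below preferred minimum: contact IT support"
    return "standard request"
```

For example, `classify_request(10)` returns "standard request", while `classify_request(120)` returns "large request: discuss with Faculty IT".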

Backup Policy

The RAID arrays are constantly monitored and can recover from individual disk failures. Current systems run RAID6, which tolerates two failed disks, so at least 3 disks have to fail simultaneously before a filesystem is lost. We offer 2 levels of data protection - the correct level should be decided for each filesystem by the PI responsible for the data (with reference to the Policy on safeguarding data).
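
The RAID6 behaviour described above can be sketched as follows. RAID6 keeps two parity blocks per stripe, so an array survives up to two simultaneous disk failures and gives up two disks' worth of capacity. The disk counts and sizes passed in are hypothetical, since this page does not state how the Faculty arrays are built.

```python
RAID6_PARITY_DISKS = 2  # RAID6 stores two parity blocks per stripe


def raid6_survives(failed_disks: int) -> bool:
    """True if a RAID6 array is still readable with this many failed disks."""
    if failed_disks < 0:
        raise ValueError("failed_disks cannot be negative")
    return failed_disks <= RAID6_PARITY_DISKS


def raid6_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID6 array of equal-sized disks."""
    if disk_count < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disk_count - RAID6_PARITY_DISKS) * disk_size_tb
```

With a hypothetical array of twelve 10TB disks, `raid6_usable_tb(12, 10)` gives 100TB of usable space; `raid6_survives(2)` is true and `raid6_survives(3)` is false.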

  1. Scratch space - In this case, the live data is the only copy. There are no backups at all, and there's no possibility of recovering data from failed filesystems. Filesystems in scratch areas will always have the word scratch in their name. Data can be lost through user error (if a user accidentally deletes or overwrites files). Entire filesystems can also be lost if enough disks in the live RAID array fail simultaneously, or if the array is affected by fire, theft, etc. For this level of data protection, we charge £46 per TB for 5 years.
  2. Mirrored data with increments - In this case, the live data is mirrored to another RAID array in a separate fileserver in a separate server room. The mirror is synchronised overnight, and the previous versions of all files which are changed or deleted during the synchronisation are kept. These incremental changes are kept for at least 30 days, and for longer if disk capacity on the backups allows. This protects against disasters such as theft, flooding, etc in the server room (at worst, 24 hours of work could be lost). It also protects against a critical number of disks failing in the RAID array, and against user errors - files which were deleted or changed within at least the last 30 days can be restored. For this level of data protection, we charge £60 per TB for 5 years.
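
As a worked example of the 30-day increment window for mirrored data, the sketch below checks whether a file changed or deleted on a given date still falls inside the guaranteed retention period. The function name is hypothetical, and treating day 30 as inside the window is an assumption. Older versions may still be restorable if backup capacity allows, and changes made since the last overnight synchronisation are not yet protected.

```python
from datetime import date, timedelta

# Increments are kept for at least 30 days (longer only if capacity allows).
GUARANTEED_RETENTION = timedelta(days=30)


def restore_guaranteed(changed_on: date, today: date) -> bool:
    """True if a change made on changed_on is inside the guaranteed window."""
    if changed_on > today:
        raise ValueError("changed_on cannot be in the future")
    # Assumption: a change exactly 30 days old still counts as covered.
    return today - changed_on <= GUARANTEED_RETENTION
```

For example, on 6 Nov 2018 a file deleted on 1 Nov 2018 is inside the guaranteed window, while one deleted on 1 Sep 2018 is not.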


Topic revision: r25 - 06 Nov 2018 - StuartBorthwick1