The Scientific Computing and Data Analysis survey results

Results

The SCDA user survey ran from November 1 to November 14, 2021. Any OIST member was eligible to answer it. We received 121 responses in total: 105 in English and 16 in Japanese.

Jump directly to the following sections:

Respondents
Storage
Clusters
Service and Support
About SCDA

Respondents

Respondent roles at OIST

 

Researchers — students, interns, postdocs, staff scientists and unit leaders — are overrepresented in the survey compared to the overall demographics of our users.

Overall usage of our resources

 

All the major resources are about equally represented here. In reality the storage is used by far more people than any other resource. Cluster users are overrepresented in this survey. Please keep this in mind when looking through the results.

The Research Storage

We offer research storage for all research units at OIST. In addition we have fast cluster-specific storage for use during computations and specialized storage systems for specific needs. You can find an overview on this page.

Frequency of use

 

Bucket is the most common storage system used at OIST. As mentioned in the first section, storage-only users are underrepresented in this survey so actual Bucket use is significantly higher than shown here.

Is the storage easy to access (Likert scale)

 

Is the storage performance sufficient (Likert scale)

 

Does the storage have sufficient capacity (Likert scale)

 

Respondents are largely happy with storage access and speed. A minority is unsatisfied with the overall capacity of the storage systems. "HPAcquire" is a transient system used to move data from one place to another, where capacity does not matter for its users.

User Comments

More space on bucket would be very useful.

More space on /bucket and /flash

more space on saion to store results from data without having to write to bucket everytime so that I can load trained neural network models faster for testing.

More bucket storage would be good. Perhaps groups with dna sequencing work could get an additional amount of space.

A dedicated method for transferring large files between labs might be useful.


Bucket storage can be increased as needed by request from the unit leader; for efficiency reasons we don't increase it until the unit has reached capacity. Also, we ask you to please spend some time cleaning up and archiving data in order to not waste a limited resource.

Flash and Work capacity can't be increased. They are designed with a specific per-unit capacity in mind. Please note that you are not supposed to keep any data on these systems when you are not running any computations. Always copy results to Bucket and clean up these systems after each use.
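
As a rough sketch of the "copy results, then clean up" routine, the shell function below copies a results directory to long-term storage, checks that the copy matches the original, and only then removes the original. The function name `stash_results` and any paths you pass to it are hypothetical; substitute the real locations of your unit's Flash, Work and Bucket directories.

```shell
#!/bin/bash
# Copy a results directory to long-term storage, verify the copy,
# then remove the original. Function name and paths are illustrative only.

stash_results() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # Copy the directory contents (including hidden files),
    # preserving timestamps and permissions.
    cp -a "$src/." "$dest/" || return 1
    # Remove the original only if the two directory trees are identical.
    if diff -r "$src" "$dest" >/dev/null; then
        rm -r "$src"
    else
        echo "copy differs from original; keeping $src" >&2
        return 1
    fi
}
```

You would call it as, for example, `stash_results /flash/MyUnit/run1 /bucket/MyUnit/run1`, where both paths are made-up placeholders. Note that the check also fails, and nothing is deleted, if the destination already contained other files.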

Internal and external data access are really two separate issues. We are currently (Dec. 2021) in the process of rolling out a new internal storage system for sharing data within and between units at OIST. This should solve most issues around internal sharing of data. External access is a difficult legal and technical security problem. We recommend that you use Filesender, Dropbox or your collaborators' resources for this.

The Compute Clusters

We provide two main clusters at OIST. The Deigo cluster (named after the Okinawan flower) is a large general-purpose cluster. The Saion cluster (a historical Okinawan leader) is a specialized cluster with GPU nodes, Power-CPU nodes and other specialized subsystems.

Cluster use by partition. Compute and Short; Largemem and Bigmem; Largejob; and Datacp are part of Deigo. The remaining partitions are Saion.

 

Unsurprisingly, most users use the general Compute and Short partitions of Deigo. Fewer people use the GPU partitions on Saion, and the Largemem and Largejob partitions on Deigo for more specialized jobs. KNL is a legacy system that no longer sees much use.

How are people using the clusters (multiple answers). Dark blue options are more advanced, while light blue are less so. Red marks survey respondents who don't use the clusters at all.

 

Users can use the clusters in many different ways. Only 30% write their own code to run on the cluster; most use software written by somebody else, and one third don't write any scripts to run jobs. A large number of cluster users are non-programmers and we expect this ratio to increase further in the future.

Cluster overall impressions (Likert scale)

 

Is the maximum job time sufficient (Likert scale)

 

Is the waiting time until a job starts sufficiently low (Likert scale)

 

Are CPU and memory allocations sufficient (Likert scale)

 

Users are happy with the resources and waiting times available on the Deigo cluster. This is no doubt in part due to the fact that the cluster is only 1.5 years old and not yet fully utilized.

The Saion GPU partitions are popular and are experiencing congestion at times. This is reflected in this survey, with over 20% of users being unhappy with the resources available there.

KNL had only 4 responses so the 20% dissatisfaction rate reflects the experience of one user. The KNL partition is by now a legacy system, and there is little reason to use it for anything.

User Comments

I'd love it if Deigo could have GPU nodes.

For some jobs, 20 days limit is marginal [...]

It would be nice if Intel CPUs are available for the Compute partition.

I would like to have more GPUs on Saion.

Is there any option to use V100 instead of P100?

Deigo will not have GPU nodes. But the in-cluster storage systems — Flash for Deigo and Work for Saion — are both available read-only from the other cluster, so it is possible to set up a hybrid job with one part running on one cluster, and another one running on the other. In the future we may look into a system where you can submit jobs to either cluster directly.

For very long-running jobs, you might want to look into the Largemem partition. However, you really should not run jobs for several weeks if you can at all avoid it. The risk is too great that something will happen and your job will die before it ever finishes. If possible, split up the job into separate parts and run them as separate jobs.
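
One possible way to run the parts in order is a job dependency chain, where each part starts only after the previous one has finished successfully. The sketch below assumes the scheduler is Slurm and uses its `sbatch` options `--parsable` and `--dependency=afterok:`. The function name `submit_chain`, the `SBATCH` override variable and the job script names are hypothetical.

```shell
#!/bin/bash
# Submit several job scripts as a chain: each job starts only after
# the previous one has finished successfully. Names are illustrative.

SBATCH="${SBATCH:-sbatch}"   # submit command; can be overridden for a dry run

submit_chain() {
    local previous="" jobid script
    for script in "$@"; do
        if [ -z "$previous" ]; then
            jobid=$("$SBATCH" --parsable "$script") || return 1
        else
            jobid=$("$SBATCH" --parsable --dependency="afterok:$previous" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        jobid=${jobid%%;*}
        echo "submitted $script as job $jobid"
        previous=$jobid
    done
}
```

For example, `submit_chain part1.slurm part2.slurm part3.slurm` would queue three made-up job scripts one after the other. If one part fails, the parts after it will not start.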

Long-running software often has a way to checkpoint its work: you can stop it partway through and it will pick up where it left off after you restart. Please make use of that to split long jobs into manageable parts.
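
If your software has no checkpoint feature of its own, the same idea can sometimes be built around it. The following is a minimal sketch, assuming your work can be divided into numbered steps: it records the last finished step in a small file and continues from there the next time it runs. The names `run_step`, `run_chunk` and `checkpoint.txt`, and the step counts, are all made up for illustration.

```shell
#!/bin/bash
# Minimal checkpoint/restart sketch. All names and numbers are illustrative.

# Placeholder for the real work of one step.
run_step() {
    echo "running step $1"
}

# Run at most per_run steps, resuming from the checkpoint file if it exists.
run_chunk() {
    local checkpoint="$1" total="$2" per_run="$3"
    local last_done=0 done_now=0 next
    if [ -f "$checkpoint" ]; then
        last_done=$(cat "$checkpoint")
    fi
    while [ "$last_done" -lt "$total" ] && [ "$done_now" -lt "$per_run" ]; do
        next=$((last_done + 1))
        run_step "$next" || return 1
        # Record progress only after the step has finished.
        echo "$next" > "$checkpoint"
        last_done=$next
        done_now=$((done_now + 1))
    done
    if [ "$last_done" -ge "$total" ]; then
        echo "all $total steps done"
    else
        echo "stopped after step $last_done of $total; run again to continue"
    fi
}

# Ten steps in total, at most four per run.
run_chunk "${CHECKPOINT:-checkpoint.txt}" 10 4
```

Each run does a limited amount of work, so three runs of this script finish all ten steps. If a run dies partway through, at most one step has to be repeated.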

Adding GPU nodes to Saion is partly a matter of funding, of course, but the power available in the current data center is another issue. GPU nodes are very energy hungry, and we are pushing the limits of the current facility.

You can specify the GPU type on the gpu partition if you want. This will give you a V100 GPU:

--gres=gpu:V100:1

But note that in application testing we usually find only a very minor difference in total execution speed. Most jobs spend more time doing CPU work and transferring data than actually doing GPU compute. The compute kernels often also don't scale perfectly with the GPU.
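
To show where that option goes, here is a sketch of a complete batch script that requests one V100 on the gpu partition. Only the partition name and the `--gres` line come from the text above; the job name, time limit, core count and memory amount are placeholder values chosen for illustration, and the `nvidia-smi` check stands in for your real program.

```shell
#!/bin/bash
#SBATCH --job-name=v100-test      # placeholder job name
#SBATCH --partition=gpu           # the gpu partition on Saion
#SBATCH --gres=gpu:V100:1         # one GPU of type V100
#SBATCH --time=0-01:00:00         # placeholder: 1 hour
#SBATCH --cpus-per-task=4         # placeholder: 4 CPU cores
#SBATCH --mem=8G                  # placeholder: 8 GB of memory

# Report where the job is running. SLURM_JOB_ID is set by Slurm inside a job.
echo "Job ${SLURM_JOB_ID:-unknown} running on $(hostname)"

# Print the name of the allocated GPU, if the NVIDIA tools are available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name --format=csv,noheader
else
    echo "nvidia-smi not found; is this a GPU node?"
fi
```

Replace the last part of the script with the commands that start your own application.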

Service and Support

SCDA offers a number of other services, including software, data visualization support and various ways to run computations remotely. We also offer support and training for the services we provide.

Usage of the services offered by SCDA

 

Usage of support for SCDA services

 

The most used service provided by SCDA is software, either on users' desktops or installed on the cluster. Other services are more specialized and less frequently needed. The online documentation and the HighSci portal are self-serve support options, and so are more commonly used than in-person support through email or Open Hours.

Service evaluations (Likert scale). Note the small number of responses for many items.

 

Evaluation of support (Likert scale)

 

The virtual machines are regarded as difficult to request and to get help using. This is understandable, as we currently don't offer new VMs for users. We do recognize the need and we are looking at ways to offer a more robust and accessible VM service in the future.

The users of the visualization service all feel it produces good quality results, and all but one agree that it is timely and efficient. The difficulty in accessing it is due to it being offered by a single SCDA member who can sometimes be overwhelmed with requests.

The support resources are for the most part viewed positively by the users.

User Comments

I wish it was possible to get an OIST license for software our Unit uses a lot [...] Available software [...] is unusable and so our Unit has to spend our own budget on the tools we need.

More visibility of services such as Data visualization support

I would like a good search engine for the online documentation.

Softwares needs a regular updates

A lot of the documentation is out of date.

HighSci doesn't seem accessible from some web browsers, such as Brave web browser.

OIST buys site-wide software licenses for software requested and used by multiple units. Any software specific to one unit has to be purchased by that unit, same as any unit-specific research tools or materials. Even if you are buying software only for yourself, please do contact us; we can advise you on what license to get, and we have working relationships with many software vendors and may be able to help you get a good price.

We do need to improve visibility of our services and support material. This is something we have started to work on, but it will take some time. It is not easy to reach all researchers at OIST with information.

There are no doubt a large number of issues and problems with our systems, with our documentation and with our support. To fix those issues, we first need to find out about them. Please contact us whenever you see an issue, and let us know about it!

 

Course evaluations (Likert scale)

 

 

Training course attendance was high; more than half have attended at least one course, and 20% have attended more than one. Users are also fairly satisfied with the courses on offer.

We offer the Introduction to HPC training course three times a year; around September-October, January-February and April-May. This is the main training we offer to get started with HPC at OIST.

In addition we try to offer at least one, sometimes two, other courses in the year. The subject and availability depend on user interest and on the time available for the SCDA staff to create and hold courses.

 

User Comments

Wider range of programming and data analysis courses accessible on a more regular basis.

Field-specific courses?

It would be great if these trainings could be early morning (8-10), or early evening (5-7).

We unfortunately have only a limited number of hours available for developing and giving courses. The staff can also only teach courses on subjects they know something about, so most subject-specific courses really aren't feasible. Early morning or evening lectures are similarly not possible absent any staff members willing to work those hours of the day.

About SCDA

Finally we asked a few questions about what the users know about the SCDA section.

About SCDA (Likert scale)

 

Most users do know that we are part of the Research Support Division, and not part of the IT section at OIST. Most are also aware that they need to acknowledge us in their research outputs; acknowledgements form part of our evaluation and affect the funding we are able to get. Finally, we're happy to see that most users find us approachable for help and advice.

User Comments

It's great but it would be helpful to have a wider range of support for our daily technical issues, such as
1) support for programming experiments,
2) support for engineering issues (Arduino),
3) support for specific data analysis (physiological data) and
4) support for statistical analysis.

I think the community would benefit from an increased number of staff employed for data visualization.

SCDA is highly valuable for our research and it should probably expand in size (number of staff) as OIST is growing.

I want to thank you in the acknowledgment section of my paper, but my recent paper got rejected! So, yeah, I want to thank you in the next submission.

Excellent Work overall. Very useful section with sometimes onerous rules that can make it hard to do new things.

 

As with any resource, we can only employ the number of staff that we have funding for. And please don't disregard the other RSD sections; they are highly competent and able to help with a wide range of issues as well.

We would finally like to thank the survey respondents for the effort, and for all the positive and helpful comments we received (we can show only a small subset here).

Contact us

If you have any questions or issues with the survey, please send an email to jan.moren@oist.jp. You can also open a help ticket or send an email to ask-scda@oist.jp and it will get to us.