January 2020: Globus Turns 10...18
January 15, 2020 | Ian Foster
Reflections from Globus Co-Founder Ian Foster
2020 marks the 10-year anniversary of the Globus service as we know it today. It also promises to be a monumental year for the product and team, for many reasons, not least because we’ll likely reach one exabyte (10^18 bytes) transferred as Globus usage continues to grow. I want to take a moment to reflect on how far we’ve come, and where we plan to go in 2020 and beyond.
Missing Steve
I must first take a moment to honor my longtime colleague and friend, Steve Tuecke, who passed away on November 2, 2019. As I discussed in my blog post, Steve’s contributions to the practice of research computing were many, varied, and significant. For many of us, his contributions to our lives as a loyal and caring friend were yet more valuable.
If you have memories of Steve you’d like to share, you can do so here. Many thanks to those who have already done so.
Happy 10th Birthday!
While the name “Globus” dates back to a DARPA grant in 1995, it was at Supercomputing 2010 in New Orleans that we launched the “Globus Online” service: The first software-as-a-service (SaaS) version of Globus aimed at radically simplifying file transfer. Back then, our focus was on offering a hosted service to address challenging transfer tasks (e.g., transfer management, credential management, recovery from transient errors) via a modern Web 2.0 interface.
As we enter our 10th year, it’s nice to look back and see what we’ve achieved since then:
- Grown from ~120 initial users at SC10 to ~120,000 today
- Added over 150 institutions as subscribers, in pursuit of our goal to become self-sustaining; approximately half of our funding now comes from our generous subscribers – thank you!
- Moved 790+ PB and over 100 billion files among storage systems at thousands of institutions in 80+ countries
- Expanded the service far beyond transfer – now offering data sharing, a robust management console for administrators, connectors to premium cloud and on-prem storage systems, support for protected or regulated data, and more
- Introduced the Globus platform, via APIs and SDKs that enable many communities to build robust, secure research data management portals and science gateways
- Become central to data collection, sharing, and distribution solutions used in some of the world’s biggest research laboratories and science projects
2019: The Year in Review
Over the past year, Globus has continued to make great strides. We saw the largest-ever transfer in our history, when scientists moved nearly three petabytes of data as part of a research project involving three of the largest cosmological simulations to date. This project was named “Best Use of HPC in Physical Sciences” by HPCwire readers at SC19.
In 2019, 45 organizations joined the ranks of Globus subscribers or upgraded their subscriptions. Many thanks to all those Globus users out there who choose to subscribe – that’s how we are able to continue providing the data management services Globus is known for.
The product team has been busy as always. We released two important new cloud connectors: our Box connector enables users of this popular cloud storage system to access and manage Box files using Globus, with no restrictions on download file size, and our Google Cloud connector allows Google Cloud users to join the growing Globus ecosystem. In more connector news, Caringo and Wasabi became inaugural partners in our new program to validate storage systems compatible with our S3 connector.
Last year we released a new web app, offering a responsive design, customizable display options, and the ability to meet accessibility standards. We also welcomed our first adopters of the new high assurance product for protected data, such as data protected by GDPR and HIPAA. It’s exciting to see medical centers starting to use Globus to share protected data, and thus know that we are contributing to medical discovery.
As many of you are aware, we began development of the next-generation Globus Connect Server (GCS), GCS version 5, with key enhancements needed to support yet more powerful data management capabilities. The year saw several point releases of GCS v5, progressively adding new features such as HTTPS support and improving the user experience with connectors.
One more product item you may have missed: we took a first step towards delivering group management as a platform service, to meet authorization needs, and released an initial version of a generally available API.
2019 was a great year for events. We broke our 2018 record for most GlobusWorld Tours, hosting nine of these free user workshops across the US, and also in the UK and South Africa, where we see great demand for advanced cyberinfrastructure development.
We also helped launch the first-ever Data Mobility Exhibition in partnership with ESnet and Indiana University. The event runs through August 2020 and enables participants to improve their capabilities for fast, secure file transfer at their institution.
Our outreach team also launched a Welcome Kit of resources to help new subscribers get started quickly with all the varied features that a Globus subscription brings.
What’s Next
This year, we’ll witness the push to complete Aurora, the first U.S. exascale system (likely in 2021), close to home at Argonne National Laboratory. Exascale brings far greater capacity for handling the massive datasets generated by, among other things, AI workflows and increasing amounts of sensor data. Tools like Globus, which help researchers make use of all that data, need to be ready.
A key area of focus for the team this year will be delivering platform capabilities to meet the needs of data handling for scientific instruments, data providers, repositories, and science gateways. We plan to expand the group management API and to provide blueprints for integrating our end-to-end data management platform with your custom applications. A significant addition to our platform services will be automation capabilities that provide task orchestration and mechanisms to build out custom pipelines, such as data publication flows and data processing pipelines.
We will continue to build out Globus Connect Server v5, incorporating enhancements such as multiple storage types in a single deployment, complete backup and restore capability for installations, and HTTPS access to data that enforces the same security, based on modern security standards, that we provide for GridFTP. This new version will go into full production this year, with support for migration from older versions.
Of course, we will continue to add features to our existing product line, such as simple browser-based access to data on Globus endpoints, plus a streamlined experience for Globus Connect Personal users. We’ll also roll out new onboarding and outreach programs to help subscribers grow usage within their institutions, and thus maximize the value of their investment. And our continued GlobusWorld Tour events, plus conference exhibits and demos, will ensure users have many opportunities to interact with the latest Globus tools – and our expert team – on a regular basis.
Hope to see you all at GlobusWorld 2020 on April 29 and 30, in Chicago, where we can discuss all this and more. And one more thing: If you’re working on something interesting with Globus, please share it with the community by submitting a talk for GlobusWorld here.