Have you heard of the iPlant Collaborative? If not, you should check out the project and the team as soon as possible. iPlant is an innovative platform that aims to connect scientists to applications, tools, datasets, storage, and Atmosphere (i.e., a ‘cloud’). In fact, their tagline is, “it is where scientists in all domains of life sciences can connect to public datasets, manage and store their own data and experiments, access high-performance computing, and share results with colleagues.” Most importantly, it is open source. In other words, it is free for academic scientists to use, with appropriate credit of course.
I attended a workshop at UC Davis last week to find out more about iPlant and what it has to offer in genomic resources. Prior to this workshop, I did not know much about the program. In fact, I only knew that we were going to store the datasets from our Anthozoan UCE project on their servers. In all honesty, and most likely because of the name, I thought it was a program focused on sequencing plant genomes and providing resources to botanists. I could not have been more wrong. Although iPlant does have a genomic bias, you do not have to work with genomic data to use it.
There are many benefits to signing up for an iPlant account. Firstly, you can upload and store large datasets. This system provides a pretty nice backup (100 GB, or more if requested) for any data that you store locally (i.e., on your own laptop or desktop). Secondly, you can invite your colleagues to sign up for their own accounts. And guess what? You can share your data with your colleagues and vice versa, or with the broader scientific community. Thirdly, you can explore and use applications that the team has added to the website. Have you ever been frustrated because you cannot get a program to compile on your computer? Are BLAST or RAxML too slow on your laptop? Do you have multiple copies of the same program in 10 places on your computer? iPlant likely has your application of interest, and you can run it on their servers in what is known as the Discovery Environment (DE). If they do not have the application you need, you can contact them and ask for it to be added. So if you want to use a certain program, such as RAxML, it may be much easier to do so in the DE. They also have decent hardware (16 CPUs, 128 GB RAM), and you can even build your own workflows that link multiple programs (e.g., MAFFT to RAxML) and then run the whole workflow in one click. Finally, it seems like a great teaching tool for both undergraduate and graduate courses.
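To give a feel for what a linked MAFFT to RAxML workflow replaces, here is a minimal Python sketch of the same two steps run locally. This is not the DE's interface; it is only an illustration that assumes `mafft` and `raxmlHPC` are installed and on your PATH. The function names, the file names, the GTRGAMMA model, and the seed are my own placeholders, and the RAxML executable may have a different name on your system (for example, a threaded build).

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def mafft_command(fasta_in):
    """Build the MAFFT call; MAFFT writes the alignment to stdout."""
    return ["mafft", "--auto", str(fasta_in)]


def raxml_command(alignment, run_name, seed=12345, binary="raxmlHPC"):
    """Build a RAxML call for one maximum-likelihood tree search.

    Assumption: the installed RAxML can read a FASTA alignment
    (older releases expect PHYLIP instead).
    """
    return [binary, "-s", str(alignment), "-n", run_name,
            "-m", "GTRGAMMA", "-p", str(seed)]


def run_workflow(fasta_in, run_name, dry_run=True):
    """Align with MAFFT, then hand the alignment to RAxML.

    With dry_run=True nothing is executed; the two command lines
    are returned as strings so you can inspect them first.
    """
    aligned = Path(fasta_in).with_suffix(".aligned.fasta")
    steps = [mafft_command(fasta_in), raxml_command(aligned, run_name)]
    printable = [shlex.join(step) for step in steps]
    if dry_run:
        return printable

    # Fail early with a clear message if either tool is missing.
    for tool in (steps[0][0], steps[1][0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} is not on PATH")

    # Step 1: capture MAFFT's stdout as the alignment file.
    with open(aligned, "w") as out:
        subprocess.run(steps[0], stdout=out, check=True)
    # Step 2: build the tree from that alignment.
    subprocess.run(steps[1], check=True)
    return printable


if __name__ == "__main__":
    # Hypothetical input file; prints the commands without running them.
    for line in run_workflow("seqs.fasta", "run1"):
        print(line)
```

Calling `run_workflow("seqs.fasta", "run1", dry_run=False)` would actually run both programs. The dry-run default is deliberate: it lets you check the commands before committing to a long tree search.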
iPlant is pretty outstanding. Here is some advice that I gathered from this past week’s workshop. If you work with genomic data, or in fact ANY large datasets, you should create an account, log in, and explore the applications. If you are ever frustrated about sharing large files with your colleagues, you should create an account, log in, store your data, invite your colleagues, and share those data with them. If you are intimidated by Bash scripts, Python, R, or even the command line in general, you should create an iPlant account and check out what they have to offer. In the end, though, I personally think that while you familiarize yourself with iPlant and its resources, you may also want to take a course and start learning how to use the command line (check out datacarpentry.org). In fact, learn how to program in any language (Codecademy has great resources for beginners).
The workshop last week was very useful. I am thankful to know that if I hit a roadblock, I can jump on iPlant to see if their resources can improve my workflow. Additionally, I will definitely use it to store data and for teaching purposes in the future. The iPlant Collaborative is a great resource, run by a great team, and they are sharing it with others… for free.
UCE Project Team
All things Anthozoa, Evolution and Ecology
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation