Final report of my GSoC’17 project with the Common Workflow Language organization
The official coding period for GSoC’17 has just ended, and it's time to sum up my work and present a final report on it.
The aim of our project was to make cwltool and schema-salad compatible with Windows, and to work on other bugs and features. The Common Workflow Language (CWL) is a specification for describing command line tools and workflows, providing benefits such as flexibility, scalability and portability. cwltool is the reference implementation of CWL. It is intended to be feature complete and to provide comprehensive validation of CWL files, as well as other tools for working with CWL. Schema Salad is a schema language based on Apache Avro for describing structured linked data documents in JSON or YAML. cwltool depends on schema-salad for object creation, reference resolution and validation of CWL files. Since workflow and tool definitions can use Unix tools, we decided to allow workflow execution inside a Docker container when running on Windows.
We started our work by making schema-salad compatible with Windows; here is the merged PR. After that, we started working on a Windows compatible cwltool. Some of the issues we came across were unsupported scheme errors, symlinks on Windows and Windows path separators. We chose AppVeyor CI for testing cwltool on Windows. Once all unit tests passed on Windows, we started working on passing the conformance tests there and on ensuring Docker support for cwltool on Windows. After resolving some time-consuming issues, such as non-blocking I/O operations and the default Docker container on Windows, we achieved Windows compatibility for cwltool. Here is the merged PR.
After making cwltool Windows compatible, I worked on the following bugs and features:
- Python 3 support on Windows: Once cwltool was Windows compatible, we found that it had some issues with Python 3 on Windows. We fixed those errors; here is the final PR.
- Adding a documentation file for Windows compatibility: Documentation for Windows users of cwltool.
- Allowing HTTP/HTTPS files as input: Previously, workflows could already be loaded over HTTP, but input files still had to be present locally. In this PR we added support for loading inputs over HTTP, with file caching to avoid downloading the same file again. Here is the PR.
- Adding a test suite to cwltest: The cwltest repository lacks a test suite to make sure that new PRs do not break the codebase. This PR adds a test suite to the cwltest repo and is currently a work in progress; see PR.
- Adding a --docker-pull flag to force pull the latest Docker image: We added an option to pull the latest version of the image named in the dockerPull field, even if an image is already present locally. This is done with the --docker-pull command line argument.
- Using the stdout field in cache key calculation: The stdout field is now taken into account when calculating the cache key, so that steps which differ only in their stdout file no longer share a cache entry.
- Removing an unnecessary warning about the generation field: Due to a regression, an unnecessary warning was being generated, which we fixed in this PR.
- [future module] Hardcoded tmp folder prevents Windows compatibility of the past module: We use the past module (part of future) to run avro-cwl on Python 3. This module has some hardcoded paths and is not compatible with Windows, so we made a PR to fix that.
Some minor PRs were also made: avoiding the use of exception.message in Python 3 (see [here]), fixing a "'NoneType' object is not iterable" error ([here]), adding build badges to cwltool and schema-salad ([here]) and adding a warning when the default Docker container is used ([here]).
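Two of the minor fixes above follow common Python 3 patterns, sketched here with hypothetical helper names rather than the code from the PRs: use str(exc) instead of exc.message, which was deprecated in Python 2.6 and no longer exists in Python 3, and treat a missing (None) list as empty before iterating over it.

```python
from typing import Iterable, List, Optional


def error_text(exc: BaseException) -> str:
    # exc.message does not exist in Python 3; str(exc) works on 2 and 3.
    return str(exc)


def collect_names(items: Optional[Iterable[str]]) -> List[str]:
    # Iterating over None raises "TypeError: 'NoneType' object is not
    # iterable", so fall back to an empty list when nothing was given.
    return [name for name in (items or [])]
```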
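The default container behaviour can be pictured like this: if a tool gives no dockerPull image and we are on Windows, where the Unix tools a tool may rely on are generally not available on the host, the tool runs in a default container and a warning tells the user so. This is a simplified sketch; the image name is a placeholder, not the default image cwltool actually uses, and the function is hypothetical.

```python
import logging
import platform
from typing import Optional

_logger = logging.getLogger("cwl-sketch")

# Hypothetical placeholder, not the image cwltool really falls back to.
DEFAULT_WINDOWS_IMAGE = "example/default-bash-image"


def choose_container(docker_pull: Optional[str], system: Optional[str] = None) -> Optional[str]:
    """Return the image to run a tool in, or None to run it on the host."""
    system = system or platform.system()
    if docker_pull:
        return docker_pull  # the tool asked for a specific image
    if system == "Windows":
        _logger.warning(
            "No dockerPull image given; running in default container %s",
            DEFAULT_WINDOWS_IMAGE,
        )
        return DEFAULT_WINDOWS_IMAGE
    return None  # on other systems, run directly on the host
```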
I would like to thank my mentors Anton Khodak, Janneke van der Zwaan and Michael R. Crusoe for being cool and helping me at almost every step of this project. Also, a big thanks to Peter Amstutz for his constant help. It was a great experience working with all of you.
Overall, we met all the requirements, and I consider it a successful GSoC. I hope my small contribution will help some people :)