Hi,
This blog post summarizes the main tasks I completed during my three months as a GSoC’16 intern, and what I learned along the way.
I have been working on the Ecodata Retriever project under NumFOCUS, with my mentors Henry Senyondo and Ethan White.
All the commits I made during this period are listed at this GitHub link:
https://github.com/weecology/retriever/commits/master?author=goelakash
Note: All the code has been merged into the master branch.
List of tasks:
a) Upgrade scripts to Datapackage.JSON standard.
This was my main GSoC task. I spent most of the last 6 weeks on it, covering both code and documentation.
Summary 1:
- The scripts have been updated from .script format to .json using the parse_script_to_json module I wrote.
- I added a new CLI (command-line interface) tool that can:
  - Create new JSON scripts: Takes input for all the relevant fields from the user, validates the input, and stores it in valid JSON format (Datapackage.JSON standard).
  - Delete JSON scripts: Deletes any script based on the script’s shortname. Searches the list of Python scripts (SCRIPT_LIST) and, after confirmation, deletes the scripts that match the user’s request.
  - (Experimental) Edit JSON scripts: Allows users to edit existing retriever scripts. This feature has not been completely tested, so it is currently disabled.
- Added unit tests and modified integration tests to test input validation and JSON script integration (download and installation regression tests).
- Added documentation (link) to guide the user on this new tool.
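To give a rough idea of what the "create" step involves, here is a minimal sketch of validating user-supplied fields and arranging them in a Data Package style layout. This is not the retriever's actual code: the function names (validate_fields, build_datapackage, write_json_script), the list of required fields, and the shortname pattern are all assumptions made for illustration.

```python
import json
import re

# Hypothetical set of fields a new script must provide (an assumption,
# not the retriever's real schema).
REQUIRED_FIELDS = ("name", "title", "description", "url")

# Assumed rule: short names use lowercase letters, digits, '.', '-' or '_'.
NAME_PATTERN = re.compile(r"^[a-z0-9._-]+$")


def validate_fields(fields):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key in REQUIRED_FIELDS:
        if not fields.get(key, "").strip():
            problems.append("missing field: " + key)
    name = fields.get("name", "")
    if name and not NAME_PATTERN.match(name):
        problems.append("invalid name: " + name)
    return problems


def build_datapackage(fields):
    """Arrange already-validated fields in a Data Package style layout."""
    return {
        "name": fields["name"],
        "title": fields["title"],
        "description": fields["description"],
        "resources": [{"name": fields["name"], "url": fields["url"]}],
    }


def write_json_script(fields, path):
    """Validate the input, then write it as JSON; raise ValueError if invalid."""
    problems = validate_fields(fields)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w") as handle:
        json.dump(build_datapackage(fields), handle, indent=4, sort_keys=True)
```

Collecting all problems before raising lets an interactive tool report every bad field in one pass instead of stopping at the first one.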
b) Port retriever to Python 3, maintaining backwards compatibility.
Not a cakewalk at all. I already highlighted the various csv and encoding issues (UTF-8 / latin-1) in the previous post. Nevertheless, the library is now fully compatible with both Python 2 and 3 on all major *NIX and Windows platforms (tested on Ubuntu, Mac, and Windows 7).
I completed this in the first month of the GSoC period, and spent the rest of the coding period fixing the bugs that came up. With help from my mentors Henry and Ethan, I also refactored the code so that explicit OS checks are no longer needed.
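As a rough illustration of the kind of csv and encoding difference involved (a sketch, not the code that went into retriever): the csv module in Python 2 expects files opened in binary mode, while in Python 3 it expects text mode with an explicit encoding and newline=''. The helper names open_csv_file and read_rows, and the latin-1 default, are assumptions for this example.

```python
import csv
import io
import sys

PY3 = sys.version_info[0] >= 3


def open_csv_file(path, mode="r", encoding="latin-1"):
    """Open a file in the way the csv module expects on this Python version.

    Hypothetical helper: Python 3 wants text mode with newline='',
    Python 2 wants binary mode.
    """
    if PY3:
        return io.open(path, mode, encoding=encoding, newline="")
    return open(path, mode + "b")


def read_rows(path, encoding="latin-1"):
    """Return all rows of a CSV file as lists of values.

    On Python 2 the values are byte strings and the caller is expected
    to decode them; on Python 3 they are already text.
    """
    with open_csv_file(path, "r", encoding) as handle:
        return [row for row in csv.reader(handle)]
```

Opening files this way also leaves line endings to the csv module on both versions, so the same call behaves identically on Windows and *NIX.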
Summary 2:
- retriever can now be installed under either Python 2 or Python 3 without any difficulties.
- Cross-platform compatibility (with both Python 2 and 3).
- Updated documentation (link) to reflect Python 3 support.
Learnings:
1. Python idioms
2. Unit testing (with pytest)
3. Different types of character encodings (UTF-8 and ISO 8859-1)
4. Sphinx documentation system
5. git-fu!
6. Python 2 vs Python 3: syntax and package-support differences.
In closing, it was an immensely rewarding learning experience, and I look forward to remaining associated with the retriever project 😀
Thanks for reading!