Python-bloggers

A multi-language overview on how to organise your research project code and documents

This article was first published on python | Victor Boussange , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

I personally find that one of the biggest challenge when doing research is to keep things neat and organized. Having a good management system for your code and resources is key to optimizing time and brain resources. In this post, I discuss various methods for structuring a research project folder that includes code, data, publications, and more. Additionally, I discuss the specifics of organizing your research code. As I started my PhD, I wish I could have had some of such guidelines. But starting from scratch allowed me to build, with trials and errors, a good system for my later life. Hopefully, some of this can apply to you!

This post is part of a series of posts on best practices for managing research code. Much of this material was developed in collaboration with Mauro Werder as part of the Course On Reproducible Research, Data Pipelines, and Scientific Computing (CORDS). If you have experiences to share or spot any errors, please reach out!

Content

Project folder structures

I quite like this project folder structure, which keeps apart raw data and results from the code, but still place them relatively close, together with admin and publications. Having a separate git repo for the paper is something I would recommend as well (possibly linked to an Overleaf project).

|-- code/
|-- data/
|-- results
|-- publications
|    |-- talks
|    |-- posters
|    |-- papers
|-- admin
|-- meetings
|-- more-folders
 -- README.md

You may want to place results within code, together with data (which you should not git track)
The structure of code/ deserves here some attention.

code/ structure

Programming languages typically have their own conventions, but often the folders follow this scheme

It can make sense for research projects to distinguish between scripts placed in scripts/ and reused functions, models, etc., placed in src.

Python Folder structure
|-- src/            # package code
|-- scripts/        # Custom analysis or processing scripts
|-- tests/
|-- examples/       # Example scripts using the package
|-- docs/           # documentation
 -- environment.yml # to handle project dependencies
 -- README.md
R Folder structure
|-- R/               # R scripts and functions (package code)
|-- scripts/         # Custom analysis or processing scripts
|-- man/             # Documentation files
|-- tests/
|-- examples/        # Example scripts using the package
|-- vignettes/       # Long-form documentation
 -- DESCRIPTION      # Package description and metadata
 -- NAMESPACE        # Namespace file for package
 -- README.md        # Project overview and details
Julia Folder structure
|-- src/            # package code
|-- scripts/        # Custom analysis or processing scripts
|-- test/
|-- examples/       # Example scripts using the package
|-- docs/           # documentation
 -- Project.toml    # to handle project dependencies
 -- README.md

Turning your code/ into a “package”

You may want to specify the src folder as a package. This has a few advantages, including

To import functions and classes (types) located in the src folder, you typically need to indicate in each script the relative path of src. In Julia, you would typically do something like include("../src/path/to/your/src_file.jl"). In Python, you would do something like:

import sys
sys.path.append("../src/")

from src.path.to.your.src_file import my_fun

If src/ directory grows, it’s beneficial to convert it into a separate package. Although this process is a bit more complex, it eliminates the need for path specifications, simplifies the import of functions and classes, and makes the codebase easily accessible for other research projects.

There are typically ways to turn a code-project into an installable package. This is in particular useful for code which other people (or yourself) use for different projects.

You can achieve this easily with development tools.

For Python, tools like setuptools and poetry facilitate package development. If you’re working in R, devtools is the go-to tool for developing packages. In Julia, the Pkg tool serves a similar purpose.

Package templates can be useful to simplify the creation of packages by generating package skeletons. In Python, checkout out cookiecutter. In R, check usethis. For Julia, use the Pkg.generate() built-in functionality, or the more advanced PkgTemplates.jl package.

Note that you may want at some point to locate your src/ (and associated tests, docs, etc…) in a separate git repo.

Further reading for

Wrapping up

Explore these exemplary toy research repositories in different programming languages:

These repositories showcase what I consider to be best practices in research project organization.

Take-home messages

To leave a comment for the author, please follow the link and comment on their blog: python | Victor Boussange .

Want to share your content on python-bloggers? click here.
Exit mobile version