A few steps to cleaner, better-organized code

This post covers five preliminary topics for scientific programming. Take what’s useful!

File organization structure




Version control

Data and Metadata

General metadata:

Metadata are key, and we should keep careful track of them for each project. I now put a Metadata subdirectory into the Data directory of each project in my KRMPapers folder. The subdirectory contains a separate file for each dataset used in the project, and I try to keep those files compliant with the Ecological Metadata Language (EML, Fegraus et al. 2005 Bull. ESA).

Process metadata:

Process metadata describe the set of processing procedures used to generate the analyzed dataset from the “read-in” dataset.
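
One way to keep process metadata is to log each processing step as the analyzed dataset is built. The sketch below is only an illustration under stated assumptions: the `record_step` helper, the field names, and the toy records are all hypothetical, and JSON is just one convenient format for the log.

```python
import json


def record_step(log, description, rows_in, rows_out):
    """Append one processing step to the log and return the log."""
    log.append({
        "step": len(log) + 1,
        "description": description,
        "rows_in": rows_in,
        "rows_out": rows_out,
    })
    return log


# Toy "read-in" dataset (made-up values, for illustration only).
raw = [
    {"id": 1, "mass_kg": 210.0},
    {"id": 2, "mass_kg": None},
    {"id": 3, "mass_kg": 195.5},
]

process_log = []

# Processing step: drop records with a missing mass, and log what happened.
cleaned = [row for row in raw if row["mass_kg"] is not None]
record_step(process_log, "Dropped records with missing mass_kg", len(raw), len(cleaned))

# Serialize the log so it can be saved next to the analyzed dataset.
process_metadata = json.dumps(process_log, indent=2)
print(process_metadata)
```

Saving the resulting text alongside the analyzed dataset gives a record of how it was derived from the “read-in” version.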

Conforming to existing ontologies

See Madin et al. 2008 TREE

Data wrangling

File organization structure

I recommend picking one directory structure and using it for all your projects. These are the main directories I use:


All of my research work sits within the Research directory. Key subdirectories here are KRMPapers (files for each publication I’ve led or co-authored), Grants (files for each grant I’ve submitted, with year), Conferences (files for each conference talk).


In KRMPapers, each subdirectory corresponds to one manuscript. EVERY MANUSCRIPT SUBDIRECTORY has the same four subdirectories: Code, Data, Figures, Drafts. The R project for each manuscript sits at the same level as these four subdirectories.
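
Creating the same skeleton for every new manuscript is easy to automate. Here is a minimal Python sketch, assuming the four names above plus the Metadata subdirectory inside Data described earlier in this post. The `scaffold_manuscript` function and the `ExampleManuscript` name are hypothetical, and the example runs in a temporary directory so it touches nothing real.

```python
from pathlib import Path
import tempfile

# The four subdirectories every manuscript directory gets.
SUBDIRS = ("Code", "Data", "Figures", "Drafts")


def scaffold_manuscript(papers_dir, manuscript_name):
    """Create the standard subdirectories for one manuscript and return its root."""
    root = Path(papers_dir) / manuscript_name
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Metadata lives inside Data, one file per dataset.
    (root / "Data" / "Metadata").mkdir(exist_ok=True)
    return root


# Demonstration in a throwaway directory (hypothetical manuscript name).
with tempfile.TemporaryDirectory() as tmp:
    root = scaffold_manuscript(Path(tmp) / "KRMPapers", "ExampleManuscript")
    created = sorted(p.name for p in root.iterdir())
    has_metadata = (root / "Data" / "Metadata").is_dir()
    print(created)
```

Because the directories are created with `exist_ok=True`, running the function again on an existing manuscript directory leaves its contents alone.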


Two key subdirectories in my Notes folder are:


A set of text files for programming tasks I have to do multiple times, but not frequently enough to recall how to do them from one time to the next. I organize these by language (so, I’ve got a Perl subdirectory, a Bash subdirectory, a Python subdirectory, etc.). Each of those subdirectories contains skill-specific text files of notes and examples (in Bash, I have a ForLoop file, since I never remember the precise syntax from one use to another).


This folder contains every science paper I’ve ever read. It’s linked to Mendeley. For me, it was imperative to develop a consistent way of naming pdfs. Here’s what I use:

<first author’s last name>_<year>_<venue abbreviation>_<summary of title>.pdf

So for example, my pdf of Scott Creel’s 2005 Ecology paper, entitled “Elk alter habitat selection as an antipredator response to wolves,” is in a file named


This naming structure makes it easy for me to locate pdfs and pass them on to my collaborators.
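
The convention is simple enough to generate automatically. The Python sketch below is one possible implementation, not a tool from this post: the `pdf_name` function is hypothetical, the example values are placeholders rather than a real paper, and stripping spaces and punctuation from each field is an assumption made so that underscores only ever separate fields.

```python
import re


def pdf_name(last_name, year, venue_abbrev, title_summary):
    """Build <last name>_<year>_<venue abbreviation>_<summary of title>.pdf."""
    parts = [last_name, str(year), venue_abbrev, title_summary]
    # Remove spaces, punctuation, and underscores inside each field so that
    # underscores appear only as separators between fields.
    cleaned = [re.sub(r"[\W_]+", "", part) for part in parts]
    return "_".join(cleaned) + ".pdf"


# Placeholder values, for illustration only.
example = pdf_name("Author", 2020, "Venue", "Short Title")
print(example)
```

Keeping underscores out of the individual fields means a filename can later be split on `_` to recover the author, year, venue, and title summary.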


I split this one out into CoursesTaken and CoursesTaught. If you are headed for a research career, keep your notes from the courses you take well organized; you may want to draw on them when you teach in the future (this has been true for me on multiple occasions).

  1. Style
    1. Find a style you prefer, and stick to it diligently.
  2. Annotation
    1. Commenting
    2. Integrating code with documentation through Markdown, knitr, Jupyter, etc.
  3. IDEs
  4. Metadata
