Touching up plots in Inkscape

When I switched from working primarily with statisticians to working primarily with biologists, one of the hardest lessons I learned was the extent to which biologists value their figures.  This was a long and painful lesson: while R is great for statistical analysis and simulation, I remain unaware of a nice internal R GUI for touching up graphics.  Pair that tragedy with the fact that journal specifications for graphical displays vary far and wide, and you have yourself a lot of time spent fiddling with figures.

I’m aware of two nice and widely used GUIs for manipulating graphics by hand.  One is Adobe Illustrator, which is fabulous but proprietary (and spendy).  The other is Inkscape, which is open-source and freely available.  Becoming comfortable working with Inkscape has saved me a lot of time.

Getting Started

Inkscape is a vector graphics editor, available for download on OS X, Windows, and Linux.  It is primarily designed for constructing and manipulating scalable vector graphics (“SVG”) files; SVG is an open standard vector graphics format.  I use Inkscape for assembling multipanel figures (like the one shown below), touching up single-panel graphics to comply with journal-specific graphic specifications, and building graphical displays that aren’t readily output by R (like the S-I-R compartment diagram shown in the upper-right hand corner of panel C below).


[Figure: PhenomMechModel_ForWebsite]

Building and writing an SVG file from R

For me, the first step in constructing almost all graphics is designing the figure, and plotting the various pieces that comprise it in R.  I do a fair amount of manipulation in R directly, using the par and plot functions, and the ggplot2 package. When I’ve got a panel constructed so that all the desired datastreams are displayed the way I want, I write the plot out as an svg file. Here’s an example R code snippet.

x <- 1:100
y <- rnorm(100, x^2, 500)

par(oma = c(0, 0, 0, 0))
plot(y ~ x, pch = 16, col = "grey20")
lines(lowess(y ~ x), lty = 2, lwd = 3, col = "grey60")

To write this plot as an svg, I need to add three lines. In the first, I specify the path along which I'll write the file (this could be embedded directly in the svg function; however, I prefer to keep my paths written outside of functions so they're more readily accessible for manipulation; more on this in future posts). In the second, I call R's svg function, which constructs the svg file and writes it to my specified path. Last (and this is important), I have to turn the graphical device off after plot construction is over (that's the dev.off() bit). My revised code file looks like this:

x <- 1:100
y <- rnorm(100, x^2, 500)

write.path <- "~/work/Kezia/Research/BlogPosts/Inkscape1/test.svg"

svg(write.path, width = 6, height = 5)
par(oma = c(0, 0, 0, 0))
plot(y ~ x, pch = 16, col = "grey20")
lines(lowess(y ~ x), lty = 2, lwd = 3, col = "grey60")
dev.off()

To open the figure in Inkscape, navigate to the directory containing your newly-minted svg file, and open the svg in Inkscape. For good measure, I always rename svg files that I've manipulated in Inkscape by adding an _inkscape suffix (so test.svg becomes test_inkscape.svg). There are two reasons for doing this:

  1. It protects me from inadvertently overwriting my Inkscape edits if I rerun the file output commands in R.
  2. I have a good record of exactly what I've manipulated by hand in Inkscape, as opposed to what R generated for me.
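The renaming convention can also be scripted, so the hand-edited file name is always derived from the name R wrote. This is only a minimal sketch: the helper name inkscape.name is hypothetical (it is not part of R or Inkscape), and it assumes file names ending in .svg.

```r
# Hypothetical helper: derive the name of the hand-edited copy from the
# name of the file that R wrote, by inserting "_inkscape" before ".svg".
inkscape.name <- function(path) {
  sub("\\.svg$", "_inkscape.svg", path)
}

r.output  <- "test.svg"               # file written by R's svg() call
hand.edit <- inkscape.name(r.output)  # name to use when saving from Inkscape
hand.edit
# [1] "test_inkscape.svg"
```

Rerunning the R code then only ever overwrites test.svg, never the hand-edited copy.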

I tend to write svgs smaller than I actually want to view them, so the first thing I typically do in Inkscape is resize the figure to dimensions I like. View -> Zoom -> 1:1 is usually a good first step. To resize the artboard (that is, to respecify the dimensions of the whole file, so that your figure occupies the whole file and not just a corner of it), go to File -> Document Properties.  In the resulting dialogue box, adjust the height and width to fit your specifications.

To edit a particular element within the figure, select that element with the pointer tool (you may have to click through a few layers of selections to get to the particular object you want), and then manipulate it as desired. For example, suppose you want to first delete the default bounding box that surrounds the plot in the test.svg object created above, and then extend the x and y axes so that they intersect. To do this, select the bounding box by using the pointer tool and double-clicking at the top of the box. Once the box is selected, hit delete, and Inkscape will remove it. Next, select the y-axis with the pointer tool. When the axis is selected, it will show an arrow at each side and each corner. Selecting one of these arrows and dragging changes the width or length of an object.  In this case, drag the left-pointing axis arrow leftward until it's approximately parallel to the x-axis. Repeat with the x-axis and adjust each axis so that they intersect.

To modify the appearance of an object, select the desired object (for example, the x-axis). Then, under the Object menu, select Fill and Stroke. A dialogue box opens on the right-hand side of the Inkscape window. The tabs in this dialogue box allow for modification of line width (useful for publication-quality figures), color, line type, addition of endpoint markings, etc. Similar dialogues are available for objects of other types.

To construct a multi-paneled figure, output the component plots from R. I vacillate between two approaches: building the multipaneled figure directly in R using par(mfrow = c(x, y)) and writing it out as a single svg file, or exporting each piece separately. The latter is particularly useful when working with default plot functions associated with particular packages that don't allow for easy manipulation of the plot layout using R's par or layout functions.
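Here is a rough sketch of the two approaches side by side, reusing the simulated data from earlier. The histogram panel, the file names, and the use of tempdir() as an output directory are all assumptions made for illustration; substitute your own plots and paths.

```r
x <- 1:100
y <- rnorm(100, x^2, 500)

out.dir <- tempdir()  # assumption: a scratch directory; use your own path

# Approach 1: both panels in a single svg, laid out with par(mfrow)
multi.path <- file.path(out.dir, "multipanel.svg")
svg(multi.path, width = 10, height = 5)
par(mfrow = c(1, 2))                   # 1 row, 2 columns
plot(y ~ x, pch = 16, col = "grey20")  # left panel
hist(y, col = "grey60", main = "")     # right panel
dev.off()

# Approach 2: one svg per panel, to be assembled by hand in Inkscape
panel.a.path <- file.path(out.dir, "panelA.svg")
svg(panel.a.path, width = 5, height = 5)
plot(y ~ x, pch = 16, col = "grey20")
dev.off()

panel.b.path <- file.path(out.dir, "panelB.svg")
svg(panel.b.path, width = 5, height = 5)
hist(y, col = "grey60", main = "")
dev.off()
```

The first approach keeps the layout reproducible in R; the second leaves the layout to Inkscape, which is handy when a package's plot function resets par on its own.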

Once all the necessary svgs are written out of R, open each in its own Inkscape session. Use <Ctrl-a> to select all, and then group all elements of that particular plot together using the "Group selected objects" button midway across the top of the Inkscape window (this is the icon of a blue box overlaid with the blue circle). Copy and paste the grouped plot panel into a new Inkscape session, and bring in all other plots one at a time via the same steps. Grouping all elements of a given plot ahead of time ensures that all plot elements can be resized and moved as a unit.

Inkscape has the capacity to do all sorts of other things, including text manipulation, image creation, etc. It has a well-documented wiki, an active user community, and (best of all) it's free! Hopefully I've provided enough information here to get you started. Enjoy!

LaTeX in RStudio

What is LaTeX,  and why bother writing about it?

LaTeX is a typesetting system commonly used by mathematicians.  Simply put, it creates very polished-looking documents that can be converted from manuscript format to presentation format to thesis/dissertation format without all the hassle of reformatting required in GUI-based typesetting systems like Word.  A number of nice R tools, like knitr and Sweave, allow for direct integration and dynamic reporting of LaTeX and R code.  These tools make it easy to document what’s happening in your R code. Finally, for better or worse, LaTeX is a powerful signaling tool.

In my mind, there are three additional reasons that biologists might consider using LaTeX:

  1. It has a straightforward interface for writing out math equations (much less time-consuming than Word’s equation editor).  Though I don’t show examples here, they’re all over the internet (search LaTeX and math).
  2. LaTeX’s beamer document class makes it very easy to reformat a LaTeX paper or batch of notes into slides.
  3. LaTeX interfaces seamlessly with the BibTeX bibliographic referencing tool (an open-source analog to EndNote).

Installing MiKTeX

LaTeX can be run directly through RStudio, so long as you have MiKTeX installed on your computer as well (here’s a link to the MiKTeX download page).  In this post, I’ll describe how to build and compile pdf documents using LaTeX wrapped in RStudio.

Specifying a .tex extension in RStudio

LaTeX script files have extension “.tex”.  To open a new .tex file for editing in RStudio, launch RStudio, and then select File -> New -> Text File.  Use File -> Save, and save the file to the desired directory.  Be sure to put the extension “.tex” at the end of the file name (so, call the file <your file name>.tex).  By specifying the .tex extension, you effectively tell RStudio that this is a LaTeX document.  RStudio will give you some new, LaTeX-specific options once it knows this is a .tex file.  For example, you’ll see a “Compile PDF” button at the upper right of the script window once you’ve saved the file with a .tex extension.

Building a .tex file

The minimum content you need to include for LaTeX to build your document is a specification of the document class, indicators of document beginning and ending, and a little content.  Comments in LaTeX are indicated with a leading “%” symbol.

 % trial .tex file %
\documentclass[10pt]{article}  % specifies document class (article) and point size (10pt)

\begin{document}               % starts document

\title{Example Document}       % specifies big, fancy title
\maketitle                     % constructs big, fancy title
\section{Section 1}            % makes a section header
Here is some text in section 1.  Section 1 also contains an itemized list:
  \begin{itemize}              % initiates an itemized list
    \item Here is an item in the list
    \item Here is a second item
  \end{itemize}                % ends itemized list

\section{Section 2}            % makes header for section 2   
Some text on section 2 here.  Section 2 contains an enumerated list.
  \begin{enumerate}            % initiates enumerated list
    \item A first enumerated item
    \item A second enumerated item
  \end{enumerate}              % ends enumerated list

\end{document}                 % ends document

Once you have this script written into your .tex file in RStudio, hit the Compile PDF button at the upper right of the script window.  Doing so will prompt LaTeX (via RStudio) to compile the .pdf document you’ve specified.  It will store the .pdf file (along with several auxiliary LaTeX files) in your working directory. This particular script produces the .pdf file shown below.

[Figure: TrialFile]
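If you would rather compile from the R console than from the button, one possibility is texi2pdf from R's built-in tools package. This is a sketch under assumptions, not what RStudio necessarily does behind the scenes: the file name trial.tex and the tempdir() location are made up for illustration, and the compile step is skipped if R can't find a pdflatex executable.

```r
# Write a minimal .tex file to a scratch directory (tempdir() is an
# assumption; use your own project directory in practice).
tex.path <- file.path(tempdir(), "trial.tex")
writeLines(c(
  "\\documentclass[10pt]{article}",
  "\\begin{document}",
  "Here is some text.",
  "\\end{document}"
), tex.path)

# Compile only if a LaTeX installation is visible to R.
if (nzchar(Sys.which("pdflatex"))) {
  old.wd <- setwd(dirname(tex.path))  # the pdf is written to the working directory
  tools::texi2pdf(basename(tex.path), clean = TRUE)  # clean = TRUE removes auxiliary files
  setwd(old.wd)
}
```

Note the doubled backslashes: inside an R string, each LaTeX backslash has to be escaped.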

Group Writing in LaTeX

My biggest complaint with LaTeX is that I haven’t yet found a nice LaTeX analog to Word’s track-changes functionality.  However, the TeX community is getting closer to a solution.  One site I’ve found useful is sharelatex.com, which allows multiple collaborators to revise a .tex document online, and compile it on the fly (so, you can see the .pdf at the sharelatex site, right next to the .tex file that produced it).  It works great for documents that are primarily text and bibliographies, but I haven’t pushed it too far yet.

To wrap up, LaTeX is an environment that computational biologists should be aware of, especially since RStudio makes it extremely accessible.  It’s not for everyone (and certainly not for every project), but depending on who you’re collaborating with, and what exactly you’ll need to communicate, it may be worth investigating.

Getting Started with RStudio

As a computational biologist with roots in statistics, I do almost all of my work using the statistical computing environment R.  R is an open-source software product that is maintained by a core development team of statisticians and computer scientists and expanded upon by users ranging from scientists to business folks to statisticians.  A strong majority of biological researchers today will interface with R in some manner.

Unfortunately, in spite of all its utility R can be daunting at first.  It isn’t surrounded by a pretty GUI interface; it’s a statistical workhorse, and demands some scripting. In my opinion, RStudio is a great interface that makes R a little less intimidating, and a little more user-friendly.  In this post, I’ll walk through the RStudio layout, and explain how to write a codeline in RStudio, and run that line using R.  Many, many others have done this before; a quick web search on RStudio will find you many more resources.

Layout

RStudio organizes the elements of a conventional R session into a single screen. When you launch RStudio, you’ll see a multipaneled screen like the one in the image below. The upper left panel contains your active scripts that you’re editing.  The lower left is the console, where the “action” takes place (I think of the console as where the R program itself is living).  Having the script editing occur separate from where R lives is helpful, since it lets the user save the script so that its contents can be regained in the future.

My basic protocol for writing code in RStudio is to draft the code in the Script Window, and then “run” that code, at which time the code line is passed to R itself and appears in the lower left console window.  By running a line of the script, the user tells R to carry out the specified action.  The right-hand panels contain help files in the lower portion, and a list of all objects in your R session in the upper right.

[Figure: RStudioScreenshot1_Inkscape]

Writing and Running Code

To open a new script, select File -> New -> R Script.  This will open a new script in your script window.  RStudio provides you with some markup (i.e., syntax highlighting to help you see where a parentheses pair begins and ends, where a bracket opens or closes, what lines are commented out, etc.).  Comments in R are denoted with “#”, and RStudio colors those comments green, as seen in the top line of my script file, which reads

#-- Example Script --#

In the next (black) line in the script file, I build a new object, which I’ve named “your.first.codeline”.  I use the assignment arrow, “<-“, to stick some stuff into that object.  In this case, the “stuff” is a character vector with two elements, “Hello”, and “World”.  The two elements are stuck together via the “c” function, which stands for concatenate (aka “stick together”).  After typing this line into the script window, if I place my cursor somewhere in the line of code I’ve just written and click the “Run” button at the top of the script window, R will run that script line for me. I could also copy and paste from the script window into the console window to achieve the same result.

When the script line is run, it appears in blue after a “>” in the console window.  R is an object-oriented language, which means that it stores information in objects which can then be displayed, summarized, or operated on.  As soon as I run my code line, R builds the “your.first.codeline” object; however, that object won’t display until I ask to see it.  Note that in my console window, I’ve run the line of code from the script window, and then in the next line (after the second blue “>”), I’ve typed the object’s name (i.e., “your.first.codeline”).  When I type the object’s name, R shows me its contents (in this case, “Hello” and “World”).
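For readers following along without the screenshot, the two steps described above look roughly like this. It is a reconstruction from the description rather than a copy of my original script file:

```r
#-- Example Script --#
your.first.codeline <- c("Hello", "World")  # build the object (nothing is displayed yet)
your.first.codeline                         # typing the object's name displays its contents
# [1] "Hello" "World"
```

The [1] at the start of the output is R's index for the first element shown on that line.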

 

A biologist organizes her computational tools

Although the “classic” mental image of a biologist likely involves a work bench in a lab, or a spotting scope in the field, the reality is that many of us live our lives attached to our monitors. Certainly this is true of my own work: I study linkages between disease occurrence and lamb survival in wild bighorn sheep, but that work manifests as hours upon hours on my laptop, with only the occasional excursion to the field or lab. What’s more, there’s not much reason to expect that time allocation to change.

Given how much time I spend working on computational tasks, it seemed reasonable to spend some time investigating tools designed to streamline the research process. In my quest for a more-perfect computing toolkit, I explored tools intended to facilitate activities ranging from image manipulation to code annotation to scientific collaboration. It didn’t take long to realize that the sheer volume of software available would make separating the wheat from the chaff a truly non-trivial task.

In this blog, I’ll highlight the computational tools and practices most important in facilitating my metamorphosis from undergraduate student to research scientist.  This is the first installment in that story.

Starting points

My undergraduate training was at a liberal arts college, where I pursued a classic liberal arts education (literature, philosophy, religion, French, biology, chemistry, math, and statistics). An enthusiasm for data and biological modeling led me to a master’s program in statistics. In my relatively small stats program, the emphasis was on mathematical underpinnings, as opposed to computational manifestations, of statistical models. Following completion of my master’s, I went to work as a research assistant with a disease ecology group, and got a two-year crash course in simulation modeling and figure design. Friends and acquaintances in other disciplines gradually convinced me to put more energy into expanding my computational skillset. Their patience and a little stubbornness on my own part combined to start me on an exploration of ways to integrate software into my work such that I maximize its benefits, but minimize distraction.

Moving away from Microsoft

I’ll start with the disclaimer that Microsoft provides a great operating system for the vast majority of computer users. I admire their emphasis on a streamlined, extremely user-friendly integration of document preparation and data management tools.

That said, research scientists working on computationally complex problems probably benefit from a strong working knowledge of another operating system as well. Most alternatives fall into two categories: Mac OS X or a Linux distribution. The appeal of open source led me to Linux. I use Ubuntu, which is a good starter distribution. For the first 18 months I used it, I interfaced with it almost exactly as I had with Windows. However, Ubuntu had two advantages right away:

1.  Everything ran a little faster than it had on Windows
Most of my “coding” is in the statistical computing environment R. R has tons of flexibility, and as a scientist, it provides me almost all the tools I need. Unfortunately, it’s not the fastest program ever. When I switched to Ubuntu from Windows, I was in the midst of my first (sloppy) simulation study.  The sims completely bogged down RStudio in Windows, but in Ubuntu, simulation was smoother.

2.  Ubuntu gave me access to all kinds of open source software through its package managers
Ubuntu’s easy access to all sorts of software, much of it open-source, held huge appeal for me.  The work I do is pretty diverse: some spatial problems, some network-based simulation studies, an array of statistical analyses, and (often) graphing.

Utilizing institutional computing resources

I spent a lot of time tying up my own machine running simulations. I assumed that either the resources available from my institution were not really intended for my use, or that accessing them would be too complex and time-consuming to merit the computational benefit. Those assumptions were wrong. As it turns out, I write better code when I’m sending it to the cluster, sending it isn’t difficult, and why was I worried about the resources anyway?  The university has them for people (like me) to use.  In fact, from the terminal I can access a cluster remotely, push my source file and run file over, and get them running on the cluster in about five lines of code.

Writing pretty code that I’m not embarrassed to ask about on Stack Overflow

I know just enough about programming to know that people who do it well are very particular (and perhaps judgemental?) about how it looks. Since I don’t always know exactly how to do every task I need to accomplish, I read Stack Overflow with some frequency, and I’ve seen people get criticized pretty heavily for issues that seem more stylistic than content-based.  There’s an easy fix to this: Google’s style guides.  Now, when my code crashes, at least no one will criticize me for my use of spaces and commenting symbols. It’s important to talk the talk, and in the world of programming, style does matter, at least for getting a foot in the door. Also, in a surprising added benefit, my code is much easier to read now — apparently there was some wisdom in those files after all.

In this blog, I’ll delve into software solutions I find useful. The problems will be old and new, and some of the solutions are older than I am. I am exactly the right person to write this blog, since I’m pretty sure that if I can do <any old computing thing>, anybody else can, too.  The plan is to deal with a spectrum of software, from an overview of a few friendly Linux distributions, to GUI interfaces for manipulating graphics like Inkscape and Gimp, to life-changing text editors like vim.

I’m going into this operating under the assumption that biologists (and particularly girls) get turned off to computing because the community seems unwelcoming, because all the loudest voices in the room are male, and because the vernacular that gets used is totally incomprehensible to the untrained ear. I’m convinced there’s a gentler way to learn the tricks of computing well; I’ll try to find it here.