--- title: "Introduction to `ttt`: Formatted Tables the Easy Way" author: "Benjamin Rich" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: css: vignette.css toc: true vignette: > %\VignetteIndexEntry{Introduction to `ttt`: Formatted Tables the Easy Way} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{table1} %\VignetteEncoding{UTF-8} --- ```{r echo=FALSE, message=FALSE, warning=FALSE, results='hide'} set.seed(123) library(ttt, quietly=TRUE) ``` ## Introduction `ttt` stands for "The Table Tool" (or, if you prefer "Tables! Tables! Tables!"). It allows you to creates formatted HTML tables of in a flexible and convenient way. Creating nice formatted tables has traditionally been a pain points with R. Over the years, however, things have gotten a lot better, with the emergence of packages that can produce nice looking tables in HTML (or LaTex or even Microsoft® Word), or which there are now a large number.^[ A non-exhaustive list includes: [flextable](https://CRAN.R-project.org/package=flextable), [kableExtra](https://CRAN.R-project.org/package=kableExtra), [huxtable](https://CRAN.R-project.org/package=huxtable), [htmlTable](https://CRAN.R-project.org/package=htmlTable), [tableHTML](https://CRAN.R-project.org/package=tableHTML), [ztable](https://CRAN.R-project.org/package=ztable), [formattable](https://CRAN.R-project.org/package=formattable), [pixiedust](https://CRAN.R-project.org/package=pixiedust), [basictabler](https://CRAN.R-project.org/package=basictabler), [mmtable2](https://github.com/ianmoran11/mmtable2), [gt](https://github.com/rstudio/gt), [DT](https://CRAN.R-project.org/package=DT), [tables](https://CRAN.R-project.org/package=tables), [xtable](https://CRAN.R-project.org/package=xtable). ] But most of those packages treat tables just like `data.frames`, i.e. a grid of rows and columns, with very little structure. While some packages do have constructs that allow you to group columns or rows with additional headers and labels, that structure is basically superficial, tacked onto the `data.frame` after the fact. And while it may achieve the desired result it tends to require more code and by consequence, effort. Another thing that some of these packages do is they give you the flexibility to control the visual appearance or styling of the tables (fonts, colors, grid lines, spacing, etc.) directly from the R code. This is nice, but typically achieving that level of flexibility requires a pretty complex interface of functions and arguments dedicated to styling, and achieving the desired result can take a considerable amount of code and hence again, effort. This package takes a different approach. It focuses on the table structure and content, leaving the formatting duties to CSS, a dedicated language that was designed specifically for this purpose (the downside is that it only works for HTML, but we accept this inconvenience). Also, the package follows the philosophy of not trying to solve all problems, but solving some problems well. Design decisions have been made to make some things easy, at the expense of limiting the package's generality (while still keeping in it some sense quite general, as this vignette will demonstrate). That it is not possible with this package to produce all conceivable tables is a given; that was never the intention. ## Basic Examples Before we start, let's load a couple of packages that we will be using: ```{r} library(table1, quietly=TRUE) library(magrittr, quietly=TRUE) ``` It is worth taking a minute to comment on these packages. The first, `table1`, is like a "sister" package of `ttt` (they are both written by the same author). While not a strict requirement, `table1` contains some utility functions that can also be quite useful in conjunction with `ttt`, so most of the time it is a good idea to load `table1` along with `ttt`. The `magrittr` package contains the well-known "pipe" operator that we will make use of at some point in this vignette, so we load that, too. With that out of the way, let's start looking at what `ttt` can do, and how to use it. The types of tables that can be produced are many and varied. At its simplest, the ``ttt()`` function can turn a `data.frame` into an HTML table: ```{r} ttt(mtcars) ``` But this is far from the typical use case. More typically, there is some *structure* to the data, in the sense that some columns contain *values*, while others contain *keys* that are used to group values according to some common characteristic. We will refer to these latter columns as *facets*, borrowing a term from the `ggplot2` package. We will assume that the data are in a "tidy" format, by which we mean that all the *values* have been placed in a single column (if this is not the case, there are many functions that will allow you to "reshape" the data accordingly). For the second example, continuing with the `mtcars` data, we would like to tabulate the average `mpg` (miles per gallon) by number of gears (rows) and cylinders (columns). Here is the code: ```{r} ttt(mpg ~ gear | cyl, data=mtcars, lab="Number of Cylinders", render=mean) ``` Let's break this down. The first argument is a `formula` with 3 parts: ` ~ | `. The first part, to the left of the symbol `~`, is the name of the column that contains the values (recall that in the "tidy" format there is only one such column). The second part, between the symbols `~` and `|` contains the *row facets*, one or more variables that define how the values should be split into rows. The third part, to the right of the symbol `|` is the *column facets*, one or more variables that define how the values should be split into columns. Following the `formula` comes the `data` argument, which is the `data.frame` that contains the data that the `formula` refers to. The next argument `lab` is an optional label placed over all the columns. The last argument, `render`, is a `function`. This function is called for each grouping of data defined by unique combinations of the row and column facets, and produces the value that appears in the corresponding cell of the table. Hopefully this is fairly intuitive. Now, the above table is nice, but still not quite what we want. Here are the issues that need to be addressed: 1. The label "gear" should be changed to "Number of Gears". 2. One table cell contains the cryptic value "NaN" because there aren't any cars with 8 cylinders and 4 gears in our dataset; we would like this cell to remain empty instead. 3. The number of decimal digits is different in each cell; we would like it to be the same throughout the table (1 decimal digit). Let us now address these issues. ```{r} label(mtcars$gear) <- "Number of
Gears" rndr <- function(x, ...) { if (length(x) == 0) return("") round_pad(mean(x), 1) } ttt(mpg ~ gear | cyl, data=mtcars, lab="Number of Cylinders", render=rndr) ``` The way we addressed the first issue was to add a label to the variable `gear` using the `label()` function (one of the useful utility functions from the `table1` package). The two other issues were fixed by defining a function `rndr()` to do the rendering. The `ttt()` function allows the order of the `formula` data `data` arguments to be switched, so that an alternative syntax using the `magrittr` "pipe" operator may be used: ```{r} mtcars %>% ttt(mpg ~ gear | cyl, lab="Number of Cylinders", render=rndr) ``` ## Facets The facets allow you to "slide-&-dice" the data however you want. The column facet is optional; it can be omitted: ```{r} ttt(mpg ~ gear, data=mtcars, render=rndr) ``` The row facet is required by the formula syntax, but the "magic" value `1` may be used to indicate no splitting by rows: ```{r} ttt(mpg ~ 1 | cyl, data=mtcars, lab="Number of Cylinders", render=rndr) ``` Both row and column facets may consist of more than one variable, joined together by the symbol `+`. Here is an example with 2 row facets: ```{r} label(mtcars$cyl) <- "Number of
Cylinders" ttt(mpg ~ gear + cyl, data=mtcars, render=rndr) ``` And, similarly for 2 column facets: ```{r} ttt(mpg ~ 1 | gear + cyl, data=mtcars, lab="Number of Cylinders/Gears", render=rndr) ``` The order of the variables is obviously important, as they define a nesting structure. If we think of the `|` that separates the row and column facets as the "values" (i.e. the central part of the table), then the order makes sense: variables closer to the center (`|`) are grouped within variables that are farther away. Just to demonstrate, here is a synthetic example of a large table where both row and column facets are nested 3 levels deep: ```{r} bigtable <- expand.grid( R1=LETTERS[1:3], R2=LETTERS[4:6], R3=LETTERS[7:9], C1=LETTERS[10:12], C2=LETTERS[13:15], C3=LETTERS[16:18]) bigtable$x <- 1:nrow(bigtable) ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable) ``` ## Render functions The `render` function gives a lot of flexibility. For example, instead of the mean `mpg`, we can list the cars according to their gear/cylinder combination: ```{r} ttt(rownames(mtcars) ~ gear | cyl, data=mtcars, lab="Number of Cylinders", render=paste, collapse="
") ``` Note that additional arguments can be passed to the `render` function through `...` (as in this case, `collapse`). Furthermore, a `render` function can return more than one value in a named vector. In this case, the argument `expand.along` also comes into play. It determines how the multiple values are laid out, either vertically (along rows), or the horizontally (along columns). For example, here we define a function that computes both the mean and standard deviation (SD), each to 3 significant digits, and apply it to the response variable in the `OrchardSprays` dataset (i.e., `decrease`) according to treatment: ```{r} rndr.meansd <- function(x) signif_pad(c(Mean=mean(x), SD=sd(x)), digits=3) ttt(decrease ~ treatment, data=OrchardSprays, render=rndr.meansd) ``` The default is `expand.along="rows"`, which produces the result above. As we can see, the table contains an additional column with the values "Mean" and "SD", and each value is displayed in its own row. By default, the extra column is labeled "Statistic", but to change the label to "Blah" we can specify a named vector as follows: ```{r} ttt(decrease ~ treatment, data=OrchardSprays, render=rndr.meansd, expand.along=c(Blah="rows")) ``` The other option is `expand.along="columns"`, which produces this result: ```{r} ttt(decrease ~ treatment, data=OrchardSprays, render=rndr.meansd, expand.along="columns") ``` Now "Mean" and "SD" each have their own column. ## Captions and footnotes A caption and one or more footnotes can be added to the table by specifying string values to the `caption` and `footnote` arguments, respectively: ```{r} ttt(decrease ~ 1 | treatment, data=OrchardSprays, render=rndr.meansd, lab="Treatment", caption="Mean and SD of Decrease by Treatment", footnote=c("Data: OrchardSprays", "Comment: ttt is cool!")) ``` There's not much more to say about this. ## Styling The appearance of the tables produced by `ttt` can be changed in 2 ways: using themes, or custom styling. Themes are easier, but don't give much flexibility. For fine-level control, custom styling must be used. ### Themes The `ttt` package comes with 2 themes at this time: the `default` theme that has been used throughout this vignette so far, and the `booktabs` theme. (More themes may be added later.) Selecting the theme can be done using the `ttt.theme` global option: ```{r, eval=FALSE} options(ttt.theme="booktabs") # Select the "booktabs" theme ``` If we select the `booktabs` theme, our large table looks like this: ```{r, eval=FALSE} ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable) ``` ```{r, echo=FALSE} css <- readLines(system.file(package="ttt", "ttt_booktabs_1.0/ttt_booktabs.css")) css <- gsub(".Rttt ", ".Rttt-booktabs-demo ", css, fixed=TRUE) ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable, topclass="Rttt-booktabs-demo", css=css) ``` Note that a theme can only apply to a whole document; it is not possible to selectively style different tables within the same document differently using different themes, as we appear to have done here (but it can be done with custom styling, which is how it was done). ```{r, eval=FALSE} options(ttt.theme="default") # Change back to the "default" theme ``` ### Custom styling As mentioned in the introduction, changing the table's appearance is accomplished using CSS. In order to make this possible, `ttt` places "hooks" in the table in the form of `class` attributes on various HTML elements. The first thing to know is that the whole table is enclosed in a `
` element. The allows specific formatting to be applied to tables output by `ttt` without interfering with other tables in the same document. The next thing is that all row labels have the class `Rttt-rl`, and all column labels have the class `Rttt-cl`. Furthermore, there are different classes for each *level* or nesting: `Rttt-rl-lvl1` for the first (i.e. innermost) level or row labels, `Rttt-rl-lvl2` for the second level, and so on, and similarly for the column labels with `cl` instead of `rl`. Finally, it is possible to add a `class` attribute or `id` attribute to the whole table, so that it may be targetted with specific CSS selectors. We can also pass CSS code directly to the `ttt()` function to be included with the table. For example, can can specify that our table has the ID `bigtable`, and then give it a particular (and particularly weird) style: ```{r} css <- ' #bigtable { font-family: "Lucida Console", Monaco, monospace; } #bigtable td { background-color: #eee; } #bigtable th { color: blue; background-color: lightblue; } #bigtable th, #bigtable td { border: 2px dashed orange; } #bigtable .Rttt-rl { background-color: #fff; font-style: italic; font-weight: bold; } #bigtable .Rttt-rl-lvl1 { font-size: 12pt; color: pink; background-color: yellow; } #bigtable .Rttt-rl-lvl2 { font-size: 14pt; color: green; } #bigtable .Rttt-rl-lvl3 { font-size: 18pt; color: red; } ' ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable, id="bigtable", css=css) ``` ### Example: conditional formatting A `render` function can actually add a `html.class` attribute to its return value. The value of this attribute will be assigned to the resulting HTML element's `class` attribute, allowing you to target that element with specific formatting. For example, suppose we have a `data.frame` that contains some numeric values, and we want to put them in a table: ```{r} dat <- expand.grid(row=LETTERS[1:5], column=LETTERS[1:5]) dat$value <- rnorm(nrow(dat)) ttt(value ~ row | column, data=dat, render=round_pad, digits=2) ``` Furthermore, suppose we want the cells that contain negative values to be red, and those that contain positive values to be green. (Note: this can actually be easily accomplished using JavaScript, but that's not the point of this example). Let's define a `render` function for this: ```{r} rndr <- function(x, ...) { y <- round_pad(x, 2) attr(y, "html.class") <- ifelse(x < 0, "neg", "pos") y } ``` (since there are no zeros in this example, the code here cheats and treats zero as positive; if you don't like this, think of it as non-negative). The `render` function sets the desired class on the elements. Now, we can add some CSS code to obtain the desired colors: ```{css, echo=TRUE} .neg { color: #990000; background-color: #ff000030; } .pos { color: #007700; background-color: #00ff0030; } ``` Finally, we generate the table: ```{r} ttt(value ~ row | column, data=dat, render=rndr) ```