Analyze R package downloads: packagetrackr on CRAN

23.9.2015

A while ago, RStudio released the download logs of its CRAN mirror. To make this data easier to analyze, I wrote a tiny package called packagetrackr that downloads, caches, parses and filters the log files for you:

library(packagetrackr)
library(dplyr)
library(ggplot2)

(manifestor_downloads <- 
  package_downloads("manifestoR",
                    start = as.Date("2015-05-01"),
                    end = as.Date("2015-09-01"),
                    force = TRUE) %>%
  mutate(date = as.Date(date))) %>%
  ggplot(aes(x = date, fill = version, color = version)) +
    geom_dotplot(binwidth = 1) + 
    coord_fixed(ratio=1) + 
    ylim(0,30)

[Plot: dot plot of manifestoR downloads by date, colored by package version]

The first time you run it, this command takes a long time, because it downloads all log files for the requested period from RStudio. Subsequent calls are much faster, because the logs are cached locally on your hard disk.
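
To illustrate the idea behind the caching, here is a minimal sketch of a download-once cache for the daily log files. It is not packagetrackr's actual implementation: the helper names `log_url`, `cached_log_file` and `read_package_log` are made up for this example, and the URL pattern is assumed to follow the one-file-per-day scheme of RStudio's log site.

```r
# Sketch of a download-once, read-from-disk cache for daily CRAN log files.
# All function names here are hypothetical; packagetrackr may work differently.

# Build the URL of the log file for one day (assumed URL scheme).
log_url <- function(date) {
  date <- as.Date(date)
  sprintf("http://cran-logs.rstudio.com/%s/%s.csv.gz",
          format(date, "%Y"), format(date, "%Y-%m-%d"))
}

# Return the local path of the log for `date`, downloading it only if
# it is not in the cache directory yet.
cached_log_file <- function(date,
                            cache_dir = file.path(tempdir(), "cran-logs")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(log_url(date)))
  if (!file.exists(dest)) {
    # Download to a temporary name first, so that an interrupted download
    # never leaves a truncated file under the final name.
    tmp <- tempfile(tmpdir = cache_dir)
    download.file(log_url(date), tmp, mode = "wb", quiet = TRUE)
    file.rename(tmp, dest)
  }
  dest
}

# Read one day's log and keep only the rows for a single package.
read_package_log <- function(date, pkg, ...) {
  log <- read.csv(gzfile(cached_log_file(date, ...)),
                  stringsAsFactors = FALSE)
  log[log$package == pkg, ]
}

# Example (needs network access on the first call only):
# read_package_log("2015-05-01", "manifestoR")
```

The default cache directory in this sketch lives under `tempdir()` and disappears with the R session; a persistent cache would need a directory that survives restarts.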

The log files contain more than just the download time and package version, for example the version of R that was used to download the package:

names(manifestor_downloads)
##  [1] "X"         "date"      "time"      "size"      "r_version"
##  [6] "r_arch"    "r_os"      "package"   "version"   "country"  
## [11] "ip_id"
manifestor_downloads %>%
  ggplot(aes(x = r_version, fill = r_version)) + geom_bar()  # r_version is discrete, so count with bars

[Plot: number of manifestoR downloads by R version]
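
The same columns can also be summarised as plain tables instead of plots. The sketch below uses a small made-up data frame as a stand-in for `manifestor_downloads`: only the column names are taken from the `names()` output above, the values are invented for illustration. It also assumes that `r_version` can be missing for some rows, which is why missing values get their own group.

```r
# Toy stand-in for manifestor_downloads. The values are invented;
# only the column names come from the real log data.
downloads <- data.frame(
  date      = as.Date(c("2015-05-01", "2015-05-01",
                        "2015-05-02", "2015-05-03")),
  r_version = c("3.2.0", "3.1.3", "3.2.0", NA),
  country   = c("DE", "US", "DE", "GB"),
  ip_id     = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Downloads per R version; useNA keeps rows without a recorded
# R version visible as a separate group instead of dropping them.
per_version <- table(downloads$r_version, useNA = "ifany")

# Downloads per day as a data frame (one row per date).
per_day <- aggregate(ip_id ~ date, data = downloads, FUN = length)
names(per_day)[2] <- "n_downloads"

print(per_version)
print(per_day)
```

With the real data, replacing `downloads` by `manifestor_downloads` should give the corresponding counts for manifestoR.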

Note that only downloads from RStudio's CRAN mirror are logged, so these are the only downloads you can analyze with packagetrackr. It is, however, the default mirror in the RStudio IDE.

packagetrackr is on CRAN, so you can simply install it with install.packages("packagetrackr").