📘 Unit 1.4: Best Practices and Package Management


🎯 Learning Objectives

By the end of this unit, the student will be able to:

  • Apply style and naming conventions to write readable and professional R code.
  • Install, load, and manage packages from CRAN, Bioconductor, and GitHub.
  • Use modern tools to enhance productivity and reproducibility (here, fs, styler, lintr).
  • Understand the importance of the tidyverse as a coherent ecosystem for data analysis.
  • Avoid common errors when managing paths, dependencies, and work environments.

📚 1. Code Style and Readability

1.1. Why Style Matters

Code is not just for machines — it’s for humans too. Well-written code is:

  • Easy to read and understand (by you and others).
  • Easy to maintain and modify.
  • Less prone to errors.
  • Professional and ready for collaboration or production.

1.2. Naming Conventions

R does not enforce a style, but widely adopted conventions exist:

| Type | Recommended style | Example | Usage |
|---|---|---|---|
| Objects, variables, functions | snake_case | customer_data, calculate_mean | Standard in the tidyverse |
| S4/R6 classes, constructors | PascalCase | DataFrame, LinearModel | Common in S4 and R6 code (base R S3 classes are mostly lowercase: data.frame, lm) |
| Constants | ALL_CAPS | MAX_ITER <- 100 | Optional, rarely used in R |
| Temporary variables | . or short names | .x, tmp | Only inside short functions or pipes |

Recommendation: Use snake_case for everything, unless you are developing a package with classes.
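Here is a minimal sketch of these conventions side by side. It uses base R so it runs without extra packages. All names (customer_data, calculate_mean, MAX_SPEND) are illustrative and do not come from any package.

```r
# snake_case for data objects and functions
customer_data <- data.frame(
  customer_id = 1:4,
  monthly_spend = c(120, 85, 240, 60)
)

calculate_mean <- function(values) {
  sum(values) / length(values)
}

# ALL_CAPS for a value meant to stay constant (illustrative threshold)
MAX_SPEND <- 200

average_spend <- calculate_mean(customer_data$monthly_spend)
high_spenders <- customer_data[customer_data$monthly_spend > MAX_SPEND, ]

print(average_spend)
print(high_spenders)
```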

1.3. Code Formatting and Structure

Indentation and spacing

  • Use 2 spaces (not tabs) for indentation.
  • Put spaces around operators: x <- 5 + 3, not x<-5+3.
  • Put a space after commas: c(1, 2, 3), not c(1,2,3).

Line length

  • Maximum 80 characters per line (ideal for terminal and diff readability).
  • If a line is too long, break it after a pipe (%>% or the native |>), or after + in ggplot2 code.

library(dplyr)

# ❌ Long line
result <- filter(mtcars, cyl == 4 & mpg > 25 & wt < 2.5)

# ✅ Better with pipe: one step per line, one condition per argument
result <- mtcars %>%
  filter(cyl == 4, mpg > 25, wt < 2.5)
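The same idea works in base R with the native pipe |> and subset(). This sketch assumes R 4.1 or later, since that is when |> was introduced. The kpl column (approximate kilometres per litre) is an illustrative addition and is not part of mtcars.

```r
# Base R only: break long pipelines after each |> so that every step
# sits on its own short line
result <- mtcars |>
  subset(cyl == 4 & mpg > 25 & wt < 2.5) |>
  transform(kpl = mpg * 0.425)  # approximate km per litre (illustrative)

print(result[, c("mpg", "cyl", "wt", "kpl")])
```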

1.4. Automated Formatting Tools

styler

Automatically formats your code according to the tidyverse style guide.

# Install
install.packages("styler")

# Use on a script
styler::style_file("my_script.R")

# In RStudio: Addins menu > "Style active file"
# (Ctrl/Cmd + Shift + A runs RStudio's built-in "Reformat Code", not styler)
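styler::style_text() formats a string of code, so you can preview what styler would change without modifying any file. The sketch below is guarded with requireNamespace() and only runs when styler is installed. The messy input string is made up for the example.

```r
# Preview styler's output on a string; no file is touched
if (requireNamespace("styler", quietly = TRUE)) {
  messy <- "x<-c(1,2,3)"
  styled <- styler::style_text(messy)
  print(styled)  # spaces added around <- and after each comma
}
```

To format every R file in a folder, styler::style_dir("scripts") applies the same rules.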

lintr

Analyzes your code without running it and flags style violations and possible errors.

# Install
install.packages("lintr")

# Check a file
lintr::lint("my_script.R")

# RStudio integration: shown in the "Markers" panel
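The sketch below shows what lintr reports. It writes a throwaway one-line script, invented for the demo, to a temporary path and lints that file. It is guarded so it only runs when lintr is installed. With the default linters, the missing spaces around <- and after the commas should be flagged.

```r
if (requireNamespace("lintr", quietly = TRUE)) {
  # Write a tiny script with deliberate style problems to a temp file
  script_path <- tempfile(fileext = ".R")
  writeLines("x<-c(1,2,3)", script_path)

  # Each element of the result is one lint: file, line, column, message
  lints <- lintr::lint(script_path)
  print(lints)
}
```

lintr::lint_dir("scripts") lints every R file in a folder at once.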

📦 2. Package Management

2.1. What are R Packages?

Packages are collections of functions, data, documentation, and compiled code that extend R's capabilities. A standard R installation ships with about 30 packages (the base and recommended packages); CRAN hosts more than 19,000 more.
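Base R can show what this looks like on your own machine. The sketch below lists the packages installed at each priority level and the ones currently attached. The exact counts depend on your R version and installation.

```r
# Packages with priority "base" ship with every R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# "Recommended" packages (e.g. MASS, Matrix) are usually bundled too
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))
print(length(recommended_pkgs))

# Packages currently attached to the search path in this session
print(.packages())
```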

2.2. Installing Packages

From CRAN (main repository)

install.packages("dplyr")
install.packages(c("ggplot2", "readr", "lubridate")) # multiple
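A common pattern is to install only the packages that are missing. This is a minimal sketch built around a hypothetical helper named missing_packages(). Note that requireNamespace() loads the namespace of each installed package as a side effect of the check. The install step is commented out so the snippet is safe to run.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "readr", "lubridate"))
print(to_install)

# Then install only what is missing (uncomment to run):
# if (length(to_install) > 0) install.packages(to_install)
```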

From Bioconductor (biology, genomics)

if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("DESeq2")

From GitHub (development version)

# Install remotes if you don't have it
install.packages("remotes")

# Install from GitHub
remotes::install_github("tidyverse/ggplot2")
remotes::install_github("rstudio/leaflet")

2.3. Loading Packages

library(dplyr)     # loads and attaches package to search path
require(ggplot2)   # also attaches, but returns FALSE instead of an error if missing

# Load without attaching (avoids naming conflicts)
dplyr::filter(mtcars, cyl == 4)

⚠️ Caution: Some packages have functions with the same name (e.g., filter in dplyr and stats). Use :: to specify.
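You can check the difference between loading and attaching directly. This sketch uses the tools package, which ships with R but is not attached by default. It also uses a deliberately non-existent package name, notARealPackage, to show what require() returns.

```r
# `::` loads a package's namespace without attaching it to the search path
print("package:tools" %in% search())   # usually FALSE in a fresh session

ext <- tools::file_ext("customers.csv")
print(ext)                             # "csv"

print("tools" %in% loadedNamespaces()) # TRUE: the namespace is now loaded
print("package:tools" %in% search())   # still not attached

# require() returns FALSE instead of stopping with an error
has_pkg <- suppressWarnings(require("notARealPackage", quietly = TRUE))
print(has_pkg)                         # FALSE
```

Inside your own functions, requireNamespace() is usually the better check, because it does not attach anything.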

2.4. Updating and Removing Packages

# Update all packages
update.packages(ask = FALSE)

# Update a specific package (reinstalling fetches the latest CRAN version)
install.packages("dplyr")

# View outdated packages
old.packages()

# Remove a package
remove.packages("package_name")
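packageVersion() reports the installed version of a package, which is useful before or after updating. This sketch uses the base package stats so it runs everywhere. The version strings in the comparisons are arbitrary examples.

```r
# Check which version of a package is installed
v <- packageVersion("stats")
print(v)

# package_version objects compare as versions, not as text
print(v >= "4.0.0")
print(package_version("1.10.0") > package_version("1.9.0"))  # TRUE

# Base packages always share the version of R itself
print(v == getRversion())  # TRUE
```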

2.5. Dependency and Environment Management

renv — Reproducible Environments

Ideal for projects that must be reproducible on other machines.

# Initialize renv in a project
renv::init()

# Install packages (saved locally in project)
renv::install("dplyr")

# Freeze current state
renv::snapshot()

# Restore environment on another machine
renv::restore()

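renv::snapshot() writes the renv.lock file, which records the exact version and source of every package the project uses. renv::restore() reads that file to rebuild the same package library on another machine.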


🧭 3. File Paths and File Handling

3.1. The Problem with Relative Paths

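Using setwd() and relative paths ("../data/data.csv") is fragile and non-reproducible. A relative path is resolved against the current working directory, and that directory differs between machines and between the console and a knitted R Markdown document. It also changes every time someone calls setwd() again.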
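The sketch below demonstrates the problem with a throwaway project folder in tempdir(); the folder and file names are made up. The same relative path works or fails depending only on the working directory.

```r
# Build a throwaway project: demo_project/data/cars.csv
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(mtcars, file.path(project_dir, "data", "cars.csv"))

old_wd <- getwd()

# Same relative path, two different working directories
setwd(project_dir)
found_from_root <- file.exists("data/cars.csv")  # TRUE

setwd(file.path(project_dir, "data"))
found_from_data <- file.exists("data/cars.csv")  # FALSE: looks in data/data/

setwd(old_wd)  # always restore the original working directory
```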

3.2. Solution: here::here()

The here package detects the project root by searching upward for marker files (such as an .Rproj file, a .git folder, or a .here file) and builds paths from there.

# Install
install.packages("here")

# Use
library(here)

data_path <- here("data", "customers.csv")
data <- read.csv(data_path)

✅ Works the same on Windows, Mac, or Linux.
✅ No need to change working directory.
✅ Ideal for sharing projects.
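here builds on the rprojroot package and checks more criteria than this, but the core idea can be sketched in base R. Start in the current folder and walk up until a folder contains a marker file. find_project_root() and project_path() are hypothetical helpers written for illustration; they are not part of here.

```r
# Hypothetical helper (simplified, not the here package's code):
# walk up from `start` until a folder contains a project marker
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    has_marker <- file.exists(file.path(dir, ".here")) ||
      dir.exists(file.path(dir, ".git")) ||
      length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_marker) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      # Reached the top of the file system without finding a marker
      stop("No project root found above ", start)
    }
    dir <- parent
  }
}

# Hypothetical wrapper in the spirit of here("data", "customers.csv")
project_path <- function(...) {
  file.path(find_project_root(), ...)
}
```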

3.3. Modern Alternative: fs

For advanced file and folder manipulation.

library(fs)
library(here)

# Create a directory (nothing happens if it already exists)
dir_create(here("output"))

# List files
dir_ls(here("data"))

# Check if a file exists
file_exists(here("data", "customers.csv"))
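For comparison, here are the same three operations in base R. fs mainly adds consistent naming (dir_*, file_*, and path_* prefixes), vectorized functions, and tidy return values. This sketch works in a temporary folder so it can run anywhere; the folder and file names are illustrative.

```r
# A temporary folder stands in for the project root
root <- file.path(tempdir(), "fs_compare")

# Like fs::dir_create(): create missing parents, stay silent if it exists
dir.create(file.path(root, "output"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "output", "report.csv")))

# Like fs::dir_ls(): list a folder's contents with full paths
output_files <- list.files(file.path(root, "output"), full.names = TRUE)
print(output_files)

# Like fs::file_exists()
report_exists <- file.exists(file.path(root, "output", "report.csv"))
print(report_exists)
```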

🔄 4. Modern Alternatives to Base Functions

Some base R functions are slow on large data, behave inconsistently, or have surprising defaults. The tidyverse offers alternatives with more consistent interfaces.

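| Base function | Modern alternative | Advantages |
|---|---|---|
| read.csv() | readr::read_csv() | Faster on large files, returns a tibble, reports the column types it guessed (since R 4.0.0, read.csv() also keeps strings as character by default) |
| data.frame() | tibble::tibble() | Prints a compact preview (first 10 rows of large tables) with each column's type, never modifies column names, no partial matching with $ |
| factor() | forcats::as_factor() | Creates levels in order of first appearance instead of sorted order; works with the fct_*() helpers for reordering and recoding |
| strsplit() | stringr::str_split() | String-first argument order (pipe-friendly), consistent naming across str_*() functions, simplify = TRUE returns a matrix |
| Sys.time() | lubridate::now() | Accepts a tzone argument; part of a toolkit for parsing dates and doing date arithmetic |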
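The factor() row is easy to check. Base factor() sorts the levels, while forcats::as_factor() keeps the order in which values first appear in a character vector. This base R sketch reproduces both behaviours; the sizes vector is invented.

```r
sizes <- c("small", "large", "medium", "small")

# Base factor(): levels are sorted
print(levels(factor(sizes)))                          # "large"  "medium" "small"

# Order of first appearance, as forcats::as_factor() does for character input
print(levels(factor(sizes, levels = unique(sizes))))  # "small"  "large"  "medium"
```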

Example:

# ❌ Base R
data <- read.csv("data.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date, "%Y-%m-%d")

# ✅ Modern
library(dplyr)
library(here)
library(readr)
library(lubridate)

data <- read_csv(here("data", "data.csv")) %>%
  mutate(date = ymd(date))
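Since R 4.0.0, read.csv() no longer converts strings to factors, so stringsAsFactors = FALSE in the base version is only needed on older R versions. Dates still need explicit parsing. This sketch uses inline CSV text in place of data.csv, and the data is made up.

```r
# Inline CSV text stands in for data.csv (illustrative data)
csv_text <- "customer,date\nAna,2024-01-15\nLuis,2024-02-03"

base_data <- read.csv(text = csv_text)
print(class(base_data$customer))  # "character" in R >= 4.0.0
print(class(base_data$date))      # "character": not parsed as a date

base_data$date <- as.Date(base_data$date, format = "%Y-%m-%d")
print(class(base_data$date))      # "Date"
```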

🌐 5. Introduction to the Tidyverse

5.1. What is the Tidyverse?

A collection of packages designed to work together, with consistent philosophy and grammar, primarily created by Hadley Wickham and team at Posit (formerly RStudio).

5.2. Core Packages

| Package | Purpose |
|---|---|
| ggplot2 | Data visualization |
| dplyr | Data frame manipulation |
| tidyr | Data cleaning and reshaping |
| readr | Reading flat files |
| purrr | Functional programming |
| tibble | Modern data frames |
| stringr | String manipulation |
| forcats | Factor manipulation |
| lubridate | Date manipulation |

5.3. Installation and Loading

# Install entire tidyverse
install.packages("tidyverse")

# Load (loads core packages)
library(tidyverse)

📌 Note: library(tidyverse) attaches only the nine core packages listed above (lubridate has been core since tidyverse 2.0.0). Other packages installed with the tidyverse, such as readxl, haven, or rvest, must be loaded explicitly with library().
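The tidyverse package exports tidyverse_packages(), which lists every package that comes with the tidyverse, core and non-core. This sketch is guarded so it only runs when tidyverse is installed.

```r
if (requireNamespace("tidyverse", quietly = TRUE)) {
  # Every package installed as part of the tidyverse (core and non-core)
  all_pkgs <- tidyverse::tidyverse_packages()
  print(all_pkgs)
}
```

After library(tidyverse), tidyverse_conflicts() lists the functions that mask others, such as dplyr::filter() masking stats::filter().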


🛑 6. Common Errors and How to Avoid Them

6.1. Forgetting to load a package

# Without dplyr, R finds stats::filter() instead, which fails with
# "object 'cyl' not found"
filter(mtcars, cyl == 4)

# Solution
library(dplyr)
filter(mtcars, cyl == 4)

6.2. Name conflicts

library(dplyr)

# Which filter is used? stats is attached at startup, so dplyr
# (attached later) masks it: this calls dplyr::filter()
filter(mtcars, cyl == 4)

# Solution: be explicit
dplyr::filter(mtcars, cyl == 4)
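You can observe masking without dplyr. This base R sketch defines a hypothetical filter() in the global environment. It then uses find() to list every attached environment that has a filter and shows that :: still reaches stats::filter().

```r
# A hypothetical filter() defined in the global environment
filter <- function(...) "my own filter"

# find() lists every attached environment with an object of this name
where <- find("filter")
print(where)     # ".GlobalEnv" first, then "package:stats"

print(filter())  # "my own filter": the first match on search() wins

# `::` always reaches the intended version, regardless of masking
smoothed <- stats::filter(1:5, rep(1 / 3, 3))  # 3-point moving average
print(smoothed)  # values: NA 2 3 4 NA

rm(filter)  # remove the mask; plain filter() is stats::filter() again
```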

6.3. Broken paths when sharing projects

# ❌ Fragile
setwd("C:/Users/Juan/Project/data")
data <- read.csv("customers.csv")

# ✅ Robust
library(here)
library(readr)
data <- read_csv(here("data", "customers.csv"))

6.4. Not updating packages

Old versions may have bugs or incompatibilities. Use:

update.packages(ask = FALSE)

Or better, use renv to freeze versions in critical projects.


📝 7. Best Practices Checklist

Before delivering or sharing your code, verify:

✅ You use snake_case for variable and function names.
✅ Your lines do not exceed 80 characters.
✅ You use here::here() for paths.
✅ You load packages with library() at the top of the script.
✅ You use modern functions (read_csv, tibble, etc.).
✅ Your code is formatted with styler.
✅ You’ve checked for errors with lintr.
✅ You’ve documented complex parts with comments.
✅ Your project has a clear folder structure: /data, /scripts, /output, /docs.
✅ You use renv if the project must be reproducible in another environment.


🧪 Practical Exercise: “Code Organizer” Project

  1. Create a new project in RStudio.
  2. Structure folders: data/, scripts/, output/, docs/.
  3. Create a CSV dataset (e.g., save mtcars as data/cars.csv).
  4. Create a script in scripts/cleaning.R that:
    • Loads necessary packages (tidyverse, here).
    • Reads the file using here().
    • Performs a simple transformation (e.g., filter cars with more than 100 hp).
    • Saves the result to output/filtered_cars.csv.
  5. Use styler to format the script.
  6. Run lintr::lint() and fix warnings.
  7. (Optional) Initialize renv and freeze the environment.
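Steps 2 and 3 can be scripted in base R. This sketch uses a hypothetical helper, setup_project(), run from the project root. It keeps the car names, which mtcars stores as row names, as a model column. The remaining steps are left for the exercise.

```r
# Hypothetical helper for steps 2 and 3: create the folders and save
# mtcars (with car names as a column) to data/cars.csv
setup_project <- function(root = ".") {
  for (folder in c("data", "scripts", "output", "docs")) {
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  cars_data <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)
  csv_path <- file.path(root, "data", "cars.csv")
  write.csv(cars_data, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Run once from the project root (e.g. in the RStudio console):
# setup_project()
```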


✅ With this unit, you’ve laid the foundation for a professional, reproducible, and scalable workflow in R. You’re now ready to dive into the powerful world of the tidyverse in the next module.

Course Info

Course: R-zero-to-hero

Language: EN

Lesson: Module04