library("tidyverse")

dplyr Information

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

These all combine naturally with group_by() which allows you to perform any operation “by group”. You can learn more about them in vignette("dplyr").

Additional Resources


Open Powerlifting Data

powerlifting = read_csv("http://daviddalpiaz.com/files/data/openpowerlifting-2019-12-04.csv.gz")

skimr::skim(powerlifting)
Data summary
Name powerlifting
Number of rows 1685579
Number of columns 38
_______________________
Column type frequency:
character 14
Date 1
logical 1
numeric 22
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Name 0 1.00 2 45 0 476575 0
Sex 0 1.00 1 2 0 3 0
Event 0 1.00 1 3 0 7 0
Equipment 0 1.00 3 10 0 5 0
AgeClass 721252 0.57 4 6 0 16 0
BirthYearClass 839149 0.50 5 6 0 7 0
Division 3964 1.00 1 37 0 3583 0
WeightClassKg 18140 0.99 1 6 0 252 0
Place 0 1.00 1 3 0 125 0
Country 1188472 0.29 2 24 0 177 0
Federation 0 1.00 2 14 0 263 0
MeetCountry 0 1.00 2 22 0 102 0
MeetState 576839 0.66 2 3 0 116 0
MeetName 0 1.00 1 155 0 14166 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
Date 15 1 1964-09-05 2019-12-01 2015-03-03 5984

Variable type: logical

skim_variable n_missing complete_rate mean count
Tested 1685579 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Age 758190 0.55 31.61 13.28 0.00 21.50 28.00 40.00 98.00 ▃▇▃▁▁
BodyweightKg 22790 0.99 84.39 22.95 15.10 67.13 82.10 99.25 484.00 ▇▂▁▁▁
Squat1Kg 1274583 0.24 113.44 145.42 -555.00 87.50 145.00 200.00 555.00 ▁▂▃▇▁
Squat2Kg 1279540 0.24 92.83 171.23 -580.00 70.00 145.00 205.00 577.50 ▁▂▂▇▁
Squat3Kg 1290977 0.23 31.77 198.23 -600.50 -165.00 110.00 192.50 560.00 ▁▅▂▇▁
Squat4Kg 1681089 0.00 68.57 192.47 -550.00 -110.00 130.50 200.00 505.50 ▁▂▂▇▁
Best3SquatKg 496596 0.71 175.11 69.14 -477.50 122.50 170.00 219.99 577.50 ▁▁▆▇▁
Bench1Kg 1063749 0.37 84.01 104.45 -502.50 57.50 105.00 145.00 467.50 ▁▁▃▇▁
Bench2Kg 1071693 0.36 55.28 129.65 -507.50 -50.00 95.00 145.00 487.50 ▁▂▅▇▁
Bench3Kg 1090473 0.35 -18.47 143.67 -575.00 -140.00 -60.00 117.50 478.54 ▁▃▇▇▁
Bench4Kg 1674033 0.01 18.99 166.82 -500.00 -130.50 72.50 155.00 487.61 ▁▆▅▇▁
Best3BenchKg 185537 0.89 118.09 54.54 -522.50 75.00 115.00 151.95 488.50 ▁▁▃▇▁
Deadlift1Kg 1232229 0.27 161.59 108.85 -461.00 125.00 180.00 225.00 450.00 ▁▁▁▇▁
Deadlift2Kg 1241588 0.26 130.77 161.02 -470.00 115.00 177.50 230.00 460.40 ▁▂▁▇▁
Deadlift3Kg 1261457 0.25 15.10 213.96 -587.50 -210.00 117.50 205.00 457.50 ▁▆▂▇▂
Deadlift4Kg 1674342 0.01 72.83 194.22 -461.00 -125.00 142.88 207.50 418.00 ▁▃▁▇▂
Best3DeadliftKg 423269 0.75 189.30 62.23 -410.00 140.60 188.24 235.00 585.00 ▁▁▇▇▁
TotalKg 124759 0.93 392.87 203.18 2.50 225.00 375.00 542.50 1407.50 ▆▇▃▁▁
Wilks 139048 0.92 284.85 125.07 1.47 177.57 304.15 374.73 793.33 ▅▆▇▁▁
McCulloch 139285 0.92 292.74 126.87 1.47 185.30 311.23 383.75 844.83 ▅▇▇▁▁
Glossbrenner 139048 0.92 268.48 119.14 1.41 166.14 284.58 355.03 756.90 ▅▇▇▁▁
IPFPoints 176350 0.90 489.27 113.31 0.86 407.73 483.00 563.03 1245.93 ▁▇▆▁▁

Manipulating Rows with filter()

powerlifting %>% 
  filter(Federation == "IPF", Equipment == "Raw")

Manipulating Columns with select()

powerlifting %>% 
  filter(Federation == "IPF", Equipment == "Raw") %>% 
  select(-Squat4Kg, -Bench4Kg, -Deadlift4Kg)
powerlifting %>%
  filter(Federation == "IPF", Equipment == "Raw") %>% 
  select(-contains("4Kg"))

Creating Columns with mutate()

powerlifting %>%
  filter(Federation == "IPF", Equipment == "Raw") %>% 
  mutate(total_bw_ratio = TotalKg / BodyweightKg)

Summarizing Data with summarise()

powerlifting %>%
  filter(Federation == "IPF", Equipment == "Raw", 
         Division == "Open", Sex == "M") %>%
  group_by(AgeClass) %>%
  summarise(ave_squat = mean(Best3SquatKg, na.rm = TRUE),
            ave_bench = mean(Best3BenchKg, na.rm = TRUE),
            ave_dead  = mean(Best3DeadliftKg, na.rm = TRUE))

Reordering Data with arrange()

powerlifting %>% 
  filter(Equipment == "Raw") %>% 
  arrange(desc(Wilks))

Tasks