| old_column |
|---|
| Here |
| is |
| a |
| column |
I haven’t posted for a while, and came across a tweet from Angie Jones that I really related to.
Not that my previous posts were intellectual thinkpieces, but I thought that I had to write about something novel or innovative to provide any level of value.
When I first started using R, my code was a mash-up of base R, dplyr, and data.table. I would reference a column by index and then by name. It was hard for me to follow, and I cringe at the idea that I sent some of this old code to colleagues.
I was trying to think of how many ways there are to do simple data cleaning tasks in R, and thought it would be fun to explore.
The only task accomplished in the rest of this post is renaming a column (plus some pics of my cats).
- Original column name: old_column
- Renamed column name: new_column
Every example uses a data.frame called df that contains one column, old_column, which we will rename to new_column:
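Here is one way to build df. This is a minimal sketch that assumes a plain data.frame holding the four values from the one-column table at the top of the post. The post only shows the printed table, not the code that created it.

```r
# Example data: a single column, "old_column", with four rows.
# (Assumed construction; only the printed table appears in the post.)
df <- data.frame(old_column = c("Here", "is", "a", "column"))

names(df)
# [1] "old_column"
```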
Using Base R
The following examples only use base R, meaning no additional packages are required to run this code.
- Call colnames on df and index the first column:
colnames(df)[1] <- "new_column"
- Call names on df and index the first column:
names(df)[1] <- "new_column"
- Call colnames on df and select the column by name, also using colnames:
colnames(df)[colnames(df) == "old_column"] <- "new_column"
- Call names on df and select the column by name, also using names:
names(df)[names(df) == "old_column"] <- "new_column"
- Call colnames on df and select the column by name using names:
colnames(df)[names(df) == "old_column"] <- "new_column"
- Call names on df and select the column by name using colnames:
names(df)[colnames(df) == "old_column"] <- "new_column"
- Call colnames on df and subset with which, which turns the logical comparison into the index of the column named "old_column":
colnames(df)[which(colnames(df) == "old_column")] <- "new_column"
- Since df only has one column, we can also call names on df directly:
names(df) <- "new_column"
- …or colnames on df:
colnames(df) <- "new_column"
- We can also use a different, less efficient approach. Instead of renaming the column, we can create a new column that is identical to old_column and name it new_column. Then we can remove old_column from df:
# Create a new column called "new_column" that is an exact copy of "old_column"
df$new_column <- df$old_column
# Remove "old_column"
df$old_column <- NULL
- Getting a bit more abstract, we can use colnames with grepl for regex pattern matching:
colnames(df)[grepl("old", colnames(df))] <- "new_column"
- …we can also use names with grepl:
names(df)[grepl("old", names(df))] <- "new_column"
- We can swap the first names with colnames:
colnames(df)[grepl("old", names(df))] <- "new_column"
- Flip it and reverse it…
names(df)[grepl("old", colnames(df))] <- "new_column"
- Using grep + names:
names(df)[grep("old", names(df))] <- "new_column"
- Using grep + colnames:
colnames(df)[grep("old", colnames(df))] <- "new_column"
- Using grep + names, then colnames:
names(df)[grep("old", colnames(df))] <- "new_column"
- Using grep + colnames, then names:
colnames(df)[grep("old", names(df))] <- "new_column"
(I am intentionally stopping myself from more Missy Elliott references.)
- Using sub + colnames:
colnames(df) <- sub("old_column", "new_column", colnames(df))
- Using sub + names:
names(df) <- sub("old_column", "new_column", names(df))
- Using sub + names, then colnames:
names(df) <- sub("old_column", "new_column", colnames(df))
- Using sub + colnames, then names:
colnames(df) <- sub("old_column", "new_column", names(df))
- Using gsub + colnames:
colnames(df) <- gsub("old_column", "new_column", colnames(df))
- Using gsub + names:
names(df) <- gsub("old_column", "new_column", names(df))
- Using gsub + names, then colnames:
names(df) <- gsub("old_column", "new_column", colnames(df))
- Using gsub + colnames, then names:
colnames(df) <- gsub("old_column", "new_column", names(df))
- Using a for loop with colnames:
for (i in paste0("new_column")) {
  colnames(df) <- i
}
- Using a for loop with names:
for (i in paste0("new_column")) {
  names(df) <- i
}
- Using setNames:
df <- setNames(df, "new_column")
- Using eval and parse with names:
eval(parse(text = 'names(df) <- "new_column"'))
- Using eval and parse with colnames:
eval(parse(text = 'colnames(df) <- "new_column"'))
- Using setNames and replace:
df <- setNames(df, replace(names(df), names(df) == "old_column", "new_column"))
- Using transform:
df <- transform(df, new_column = old_column, old_column = NULL)
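Because df has only one column, every base R variant above produces the same result. With more columns they can diverge, since index-based and unanchored pattern-based selections may hit the wrong column. This sketch adds a second column, bold_font, to show the difference. That column is hypothetical and not part of the original example; its name simply happens to contain "old".

```r
# Hypothetical two-column data frame: "bold_font" also contains "old".
df2 <- data.frame(
  old_column = c("Here", "is", "a", "column"),
  bold_font  = c(TRUE, FALSE, TRUE, FALSE)
)

# Pattern matching with grepl("old", ...) matches BOTH names,
# so both columns get renamed (leaving duplicate names).
by_pattern <- df2
names(by_pattern)[grepl("old", names(by_pattern))] <- "new_column"
names(by_pattern)
# [1] "new_column" "new_column"

# Comparing against the exact name only touches the intended column.
by_exact <- df2
names(by_exact)[names(by_exact) == "old_column"] <- "new_column"
names(by_exact)
# [1] "new_column" "bold_font"
```

Anchoring the pattern (for example "^old_column$") would make the grepl/grep/sub variants just as strict as the exact comparison.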
tidyverse
You can learn more about the tidyverse here.
- Using rename without a %>%:
library(tidyverse) # loads dplyr, purrr, and stringr, used throughout this section
df <- rename(df, "new_column" = "old_column")
- Using rename with a %>%:
df <- df %>%
  rename("new_column" = "old_column")
- Renaming in a select call without a %>%:
df <- select(df, "new_column" = "old_column")
- Renaming in a select call with a %>%:
df <- df %>%
  select("new_column" = "old_column")
- Using mutate to create a new column and then removing old_column:
df <- df %>%
  mutate(new_column = old_column) %>%
  select(-old_column)
- Using mutate to create a new column and then removing old_column without pipes (%>%):
df <- mutate(df, new_column = old_column)
df$old_column <- NULL
- Using purrr's set_names and str_replace_*:
df <- df %>%
  set_names(~ (.) %>%
    str_replace_all("old_column", "new_column"))
- Using a character vector and rename:
rename_vec <- c("new_column" = "old_column")
df <- df %>%
  rename(all_of(rename_vec)) # all_of() is the form tidyselect recommends for external vectors
- Using str_replace + names:
names(df) <- str_replace(names(df), "old_column", "new_column")
- Using str_replace + colnames:
colnames(df) <- str_replace(colnames(df), "old_column", "new_column")
- Using starts_with:
df <- df %>%
  select("new_column" = starts_with("old"))
- Using ends_with:
df <- df %>%
  select("new_column" = ends_with("column"))
- Using rename_with + gsub:
df <- df %>%
  rename_with(~ gsub("old_", "new_", .x))
- Using rename_with + sub:
df <- df %>%
  rename_with(~ sub("old_", "new_", .x))
- Using rename_with and str_replace:
df <- df %>%
  rename_with(~ str_replace(.x, "old_column", "new_column"))
- Using rename with an index:
df <- df %>%
  rename("new_column" = 1)
A note: I’m going to stop interchanging names and colnames as I did previously. I had no idea how many ways there would be to rename columns when I started this, but it’s becoming evident that there are likely hundreds of ways if we count every nuance.
I’m also throwing in the towel on the deprecated/superseded rename_at / rename_if / rename_all functions, since they have been replaced by select and rename_with.
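For anyone porting older code, here is a minimal sketch of how the superseded rename_all() and rename_at() calls map onto rename_with(). It assumes dplyr 1.0.0 or later, where rename_with() takes a function (.fn) and an optional tidyselect column selection (.cols).

```r
library(dplyr)

df <- data.frame(old_column = c("Here", "is", "a", "column"))

# Old style: rename_all(df, ~ sub("old_", "new_", .x))
# rename_with() applies .fn to every column by default (.cols = everything()).
df_all <- rename_with(df, ~ sub("old_", "new_", .x))

# Old style: rename_at(df, vars(starts_with("old")), ~ sub("old_", "new_", .x))
# .cols takes a tidyselect helper directly, so vars() is no longer needed.
df_at <- rename_with(df, ~ sub("old_", "new_", .x), .cols = starts_with("old"))

names(df_all)
# [1] "new_column"
names(df_at)
# [1] "new_column"
```

The .cols argument takes over the job of the _at/_if suffixes. Leaving it at the default reproduces the _all behaviour.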
data.table
data.table is really fast, and you can… do cool stuff with it. I am a data.table n00b. You can learn more about data.table here.
- Using data.table::setnames:
library(data.table)
df <- as.data.table(df, keep.rownames = FALSE)
setnames(df, "old_column", "new_column") # renames by reference, so no reassignment is needed
- Using data.table::setnames with an index:
df <- as.data.table(df, keep.rownames = FALSE)
setnames(df, 1, "new_column")
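- Refactoring the previous data.table example (I have no idea what I’m doing 😅). Note that .() keeps only the columns listed, which works here because df has a single column:
df <- as.data.table(df)[, .(new_column = old_column)]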
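Unlike the base R and tidyverse examples, the setnames() lines above are not reassigned with <-, because setnames() modifies the data.table by reference. This sketch shows a side effect of that: a plain assignment is only a second name for the same object, and copy() is how you get an independent one. It assumes the data.table package is installed; the names dt, alias, and snapshot are hypothetical and only for illustration.

```r
library(data.table)

dt <- data.table(old_column = c("Here", "is", "a", "column"))

# Plain assignment does not copy: both names refer to the same object.
alias <- dt
# copy() makes an independent deep copy.
snapshot <- copy(dt)

# setnames() renames in place (by reference).
setnames(dt, "old_column", "new_column")

names(dt)        # [1] "new_column"
names(alias)     # [1] "new_column"  (same object, so it changed too)
names(snapshot)  # [1] "old_column"  (unaffected)
```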
What’s in a (re)name?
R is an amazing language and there are endless things you can do. Coming from SPSS, I was previously familiar with rename and just left it at that. I had some grand ideas of microbenchmarking each of these methods to find the fastest renaming solution, and maybe that will happen someday if I get an espresso machine or something. ☕
Our team at work will be transitioning from SPSS to R, and this has given me a lot to think about, specifically the importance of having standardized code while also leaving some built-in flexibility for each person’s coding style. I’m looking forward to another version of this post, where I focus on a task that is slightly more complicated. Maybe iterating through a data.frame column-wise/row-wise?
I also acknowledge my severe lack of data.table knowledge. I don’t work with big data, and am not in a position to need to make production-level code performant. tidyverse code is way more intuitive for me, and the community is really supportive and engaged, so I will likely leave data.table off the …table for a while.
… I’ll see myself out.
Cats