Add function that creates factor in order of case_when matches #298

Open
dchiu911 opened this issue Feb 1, 2022 · 5 comments
Labels
feature a feature request or enhancement

Comments

@dchiu911

dchiu911 commented Feb 1, 2022

A common workflow of mine is to map one vector to another using some (possibly complex) conditions, then coerce the result to a factor whose level order matches the order in which the values appear in dplyr::case_when(). It would be helpful to have a wrapper that creates the factor without having to specify the levels manually.
Currently, I do something like this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(2022)
x <- sample(
  c("low", "intermediate", "high"),
  prob = c(0.5, 0.2, 0.3),
  size = 100,
  replace = TRUE
)
z <- rbinom(
  n = 100,
  size = 100,
  prob = 0.3
)
y <- case_when(
  x == "intermediate" | (x == "low" & z < 30) ~ "B",
  x == "low" ~ "A",
  x == "high" ~ "C",
  TRUE ~ NA_character_
) %>%
  factor(levels = c("B", "A", "C"))
str(y)
#>  Factor w/ 3 levels "B","A","C": 1 3 2 3 2 3 2 1 1 3 ...

Created on 2022-02-01 by the reprex package (v2.0.1)

Can we add a function that makes y into a factor with the level order the same as specified in the case_when()? For example,

y <- fct_case(
  x == "intermediate" | (x == "low" & z < 30) ~ "B",
  x == "low" ~ "A",
  x == "high" ~ "C",
  TRUE ~ NA_character_
)
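
For illustration only, here is a rough sketch of how such a wrapper might be written on top of case_when(). The name fct_case() comes from the request above; its body is an assumption, not an existing forcats or dplyr function. It assumes every right-hand side is a single string (or NA_character_), and it takes the levels from the order in which those strings first appear.

```r
library(dplyr)

# Hypothetical helper: not part of forcats or dplyr.
fct_case <- function(...) {
  cases <- list(...)

  # Every argument must be a two-sided formula: condition ~ value.
  is_case <- vapply(
    cases,
    function(f) inherits(f, "formula") && length(f) == 3,
    logical(1)
  )
  if (!all(is_case)) {
    stop("each argument must be a two-sided formula", call. = FALSE)
  }

  # Evaluate each right-hand side in the environment of its formula.
  rhs <- lapply(cases, function(f) eval(f[[3]], environment(f)))

  # Restrict the right-hand sides to single strings so the level order is unambiguous.
  ok <- vapply(rhs, function(v) is.character(v) && length(v) == 1, logical(1))
  if (!all(ok)) {
    stop("each right-hand side must be a single string", call. = FALSE)
  }

  # Levels follow the order of first appearance; NA is not a level.
  lvls <- unique(unlist(rhs))
  lvls <- lvls[!is.na(lvls)]

  factor(case_when(...), levels = lvls)
}

# Small usage example with made-up data.
x <- c("low", "intermediate", "high", "low")
z <- c(10, 50, 50, 40)
y <- fct_case(
  x == "intermediate" | (x == "low" & z < 30) ~ "B",
  x == "low" ~ "A",
  x == "high" ~ "C",
  TRUE ~ NA_character_
)
```

The single-string restriction is a deliberate simplification: once a right-hand side can be a vector of data values, it is no longer obvious what the level order should be.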
@hadley
Member

hadley commented May 20, 2022

I think we'd need to make the syntax more limiting than case_when() because the RHS of a case_when() can itself use data values, and reasoning through how those values should interact between conditions seems hard.

Since we'd want to restrict each expression to a single character level, we could put it in the LHS of =, something like:

something(
  "B" = x == "intermediate" | (x == "low" & z < 30),
  "A" = x == "low",
  "C" = x == "high",
)

But I don't know if any existing tidyverse function uses similar syntax.
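
To make the named-condition idea concrete, a minimal base R sketch follows. The name fct_when() and all of its behaviour are assumptions for illustration only. Each argument name is a level, each value is a logical vector, and the first condition that is TRUE for an element wins, mirroring how case_when() resolves matches.

```r
# Hypothetical helper: the name and semantics are assumptions, not an existing API.
fct_when <- function(...) {
  conds <- list(...)
  lvls <- names(conds)
  stopifnot(
    length(conds) > 0,
    !is.null(lvls),
    all(nzchar(lvls)),
    all(vapply(conds, is.logical, logical(1)))
  )

  # This sketch requires all conditions to have the same length (no recycling).
  n <- length(conds[[1]])
  stopifnot(all(lengths(conds) == n))

  # idx[i] holds the position of the first condition that is TRUE for element i.
  idx <- rep(NA_integer_, n)
  for (i in seq_along(conds)) {
    hit <- is.na(idx) & !is.na(conds[[i]]) & conds[[i]]
    idx[hit] <- i
  }

  # Elements matched by no condition stay NA; levels follow argument order.
  factor(lvls[idx], levels = unique(lvls))
}

# Small usage example with made-up data.
x <- c("low", "intermediate", "high", "low")
z <- c(10, 50, 50, 40)
y <- fct_when(
  "B" = x == "intermediate" | (x == "low" & z < 30),
  "A" = x == "low",
  "C" = x == "high"
)
```

Because the sketch collects its arguments with base list(...), a trailing comma after the last condition is an error here; collecting them with rlang::list2() would be one way to allow it.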

@hadley hadley added the feature a feature request or enhancement label May 20, 2022
@dchiu911
Author

I do think removing the use of ~ would make it more consistent, as case_when() syntax is quite unique.

@DavisVaughan
Member

> But I don't know if any existing tidyverse function uses similar syntax.

FWIW this is basically how fct_recode() works (the name represents the new level, the value is the old level), so it wouldn't be unheard of to let the name represent the new level and the value be the logical condition.
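
For readers unfamiliar with that convention, a short example of the existing forcats::fct_recode() call, in which the argument name is the new level and the value is the old level (the data is made up):

```r
library(forcats)

f <- factor(c("apple", "bear", "apple"))

# new level = "old level"
g <- fct_recode(f, fruit = "apple")
```

Here the "apple" level is renamed to "fruit", while "bear" is left unchanged.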

@hadley
Member

hadley commented Jan 9, 2023

Will wait until lower level functions are exposed by vctrs.

@hadley hadley removed this from the v1.0.0 milestone Jan 9, 2023
@brianmsm

brianmsm commented Jan 5, 2024

I think it would be convenient to solve this within case_when() itself, with something like this:

set.seed(2022)
x <- sample(
  c("low", "intermediate", "high"),
  prob = c(0.5, 0.2, 0.3),
  size = 100,
  replace = TRUE
)
z <- rbinom(
  n = 100,
  size = 100,
  prob = 0.3
)
y <- case_when(
  x == "intermediate" | (x == "low" & z < 30) ~ "B",
  x == "low" ~ "A",
  x == "high" ~ "C",
  TRUE ~ NA_character_,
  .ptype = "factor"
)
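
For comparison, and assuming dplyr 1.1.0 or later, case_when() already has a .ptype argument, but it takes a prototype object rather than a type name. As far as I understand it, a factor prototype with explicit levels produces a factor directly, though the levels still have to be written out by hand, so this does not remove the duplication the issue is about. A small sketch with made-up data:

```r
library(dplyr)

x <- c("low", "intermediate", "high", "low")
z <- c(10, 50, 50, 40)

# Assumes dplyr >= 1.1.0; the levels are still specified manually.
y <- case_when(
  x == "intermediate" | (x == "low" & z < 30) ~ "B",
  x == "low" ~ "A",
  x == "high" ~ "C",
  TRUE ~ NA_character_,
  .ptype = factor(levels = c("B", "A", "C"))
)
```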

Projects
None yet
Development

No branches or pull requests

4 participants