Consider case_when() variant returning ordered factors #72

Closed
Torvaney opened this issue Oct 1, 2021 · 3 comments

Torvaney commented Oct 1, 2021

I frequently find myself using the following pattern with case_when:

data %>%
  mutate(
    some_col = case_when(
      cond1 ~ "result1",
      cond2 ~ "result2"
      # etc.
    ) %>% fct_relevel("result1", "result2") # etc.
  )

In other words, I use case_when() to create a factor. In most cases, it is trivial to order the case expressions to match the desired factor order (and I generally find it more readable to keep them in factor order anyway).

Since the factor order is already (implicitly) defined by the order of the case expressions, this pattern is redundant: every level is written twice. Consequently, I think it would be useful to provide a version of case_when() that returns a factor whose level order is defined by the expression order. The most natural ways to add this would be a new function or a new argument to case_when(), but I don't feel strongly either way.

This would be particularly useful during data exploration, where many different ordered groupings may be tried out with case_when(); having to keep the factor levels in sync in two places quickly becomes tedious and error-prone.

This use case is common enough to have been brought up independently on Stack Overflow: https://stackoverflow.com/questions/49572416/r-convert-to-factor-with-order-of-levels-same-with-case-when

For clarity, here is a simplified demo implementation I have been using:

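suppressPackageStartupMessages({
  library(dplyr)
  library(purrr)
  library(rlang)
  library(forcats)
})

factored_case_when <- function(...) {
  # Capture the two-sided formulas exactly as they would be passed to case_when()
  args <- list2(...)
  # Extract the right-hand side (the output value) of each formula, in order
  rhs <- map(args, f_rhs)

  # Evaluate the conditions as usual, producing a character vector
  cases <- case_when(
    !!!args
  )

  # Reorder the factor levels to follow the order of the formulas
  exec(fct_relevel, cases, !!!rhs)
}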


numbers <- c(2, 7, 4, 3, 8, 9, 3, 5, 2, 7, 5, 4, 1, 9, 8)

factored_case_when(
  numbers <= 2 ~ "Very small",
  numbers <= 3 ~ "Small",
  numbers <= 6 ~ "Medium",
  numbers <= 8 ~ "Large",
  TRUE    ~ "Huge!"
)
#>  [1] Very small Large      Medium     Small      Large      Huge!     
#>  [7] Small      Medium     Very small Large      Medium     Medium    
#> [13] Very small Huge!      Large     
#> Levels: Very small Small Medium Large Huge!

Created on 2021-09-24 by the reprex package (v0.3.0)
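The demo above returns an unordered factor, and because fct_relevel() starts from the values that actually occur, a category that never matches is left out of the levels (with a warning). A minimal alternative sketch, assuming every right-hand side is a single string label, builds the factor with factor(levels = ) instead, so unused levels are kept and an ordered factor can be returned. The name factored_case_when2() and its .ordered argument are hypothetical.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(purrr)
  library(rlang)
})

# Hypothetical variant of factored_case_when(): uses factor(levels = ) rather
# than fct_relevel(), and returns an ordered factor by default.
factored_case_when2 <- function(..., .ordered = TRUE) {
  args <- list2(...)
  # Evaluate each right-hand side in its formula's environment, so a variable
  # holding a label also works. Assumes each RHS is a single string.
  lvls <- map_chr(args, function(f) eval_tidy(f_rhs(f), env = f_env(f)))
  cases <- case_when(!!!args)
  # Levels follow formula order; levels that never match are still kept.
  factor(cases, levels = unique(lvls), ordered = .ordered)
}

small_numbers <- c(1, 2, 3)
x <- factored_case_when2(
  small_numbers <= 2 ~ "Very small",
  small_numbers <= 3 ~ "Small",
  TRUE ~ "Huge!"
)
levels(x)  # "Very small" "Small" "Huge!"
```

Keeping unused levels matters for the plotting and table use cases discussed below, where an empty bucket should usually still appear in the legend or table.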

An alternative forcats-inspired API is also posted here.


This was originally posted as a feature request to dplyr (tidyverse/dplyr#6029) and has been re-posted here (per the request in the linked thread).

Torvaney changed the title from "Consider case_when returning ordered factors" to "Consider case_when() variant returning ordered factors" on Oct 1, 2021
DavisVaughan (Member) commented Dec 23, 2021

This seems to come up quite a bit (at least in my wife's work) when you use case_when() to bucket data before creating a graph or gt table where the order of the levels matters, so I agree that some variant of this would be useful. She basically uses case_when() %>% factor(levels = ) as a more readable version of cut().
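To illustrate the comparison with cut(), here is a small sketch that buckets the numbers from the first comment both ways. The break points are chosen by hand to reproduce the same right-closed buckets; the variable names are illustrative only.

```r
library(dplyr)

numbers <- c(2, 7, 4, 3, 8, 9, 3, 5, 2, 7, 5, 4, 1, 9, 8)

# case_when() %>% factor(levels = ): conditions read top to bottom,
# but every label has to be repeated in the levels vector
by_case <- case_when(
  numbers <= 2 ~ "Very small",
  numbers <= 3 ~ "Small",
  numbers <= 6 ~ "Medium",
  numbers <= 8 ~ "Large",
  TRUE ~ "Huge!"
) %>%
  factor(levels = c("Very small", "Small", "Medium", "Large", "Huge!"))

# Base R cut(): intervals are right-closed, i.e. (-Inf, 2], (2, 3], ...
by_cut <- cut(
  numbers,
  breaks = c(-Inf, 2, 3, 6, 8, Inf),
  labels = c("Very small", "Small", "Medium", "Large", "Huge!")
)

identical(as.character(by_case), as.character(by_cut))  # TRUE
```

cut() only handles numeric thresholds, while case_when() accepts arbitrary conditions, which is part of why the case_when() form tends to be preferred despite the duplicated labels.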

DavisVaughan (Member) commented Aug 17, 2022

Worth mentioning that you can use .default and .ptype in the dev version of dplyr (these arguments were later released in dplyr 1.1.0) to help with this a little bit:

dplyr::case_when(
  letters %in% c("a", "e", "i", "o", "u") ~ "vowel",
  .default = "consonant",
  .ptype = factor(levels = c("vowel", "consonant"), ordered = TRUE)
)
#>  [1] vowel     consonant consonant consonant vowel     consonant consonant
#>  [8] consonant vowel     consonant consonant consonant consonant consonant
#> [15] vowel     consonant consonant consonant consonant consonant vowel    
#> [22] consonant consonant consonant consonant consonant
#> Levels: vowel < consonant

But it would still probably be nice to have a forcats-specific helper for this that flips the order of the inputs, like:

fct_case_when(
  vowel = letters %in% c("a", "e", "i", "o", "u"),
  .default = "consonant",
  .ordered = TRUE
)
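To make the proposed fct_case_when() interface concrete, here is a minimal base-R sketch. The name fct_case_when_sketch() and its exact behaviour (.default appended as the last level, NA conditions treated as FALSE, first matching condition wins as in case_when()) are assumptions for illustration; this is not the forcats API being discussed in tidyverse/forcats#298.

```r
# Hypothetical helper, not part of forcats: each named argument maps a level
# (the name) to a logical condition (the value). Argument order defines the
# level order, and the first matching condition wins, as in case_when().
fct_case_when_sketch <- function(..., .default = NA_character_, .ordered = FALSE) {
  conditions <- list(...)
  lvls <- names(conditions)
  if (length(conditions) == 0 || is.null(lvls) || any(lvls == "")) {
    stop("Supply at least one condition, each named with its level.", call. = FALSE)
  }

  n <- length(conditions[[1]])
  out <- rep(NA_character_, n)
  assigned <- rep(FALSE, n)

  for (i in seq_along(conditions)) {
    # Only rows not claimed by an earlier condition can match;
    # NA conditions are treated as FALSE.
    hit <- conditions[[i]] & !assigned
    hit[is.na(hit)] <- FALSE
    out[hit] <- lvls[i]
    assigned <- assigned | hit
  }

  # Rows that matched nothing get .default, which becomes the last level.
  if (!is.na(.default)) {
    out[!assigned] <- .default
    lvls <- c(lvls, .default)
  }

  factor(out, levels = unique(lvls), ordered = .ordered)
}

vowels <- fct_case_when_sketch(
  vowel = letters %in% c("a", "e", "i", "o", "u"),
  .default = "consonant",
  .ordered = TRUE
)
levels(vowels)  # "vowel" "consonant"
```

Because the conditions are ordinary evaluated arguments here, the sketch needs no tidy evaluation; a real implementation would likely want vctrs-style size checks and clearer error messages.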

See also tidyverse/forcats#298

DavisVaughan (Member) commented:

Actually, I'm convinced this is just a duplicate of tidyverse/forcats#298 and will eventually live in forcats, so I'm going to close this one.
