Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bitflags for oxc_regular_expression::ast::Flags #5562

Open
rzvxa opened this issue Sep 6, 2024 · 4 comments · May be fixed by #5592
Open

Use bitflags for oxc_regular_expression::ast::Flags #5562

rzvxa opened this issue Sep 6, 2024 · 4 comments · May be fixed by #5592
Assignees
Labels
C-enhancement Category - New feature or request C-performance Category - Solution not expected to change functional behavior, only performance good first issue Experience Level - Good for newcomers

Comments

@rzvxa
Copy link
Collaborator

rzvxa commented Sep 6, 2024

Currently our oxc_ast::ast::literal::RegExpFlags is defined using the bitflags macro.

bitflags! {
/// Regular expression flags.
///
/// <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions#advanced_searching_with_flags>
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct RegExpFlags: u8 {
/// Global flag
///
/// Causes the pattern to match multiple times.
const G = 1 << 0;
/// Ignore case flag
///
/// Causes the pattern to ignore case.
const I = 1 << 1;
/// Multiline flag
///
/// Causes `^` and `$` to match the start/end of each line.
const M = 1 << 2;
/// DotAll flag
///
/// Causes `.` to also match newlines.
const S = 1 << 3;
/// Unicode flag
///
/// Causes the pattern to treat the input as a sequence of Unicode code points.
const U = 1 << 4;
/// Sticky flag
///
/// Perform a "sticky" search that matches starting at the current position in the target string.
const Y = 1 << 5;
/// Indices flag
///
/// Causes the regular expression to generate indices for substring matches.
const D = 1 << 6;
/// Unicode sets flag
///
/// Similar to the `u` flag, but also enables the `\\p{}` and `\\P{}` syntax.
/// Added by the [`v` flag proposal](https://github.com/tc39/proposal-regexp-set-notation).
const V = 1 << 7;
}
}

While the oxc_regular_expression::ast::Flags is defined as such:

#[ast]
#[derive(Debug, Clone)]
#[generate_derive(CloneIn, ContentEq, ContentHash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
pub struct Flags {
pub span: Span,
pub global: bool,
pub ignore_case: bool,
pub multiline: bool,
pub unicode: bool,
pub sticky: bool,
pub dot_all: bool,
pub has_indices: bool,
pub unicode_sets: bool,
}

I suggest we refactor Flags to use bitflags and maybe rename it to RegularExpressionFlags. Ideally, we should remove the RegExpFlags and share this flag and its parser(probably through RegularExpressionFlags::try_from("giv")) throughout the project.

We can still keep a FlagsOption that implements Into<RegularExpressionFlags> if we want to allow users to use a struct to pass in flags to the pattern parser.

cc @leaysgur @overlookmotel

@rzvxa rzvxa added C-enhancement Category - New feature or request C-performance Category - Solution not expected to change functional behavior, only performance labels Sep 6, 2024
@rzvxa
Copy link
Collaborator Author

rzvxa commented Sep 6, 2024

In regards to the span field on the Flags if we find it absolutely necessary to have this field we can extract the flags to a bitflag and define it as such:

 #[ast] 
 #[derive(Debug, Clone)] 
 #[generate_derive(CloneIn, ContentEq, ContentHash)] 
 #[cfg_attr(feature = "serialize", derive(Serialize, Tsify))] 
 pub struct Flags { 
     pub span: Span, 
     pub flags: RegularExpressionFlags
 } 

However, I think this span is redundant since we can easily obtain it at the RegularExpression level which is the only place we really would access this flag normally.

@Boshen Boshen added the good first issue Experience Level - Good for newcomers label Sep 7, 2024
@leaysgur
Copy link
Collaborator

leaysgur commented Sep 7, 2024

I suggest we refactor Flags to use bitflags and maybe rename it to RegularExpressionFlags.

I agree with this.

In regards to the span field on the Flags
However, I think this span is redundant

From the perspective of oxc_parser, I think this is true.

But from the perspective of the independent oxc_regular_expression crate, it may be necessary...?

@rzvxa
Copy link
Collaborator Author

rzvxa commented Sep 7, 2024

But from the perspective of the independent oxc_regular_expression crate, it may be necessary...?

In what context do we need this information? Are we looking also to produce an oxc_regular_expression::ast::RegularExpression for regex constructor calls?

How I look at it is that a flag always contains valid values and depending on the number of flags that are true we can calculate its span from a starting point, So in the RegularExpression type we can calculate it like this:

impl<'a> RegularExpression<'a> {
    pub fn flags_span(&self) -> Span {
        Span::new(self.pattern.span.end + 1 /* skip `/` in the middle */, self.span.end)
    }
}

And we can also have this method on the RegularExpressionFlags:

impl RegularExpressionFlags {
    pub fn active_flags(&self) -> u8 {
        return number_of_active_flags
    }
}

Where do users care for flags without also having access to their regex literal?

@leaysgur
Copy link
Collaborator

leaysgur commented Sep 7, 2024

Are we looking also to produce an oxc_regular_expression::ast::RegularExpression for regex constructor calls?

Yes, I was thinking in case of that.
However, the position of the flag text also should be known, and it seems OK for now. 😅 (sorry)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category - New feature or request C-performance Category - Solution not expected to change functional behavior, only performance good first issue Experience Level - Good for newcomers
Projects
None yet
3 participants