Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: structured diff of conflicted commits #4251

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ilyagr
Copy link
Collaborator

@ilyagr ilyagr commented Aug 11, 2024

This is a proof of concept for #4062. None of the design, the code, or functionality are in a finished state, but I'd like to polish it up to the point where it can be merged for later improvement.

My main goal here is to set up the data model for diffs between conflicts, and to have an example of a UI for showing these diffs to the user. I am not sure whether the latter will be of actual use to users who don't understand conflict theory very well, but the goal is 1) do our best 2) have a debugging tool for exports and 3) most importantly, have a way to present these data structures in tests as we improve them to the point where they can be used in blame, absorb, etc.

The algorithms that actually do all the jobs are inefficient and I already have examples of cases where they do a bad job. Hopefully, they can be improved as a separate project.

Organization

The PR splits into three parts:

  1. Part 1. Everything up to the lib merge: the key explain_diff_of_merges function and data structure commit exists to set up that key commit.

    That commit sets up the DiffOfMergeExplanationAtom<T> type, presented below. The key function has the signature

    fn explain_diff_of_merges(
       left: Merge<T>,
       right: Merge<T>,
       distance: DistanceFunction
    ) -> Vec<DiffOfMergeExplanationAtom<T>>

    where the distance function T⨯T→usize is used to decide which pieces of the conflicts should be paired together into diffs. The distance function we'll use is the size of the diff between two chunks of text.

  2. Part 2. There is a questionable (and less important for now) part that deals with splitting a Merge<Vec<u8>> into hunks.

  3. Part 3. The rest sets up jj diff --conflicts, which is a way the UI to present an object of type Vec<DiffOfMergeExplanationAtom<T>> to the user. The key commit is cli diff-util: implement diff --structured-conflicts, the commits after that try to add a bit of polish.

Unless people object to the general direction of this, I'll start by trying to merge part 1.

Data model: `DiffOfMergeExplanationAtom`
/// A statement about a difference of two conflicts of type `Merge<T>`
///
/// A (conceptually unordered) set of `DiffExplanationAtom<T>` describes a
/// conflict `left: Merge<T>` and a sequence of operations to turn it into a
/// different conflict `right: Merge<T>`. This is designed so that this
/// information can be presented to the user.
///
/// This description is far from unique. We usually pick one by trying to
/// minimize the complexity of the diffs in those operations that involve diffs.
#[derive(PartialEq, Eq, Clone, Debug)]
pub enum DiffExplanationAtom<T> {
    /// Adds one more pair of an add + a remove to the conflict, making it more
    /// complicated (more sides than before)
    AddedConflictDiff {
        /// Add of the added diff.
        ///
        /// This is an "add" of the right conflict that is not an "add" of the
        /// left conflict.
        conflict_add: T,
        /// Remove of the added diff
        ///
        /// This is a "remove" of the right conflict that is not a "remove" of
        /// the left conflict.
        conflict_remove: T,
    },
    /// Removes one of the add + remove pairs in the conflcit, making it less
    /// complicated (fewer sides than before)
    ///
    /// In terms of conflict theory, this is equivalent to adding the opposite
    /// diff and then simplifying the conflict. However, for the user, the
    /// difference between these is signfificant.
    RemovedConflictDiff {
        /// Add of the removed diff
        ///
        /// This is an "add" of the left conflict that is not an "add" of the
        /// right conflict.
        conflict_add: T,
        /// Remove of the removed diff
        ///
        /// This is a "remove" of the left conflict that is not a "remove" of
        /// the right conflict.
        conflict_remove: T,
    },
    /// Modifies one of the adds of the conflict
    ///
    /// This does not change the number of sides in the conflict.
    ChangedConflictAdd {
        /// Add of the left conflict
        left_version: T,
        /// Add of the right conflict
        right_version: T,
    },
    /// Modifies one of the removes of the conflict
    ///
    /// This does not change the number of sides in the conflict.
    // TODO(ilyagr): While this operation is very natural from the perspective of conflict
    // theory, I find it hard to come up with an example where it is an
    // intuitive result of some operation. If it's too unintuitive to users, we could try to
    // replace it with a composition of "removed diff" and "added diff".
    ChangedConflictRemove {
        /// Remove of the left conflict
        left_version: T,
        /// Remove of the right conflict
        right_version: T,
    },
    /// States that both the left and the right conflict contin this value as an
    /// "add"
    UnchangedConflictAdd(T),
    /// States that both the left and the right conflict contin this value as a
    /// "remove"
    UnchangedConflictRemove(T),
}

Examples

The exact UI will very likely change in the future. For now, it's designed so that it doesn't absolutely require color and so that I can understand this.

Here is a simple example of conflict resolution from #4246 (reply in thread) (the discussion question helpfully has reproduction commands):

image

This is equivalent to:

diff --git a/foo b/foo
index 0000000000..100b0dec8c 100644
--- a/foo
+++ b/foo
@@ -1,7 +1,1 @@
-<<<<<<< Conflict 1 of 1
-%%%%%%% Changes from base to side #1
--bar
-+baz
-+++++++ Contents of side #2
-foo
->>>>>>> Conflict 1 of 1 ends
+qux

or, equivalently,

diff --git a/foo b/foo
index 0000000000..100b0dec8c 100644
--- a/foo
+++ b/foo
@@ -1,7 +1,1 @@
-<<<<<<< Conflict 1 of 1
-%%%%%%% Changes from base to side #1
--bar
-+foo
-+++++++ Contents of side #2
-baz
->>>>>>> Conflict 1 of 1 ends
+qux

This is how jj actually printed it. There's a discussion at #4246 (reply in thread) about whether it would make sense to adjust this PR's presentation to make it more similar.


Another example of conflict resolution image

I am not sure why the "removed diff" does not have + and - lines more interleaved; I think that would be better.

In textual form:

$ jj diff --conflicts
Resolved conflict in lib/src/merge.rs:
        ...
                )
            };

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< BEGIN CONFLICT DIFF
=========== Changed one of the Sides
-            let remove_comes_from_right = if let Some((ix, _)) = right
-                .removes()
-                .enumerate()
-                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == remove)
-            {
+            let remove_comes_from_right = if let Some((ix, _)) =
+                right.removes().enumerate().find(|(ix, elt)| {
+                    !right_seen.get_remove(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(remove)
+                }) {
                 *right_seen.get_remove_mut(ix).unwrap() = true;
                 true
-            } else if let Some((ix, _)) = left
-                .adds()
-                .enumerate()
-                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == remove)
-            {
+            } else if let Some((ix, _)) = left.adds().enumerate().find(|(ix, elt)| {
+                !left_seen.get_add(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(remove)
+            }) {
                 *left_seen.get_add_mut(ix).unwrap() = true;
                 false
             } else {
                 panic!(
-                    "removes of (right - left) should be either removes on the right or adds on \
-                     the left."
+                    "removes of (right - left) should be either removes on the right or adds \
+                         on the left."
                 )
             };
=========== Removed a Diff (-1 Base and -1 Side)
-|              // TODO: What order should this if be in? Does it matter?
-| -            let remove_comes_from_right = if let Some((ix, _)) = right
-| -                .removes()
-| -                .enumerate()
-| -                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == add)
-| -            {
-| -                *right_seen.get_remove_mut(ix).unwrap() = true;
-| -                true
-| -            } else if let Some((ix, _)) = left
-| -                .adds()
-| -                .enumerate()
-| -                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == add)
-| -            {
-| -                *left_seen.get_add_mut(ix).unwrap() = true;
-| -                false
-| -            } else {
-| -                panic!(
-| -                    "removes of (right - left) should be either removes on the right or adds on \
-| -                     the left."
-| -                )
-| -            };
-| +            let remove_comes_from_right =
-| +                if let Some((ix, _)) = right.removes().enumerate().find(|(ix, elt)| {
-| +                    !right_seen.get_remove(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
-| +                }) {
-| +                    *right_seen.get_remove_mut(ix).unwrap() = true;
-| +                    true
-| +                } else if let Some((ix, _)) = left.adds().enumerate().find(|(ix, elt)| {
-| +                    !left_seen.get_add(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
-| +                }) {
-| +                    *left_seen.get_add_mut(ix).unwrap() = true;
-| +                    false
-| +                } else {
-| +                    panic!(
-| +                        "removes of (right - left) should be either removes on the right or adds \
-| +                         on the left."
-| +                    )
-| +                };
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   END CONFLICT DIFF

            // TODO: Fewer clones?
            match dbg!(add_comes_from_right, remove_comes_from_right) {

Presumably, the user would compare whether the two diffs in the example above mean the same thing.

Example of conflict creation

image

This gives a good overview of what conflict creations look like, but it also shows some problems with this PR's technique. It's hard to see the actual conflict in there, which looks as follows.

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-            // TODO: What order should this if be in? Does it matter?
             let remove_comes_from_right = if let Some((ix, _)) = right
                 .removes()
                 .enumerate()
-                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == add)
+                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == remove)
             {
                 *right_seen.get_remove_mut(ix).unwrap() = true;
                 true
             } else if let Some((ix, _)) = left
                 .adds()
                 .enumerate()
-                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == add)
+                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == remove)
             {
                 *left_seen.get_add_mut(ix).unwrap() = true;
                 false
             } else {
                 panic!(
                     "removes of (right - left) should be either removes on the right or adds on \
                      the left."
                 )
             };
+++++++ Contents of side #2
            // TODO: What order should this if be in? Does it matter?
            let remove_comes_from_right =
                if let Some((ix, _)) = right.removes().enumerate().find(|(ix, elt)| {
                    !right_seen.get_remove(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
                }) {
                    *right_seen.get_remove_mut(ix).unwrap() = true;
                    true
                } else if let Some((ix, _)) = left.adds().enumerate().find(|(ix, elt)| {
                    !left_seen.get_add(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
                }) {
                    *left_seen.get_add_mut(ix).unwrap() = true;
                    false
                } else {
                    panic!(
                        "removes of (right - left) should be either removes on the right or adds \
                         on the left."
                    )
                };
>>>>>>> Conflict 1 of 1 ends

It's unclear to me at the moment whether and how this could be addressed in general. We could just print the created conflict, but could this be generalized to all situations?


TODOs

  • More tests
  • Finish up decisions on how to name various functions and types for now, whether DiffOfMerge should be public, etc.
  • Decide whether some of the attempts to de-duplicate functionality of this PR and of conflict materialization should be reverted.
  • Remember more TODOs

Missing features outside the scope of this PR

Off the top of my head:

  • Keeping track of line numbers (crucial for blame or absorb, so probably the next priority after this PR)
  • Something closer to a machine-readable diff3 format
  • Something closer to color-words format
  • Better algorithms for everything, especially splitting into hunks. TODO: I have seen at least one example where splitting into hunks works out particularly badly with this code, I should find it an present it.
  • Faster algorithms. This PR might have performance regressions, in fact, since I try to reuse code between conflict materialization and conflict diffs. If so, we could postpone fixing them till later, or we could keep the old algorithms and create some code duplication until we can have a fast algorithm that does both.
  • Probably a dozen more things I forgot

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@ilyagr ilyagr changed the title Diffofmerge Proof of concept: structured diff of conflicted commits Aug 11, 2024
@ilyagr ilyagr force-pushed the diffofmerge branch 9 times, most recently from 6f1379a to ef24817 Compare August 13, 2024 05:50
@ilyagr ilyagr force-pushed the diffofmerge branch 4 times, most recently from 50a483c to 7fc0f29 Compare August 16, 2024 03:05
@ilyagr ilyagr force-pushed the diffofmerge branch 3 times, most recently from 58c0a90 to 78ba3fc Compare August 23, 2024 01:38
@ilyagr ilyagr force-pushed the diffofmerge branch 3 times, most recently from 629d1b0 to d1c35e0 Compare September 5, 2024 22:19
These will be used for diffs of Merge-s.

This also refactors one of the less important tests. I did it mostly automatically with `ast-grep` (which I don't necessarily recommend), but we could also just delete this test I think.
This reorders the result slightly and makes it easier to generalize this to work on diffs of merges (where there is the same number of adds and removes).
This represents a diff of two merges, up to reordering and cancelling
out of terms. It can also be thought of as a stack of `n` diffs.

This type is not very suitable for presenting to the user (since it
loses information they might want to see), but is very natural from the
perspective of conflict theory.
…tion

For textual conflicts, the distance function will be the size of the diff between two chunks of text.
…ture

This turns two Merge-s into an object that represents a way to present their difference to the user, in the form of `Vec<DiffExplanationAtom>`.

`DiffExplanationAtom` is the key structure for the data model of differences between diffs.

The algorithm is inefficient and far from perfect, for now my focus is on the data model.
The goal here is to create a single operation to split a conflicted file text into a sequence of (potentially conflicted) hunks, both for regular conflicts and conflict diffs.

The new algorithm is not efficient and I'm guessing an algorithm that gives better results (more logical hunks) is possilbe.

We may want to instead keep the conflicts and conflict diff code path separate until the algorithm improves, let me know.
This works with all the places diffs are shown, e.g. `jj diff`, `jj log -p`, `jj interdiff`, etc.
I initially tried just `+` and `-`, but that seemed confusing. The color helps, but I think this is ~understandable either way
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant