Proof of concept: structured diff of conflicted commits #4251

ilyagr · 2024-08-11T04:51:35Z

This is a proof of concept for #4062. None of the design, the code, or functionality are in a finished state, but I'd like to polish it up to the point where it can be merged for later improvement.

My main goal here is to set up the data model for diffs between conflicts, and to have an example of a UI for showing these diffs to the user. I am not sure whether the latter will be of actual use to users who don't understand conflict theory very well, but the goal is 1) do our best 2) have a debugging tool for exports and 3) most importantly, have a way to present these data structures in tests as we improve them to the point where they can be used in blame, absorb, etc.

The algorithms that actually do all the jobs are inefficient and I already have examples of cases where they do a bad job. Hopefully, they can be improved as a separate project.

Organization

The PR splits into three parts:

Part 1. Everything up to the lib merge: the key explain_diff_of_merges function and data structure commit exists to set up that key commit.

That commit sets up the DiffOfMergeExplanationAtom<T> type, presented below. The key function has the signature
```
fn explain_diff_of_merges(
   left: Merge<T>,
   right: Merge<T>,
   distance: DistanceFunction
) -> Vec<DiffOfMergeExplanationAtom<T>>
```
where the distance function T⨯T→usize is used to decide which pieces of the conflicts should be paired together into diffs. The distance function we'll use is the size of the diff between two chunks of text.
Part 2. There is a questionable (and less important for now) part that deals with splitting a Merge<Vec<u8>> into hunks.
Part 3. The rest sets up jj diff --conflicts, which is a way the UI to present an object of type Vec<DiffOfMergeExplanationAtom<T>> to the user. The key commit is cli diff-util: implement diff --structured-conflicts, the commits after that try to add a bit of polish.

Unless people object to the general direction of this, I'll start by trying to merge part 1.

Data model: `DiffOfMergeExplanationAtom`

/// A statement about a difference of two conflicts of type `Merge<T>`
///
/// A (conceptually unordered) set of `DiffExplanationAtom<T>` describes a
/// conflict `left: Merge<T>` and a sequence of operations to turn it into a
/// different conflict `right: Merge<T>`. This is designed so that this
/// information can be presented to the user.
///
/// This description is far from unique. We usually pick one by trying to
/// minimize the complexity of the diffs in those operations that involve diffs.
#[derive(PartialEq, Eq, Clone, Debug)]
pub enum DiffExplanationAtom<T> {
    /// Adds one more pair of an add + a remove to the conflict, making it more
    /// complicated (more sides than before)
    AddedConflictDiff {
        /// Add of the added diff.
        ///
        /// This is an "add" of the right conflict that is not an "add" of the
        /// left conflict.
        conflict_add: T,
        /// Remove of the added diff
        ///
        /// This is a "remove" of the right conflict that is not a "remove" of
        /// the left conflict.
        conflict_remove: T,
    },
    /// Removes one of the add + remove pairs in the conflcit, making it less
    /// complicated (fewer sides than before)
    ///
    /// In terms of conflict theory, this is equivalent to adding the opposite
    /// diff and then simplifying the conflict. However, for the user, the
    /// difference between these is signfificant.
    RemovedConflictDiff {
        /// Add of the removed diff
        ///
        /// This is an "add" of the left conflict that is not an "add" of the
        /// right conflict.
        conflict_add: T,
        /// Remove of the removed diff
        ///
        /// This is a "remove" of the left conflict that is not a "remove" of
        /// the right conflict.
        conflict_remove: T,
    },
    /// Modifies one of the adds of the conflict
    ///
    /// This does not change the number of sides in the conflict.
    ChangedConflictAdd {
        /// Add of the left conflict
        left_version: T,
        /// Add of the right conflict
        right_version: T,
    },
    /// Modifies one of the removes of the conflict
    ///
    /// This does not change the number of sides in the conflict.
    // TODO(ilyagr): While this operation is very natural from the perspective of conflict
    // theory, I find it hard to come up with an example where it is an
    // intuitive result of some operation. If it's too unintuitive to users, we could try to
    // replace it with a composition of "removed diff" and "added diff".
    ChangedConflictRemove {
        /// Remove of the left conflict
        left_version: T,
        /// Remove of the right conflict
        right_version: T,
    },
    /// States that both the left and the right conflict contin this value as an
    /// "add"
    UnchangedConflictAdd(T),
    /// States that both the left and the right conflict contin this value as a
    /// "remove"
    UnchangedConflictRemove(T),
}

Examples

The exact UI will very likely change in the future. For now, it's designed so that it doesn't absolutely require color and so that I can understand this.

Here is a simple example of conflict resolution from #4246 (reply in thread) (the discussion question helpfully has reproduction commands):

This is equivalent to:

diff --git a/foo b/foo
index 0000000000..100b0dec8c 100644
--- a/foo
+++ b/foo
@@ -1,7 +1,1 @@
-<<<<<<< Conflict 1 of 1
-%%%%%%% Changes from base to side #1
--bar
-+baz
-+++++++ Contents of side #2
-foo
->>>>>>> Conflict 1 of 1 ends
+qux

or, equivalently,

diff --git a/foo b/foo
index 0000000000..100b0dec8c 100644
--- a/foo
+++ b/foo
@@ -1,7 +1,1 @@
-<<<<<<< Conflict 1 of 1
-%%%%%%% Changes from base to side #1
--bar
-+foo
-+++++++ Contents of side #2
-baz
->>>>>>> Conflict 1 of 1 ends
+qux

This is how jj actually printed it. There's a discussion at #4246 (reply in thread) about whether it would make sense to adjust this PR's presentation to make it more similar.

Another example of conflict resolution

I am not sure why the "removed diff" does not have + and - lines more interleaved; I think that would be better.

In textual form:

$ jj diff --conflicts
Resolved conflict in lib/src/merge.rs:
        ...
                )
            };

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< BEGIN CONFLICT DIFF
=========== Changed one of the Sides
-            let remove_comes_from_right = if let Some((ix, _)) = right
-                .removes()
-                .enumerate()
-                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == remove)
-            {
+            let remove_comes_from_right = if let Some((ix, _)) =
+                right.removes().enumerate().find(|(ix, elt)| {
+                    !right_seen.get_remove(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(remove)
+                }) {
                 *right_seen.get_remove_mut(ix).unwrap() = true;
                 true
-            } else if let Some((ix, _)) = left
-                .adds()
-                .enumerate()
-                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == remove)
-            {
+            } else if let Some((ix, _)) = left.adds().enumerate().find(|(ix, elt)| {
+                !left_seen.get_add(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(remove)
+            }) {
                 *left_seen.get_add_mut(ix).unwrap() = true;
                 false
             } else {
                 panic!(
-                    "removes of (right - left) should be either removes on the right or adds on \
-                     the left."
+                    "removes of (right - left) should be either removes on the right or adds \
+                         on the left."
                 )
             };
=========== Removed a Diff (-1 Base and -1 Side)
-|              // TODO: What order should this if be in? Does it matter?
-| -            let remove_comes_from_right = if let Some((ix, _)) = right
-| -                .removes()
-| -                .enumerate()
-| -                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == add)
-| -            {
-| -                *right_seen.get_remove_mut(ix).unwrap() = true;
-| -                true
-| -            } else if let Some((ix, _)) = left
-| -                .adds()
-| -                .enumerate()
-| -                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == add)
-| -            {
-| -                *left_seen.get_add_mut(ix).unwrap() = true;
-| -                false
-| -            } else {
-| -                panic!(
-| -                    "removes of (right - left) should be either removes on the right or adds on \
-| -                     the left."
-| -                )
-| -            };
-| +            let remove_comes_from_right =
-| +                if let Some((ix, _)) = right.removes().enumerate().find(|(ix, elt)| {
-| +                    !right_seen.get_remove(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
-| +                }) {
-| +                    *right_seen.get_remove_mut(ix).unwrap() = true;
-| +                    true
-| +                } else if let Some((ix, _)) = left.adds().enumerate().find(|(ix, elt)| {
-| +                    !left_seen.get_add(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
-| +                }) {
-| +                    *left_seen.get_add_mut(ix).unwrap() = true;
-| +                    false
-| +                } else {
-| +                    panic!(
-| +                        "removes of (right - left) should be either removes on the right or adds \
-| +                         on the left."
-| +                    )
-| +                };
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   END CONFLICT DIFF

            // TODO: Fewer clones?
            match dbg!(add_comes_from_right, remove_comes_from_right) {

Presumably, the user would compare whether the two diffs in the example above mean the same thing.

Example of conflict creation

This gives a good overview of what conflict creations look like, but it also shows some problems with this PR's technique. It's hard to see the actual conflict in there, which looks as follows.

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-            // TODO: What order should this if be in? Does it matter?
             let remove_comes_from_right = if let Some((ix, _)) = right
                 .removes()
                 .enumerate()
-                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == add)
+                .find(|(ix, elt)| !right_seen.get_remove(*ix).unwrap() && *elt == remove)
             {
                 *right_seen.get_remove_mut(ix).unwrap() = true;
                 true
             } else if let Some((ix, _)) = left
                 .adds()
                 .enumerate()
-                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == add)
+                .find(|(ix, elt)| !left_seen.get_add(*ix).unwrap() && *elt == remove)
             {
                 *left_seen.get_add_mut(ix).unwrap() = true;
                 false
             } else {
                 panic!(
                     "removes of (right - left) should be either removes on the right or adds on \
                      the left."
                 )
             };
+++++++ Contents of side #2
            // TODO: What order should this if be in? Does it matter?
            let remove_comes_from_right =
                if let Some((ix, _)) = right.removes().enumerate().find(|(ix, elt)| {
                    !right_seen.get_remove(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
                }) {
                    *right_seen.get_remove_mut(ix).unwrap() = true;
                    true
                } else if let Some((ix, _)) = left.adds().enumerate().find(|(ix, elt)| {
                    !left_seen.get_add(dbg!(*ix)).unwrap() && dbg!(*elt) == dbg!(add)
                }) {
                    *left_seen.get_add_mut(ix).unwrap() = true;
                    false
                } else {
                    panic!(
                        "removes of (right - left) should be either removes on the right or adds \
                         on the left."
                    )
                };
>>>>>>> Conflict 1 of 1 ends

It's unclear to me at the moment whether and how this could be addressed in general. We could just print the created conflict, but could this be generalized to all situations?

TODOs

More tests
Finish up decisions on how to name various functions and types for now, whether DiffOfMerge should be public, etc.
Decide whether some of the attempts to de-duplicate functionality of this PR and of conflict materialization should be reverted.
Remember more TODOs

Missing features outside the scope of this PR

Off the top of my head:

Keeping track of line numbers (crucial for blame or absorb, so probably the next priority after this PR)
Something closer to a machine-readable diff3 format
Something closer to color-words format
Better algorithms for everything, especially splitting into hunks. TODO: I have seen at least one example where splitting into hunks works out particularly badly with this code, I should find it an present it.
Faster algorithms. This PR might have performance regressions, in fact, since I try to reuse code between conflict materialization and conflict diffs. If so, we could postpone fixing them till later, or we could keep the old algorithms and create some code duplication until we can have a fast algorithm that does both.
Probably a dozen more things I forgot

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

These will be used for diffs of Merge-s. This also refactors one of the less important tests. I did it mostly automatically with `ast-grep` (which I don't necessarily recommend), but we could also just delete this test I think.

This reorders the result slightly and makes it easier to generalize this to work on diffs of merges (where there is the same number of adds and removes).

This represents a diff of two merges, up to reordering and cancelling out of terms. It can also be thought of as a stack of `n` diffs. This type is not very suitable for presenting to the user (since it loses information they might want to see), but is very natural from the perspective of conflict theory.

…tion For textual conflicts, the distance function will be the size of the diff between two chunks of text.

…ture This turns two Merge-s into an object that represents a way to present their difference to the user, in the form of `Vec<DiffExplanationAtom>`. `DiffExplanationAtom` is the key structure for the data model of differences between diffs. The algorithm is inefficient and far from perfect, for now my focus is on the data model.

The goal here is to create a single operation to split a conflicted file text into a sequence of (potentially conflicted) hunks, both for regular conflicts and conflict diffs. The new algorithm is not efficient and I'm guessing an algorithm that gives better results (more logical hunks) is possilbe. We may want to instead keep the conflicts and conflict diff code path separate until the algorithm improves, let me know.

This works with all the places diffs are shown, e.g. `jj diff`, `jj log -p`, `jj interdiff`, etc.

I initially tried just `+` and `-`, but that seemed confusing. The color helps, but I think this is ~understandable either way

ilyagr changed the title ~~Diffofmerge~~ Proof of concept: structured diff of conflicted commits Aug 11, 2024

ilyagr force-pushed the diffofmerge branch 9 times, most recently from 6f1379a to ef24817 Compare August 13, 2024 05:50

ilyagr mentioned this pull request Aug 13, 2024

Explaining diffs of Merge<T> (structured conflicts part 1) #4259

Draft

1 task

ilyagr force-pushed the diffofmerge branch 4 times, most recently from 50a483c to 7fc0f29 Compare August 16, 2024 03:05

ilyagr mentioned this pull request Aug 19, 2024

diff: add option to display complex color-words diffs without inlining #4298

Merged

4 tasks

ilyagr force-pushed the diffofmerge branch 3 times, most recently from 58c0a90 to 78ba3fc Compare August 23, 2024 01:38

ilyagr force-pushed the diffofmerge branch 3 times, most recently from 629d1b0 to d1c35e0 Compare September 5, 2024 22:19

ilyagr added 8 commits September 15, 2024 18:03

lib merge: extract get_simplified_mapping and co.

226ce13

These will be used for diffs of Merge-s. This also refactors one of the less important tests. I did it mostly automatically with `ast-grep` (which I don't necessarily recommend), but we could also just delete this test I think.

lib get_simplified_mapping: change algorithm slightly

db2807c

This reorders the result slightly and makes it easier to generalize this to work on diffs of merges (where there is the same number of adds and removes).

tmp: test changes

8f3c51f

lib merge: function to optimize diffs of merges wrt a distance func…

f67b150

…tion For textual conflicts, the distance function will be the size of the diff between two chunks of text.

diff_util: extract show_unified_diff_hunk to show single hunk

fe554c5

ilyagr added 5 commits September 15, 2024 18:16

cli diff-util: implement diff --structured-conflicts

173623b

This works with all the places diffs are shown, e.g. `jj diff`, `jj log -p`, `jj interdiff`, etc.

indent added/removed diffs with +| or -|

aa4d58e

I initially tried just `+` and `-`, but that seemed confusing. The color helps, but I think this is ~understandable either way

diff_util: tried to make conflict diff headers clearer

ef9952c

Show simple conflicts simply

377ae1c

Omit trivial diffs

e5ed546

ilyagr force-pushed the diffofmerge branch from d1c35e0 to e5ed546 Compare September 16, 2024 01:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proof of concept: structured diff of conflicted commits #4251

Proof of concept: structured diff of conflicted commits #4251

ilyagr commented Aug 11, 2024 •

edited

Loading

Proof of concept: structured diff of conflicted commits #4251

Are you sure you want to change the base?

Proof of concept: structured diff of conflicted commits #4251

Conversation

ilyagr commented Aug 11, 2024 • edited Loading

Organization

Examples

TODOs

Missing features outside the scope of this PR

Checklist

ilyagr commented Aug 11, 2024 •

edited

Loading