Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNDB-10801: Port CASSANDRA-19461 (fix bug about SAI and empty strings) #1284

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

adelapena
Copy link

Ports CASSANDRA-19461 into main.

Currently, SAI doesn't create an index entry for empty values. As a result, index queries such as SELECT * FROM %s WHERE v = '' will never find anything. However, the same query with ALLOW FILTERING can find rows.

The ported patch makes sure that empty values are indexed. However, existing indexes would need a rebuild to have these entries.

We need to evaluate if this can be a breaking change for some users. The only side effect I can think of is that indexes with many empty values that are never queries would become larger after this change.

Copy link

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I support this patch, I think that this is a breaking behavior change, and it may break existing applications.

What was the behaviour in DSE ? (the version we have in Astra DSE)

cc @jtgrabowski @sbtourist @JeremiahDJordan

@eolivelli eolivelli changed the title CNDB-10801: Port CASSANDRA-19461 CNDB-10801: Port CASSANDRA-19461 (fix bug about SAI and empty strings) Sep 16, 2024
@JeremiahDJordan
Copy link
Member

It is not a breaking useful behavior. Prior to this patch a query for "" would always return nothing if there was an index, and something if there was not an index and ALLOW FILTERING used. Running a query that will never return results is not useful, so we are fine. I would think this is actually a bug fix. We should have been rejecting a query for "" or returning the answer, not returning an empty result set. This makes us actually return the answer.

Also to make sure, empty string is different from deleted?

@adelapena
Copy link
Author

What was the behaviour in DSE ? (the version we have in Astra DSE)

6.9 is also affected, it has the same behaviour as CC. I'm almost sure it's the same in 6.8, since the code is identical and it seems no one noticed this issue before.

Also to make sure, empty string is different from deleted?

Yes, setting the column to null or deleting it ignores the write, and deleting the row is a different path. Those cases seem to work with and without this patch.

Copy link
Collaborator

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment about VectorMemtableIndex index method.
It seems sonar is also unhappy on test coverage and I did not find the butler diff link.
While index format seems not to be affected, you mentioned people would have to rebuild their indexes to see the change. How do we flag that for them? How is such a fix handled normally in Astra? I'm not sure.
Is the plan to finish first this PR and then the 5.0 rebase branch? We seem to miss the new orderBy test there.

@adelapena
Copy link
Author

Thanks for the review.

I have updated LuceneAnalyzerTest#verifyEmptyStringIndexingBehaviorOnNonAnalyzedColumn to the new expectations.

Regarding Sonar failures, I haven't ported the tests for AbstractType#allowsEmpty included in CASSANDRA-19461 because they use a lot of tooling that isn't available in our branch (including Accord dependencies!). However, those tests are present in the 5.0 rebase branch.

Regarding how we flag that people would need to rebuild the indexes to benefit from this fix, I'm not sure. We cannot detect it at query time because lots of queries can be affected. For example, a simple ORDER BY without filters, or a range query. The only idea that comes to mind, besides release notes, is bumping the index format and asking users to rebuild if there is an index on an old version. That would mean treating this as the index corruption it is. But I'm not sure we need to make so much noise about it since apparently no one has found this issue in years. @JeremiahDJordan wdyt?

Copy link

sonarcloud bot commented Sep 18, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
38.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants