Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking change]: BinaryReader.GetString() will return "\uFFFD" on malformed encoded string sequences. #42564

Open
1 of 3 tasks
jozkee opened this issue Sep 10, 2024 · 1 comment
Assignees
Labels
breaking-change Indicates a .NET Core breaking change doc-idea Indicates issues that are suggestions for new topics [org][type][category] Pri1 High priority, do before Pri2 and Pri3 ⌚ Not Triaged Not triaged

Comments

@jozkee
Copy link
Member

jozkee commented Sep 10, 2024

Description

dotnet/runtime#80331 introduced a minor breaking change only affecting malformed encoded payloads.

Prior to .NET 9, a malformed encoded string [0x01, 0xC2] parsed with BinaryReader.ReadString() would return an empty string.

ON .NET 9, it would return "\uFFFD" which is the REPLACEMENT CHARACTER used to replace an unknown, unrecognised, or unrepresentable character. We accepted this change because it only affected malformed payloads and matches Unicode standards.

Version

.NET 9 Preview 7

Previous behavior

var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using (var br = new BinaryReader(ms))
{
    string s = br.ReadString();
    Console.WriteLine(s == "\uFFFD"); // false
    Console.WriteLine(s.Length); // 0
}

New behavior

var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using (var br = new BinaryReader(ms))
{
    string s = br.ReadString();
    Console.WriteLine(s == "\uFFFD"); // true
    Console.WriteLine(s.Length); // 1
}

Type of breaking change

  • Binary incompatible: Existing binaries might encounter a breaking change in behavior, such as failure to load or execute, and if so, require recompilation.
  • Source incompatible: When recompiled using the new SDK or component or to target the new runtime, existing source code might require source changes to compile successfully.
  • Behavioral change: Existing binaries might behave differently at run time.

Reason for change

Perf improvement affecting a rare scenario.

Recommended action

If you want to keep the previous behavior where incomplete byte sequence were being omitted at the end of the string, you can TrimEnd("\uFFFD") the result.

Feature area

Core .NET libraries

Affected APIs

BinaryReader.ReadString()

@jozkee jozkee added doc-idea Indicates issues that are suggestions for new topics [org][type][category] breaking-change Indicates a .NET Core breaking change Pri1 High priority, do before Pri2 and Pri3 labels Sep 10, 2024
@dotnet-bot dotnet-bot added ⌚ Not Triaged Not triaged labels Sep 10, 2024
@jozkee
Copy link
Member Author

jozkee commented Sep 10, 2024

FWIW: This is somewhat undefined behavior and is inconsistent with other decoding APIs in BinaryReader. Using [0xC2], ReadChar() throws EndOfStreamException and ReadChars(1) returns an empty array.

cc @adamsitnik @GrabYourPitchforks @jeffhandley @teo-tsirpanis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Indicates a .NET Core breaking change doc-idea Indicates issues that are suggestions for new topics [org][type][category] Pri1 High priority, do before Pri2 and Pri3 ⌚ Not Triaged Not triaged
Projects
Status: 🔖 Ready
Development

No branches or pull requests

3 participants