[Breaking change]: BinaryReader.ReadString() returns "\uFFFD" on malformed encoded string sequences. #42564

jozkee · 2024-09-10T21:48:07Z

Description

dotnet/runtime#80331 introduced a minor breaking change only affecting malformed encoded payloads.

Prior to .NET 9, a malformed encoded string [0x01, 0xC2] parsed with BinaryReader.ReadString() would return an empty string.

On .NET 9, it returns "\uFFFD", the Unicode REPLACEMENT CHARACTER (U+FFFD), which stands in for an unknown, unrecognised, or unrepresentable character. We accepted this change because it affects only malformed payloads and matches the Unicode standard.

Version

.NET 9 Preview 7

Previous behavior

var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using (var br = new BinaryReader(ms))
{
    string s = br.ReadString();
    Console.WriteLine(s == "\uFFFD"); // false
    Console.WriteLine(s.Length); // 0
}

New behavior

var ms = new MemoryStream(new byte[] { 0x01, 0xC2 });
using (var br = new BinaryReader(ms))
{
    string s = br.ReadString();
    Console.WriteLine(s == "\uFFFD"); // true
    Console.WriteLine(s.Length); // 1
}

Type of breaking change

Binary incompatible: Existing binaries might encounter a breaking change in behavior, such as failure to load or execute, and if so, require recompilation.
Source incompatible: When recompiled using the new SDK or component or to target the new runtime, existing source code might require source changes to compile successfully.
Behavioral change: Existing binaries might behave differently at run time.

Reason for change

Performance improvement. The resulting behavior change affects only a rare scenario (malformed payloads).

Recommended action

If you want to keep the previous behavior, where an incomplete byte sequence at the end of the string was omitted, call TrimEnd('\uFFFD') on the result.
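
A minimal sketch of that workaround, assuming a wrapper is acceptable in your code base. `BinaryReaderCompat` and `ReadStringLegacy` are hypothetical names, not part of .NET. Note that `TrimEnd` removes every trailing U+FFFD, including one that was validly encoded in the payload, so this only approximates the pre-.NET 9 result.

```csharp
using System;
using System.IO;

static class BinaryReaderCompat
{
    // Hypothetical helper (not a .NET API): reads a length-prefixed string
    // and strips trailing U+FFFD characters, approximating the pre-.NET 9
    // result for payloads that end in an incomplete byte sequence.
    public static string ReadStringLegacy(BinaryReader reader)
    {
        // TrimEnd(char) removes all trailing occurrences of the character,
        // so a replacement character that was genuinely part of the data
        // at the end of the string is removed as well.
        return reader.ReadString().TrimEnd('\uFFFD');
    }
}
```

Called as `BinaryReaderCompat.ReadStringLegacy(br)` in place of `br.ReadString()`, this should yield an empty string for the [0x01, 0xC2] payload on both older and newer runtimes.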

Feature area

Core .NET libraries

Affected APIs

BinaryReader.ReadString()

jozkee · 2024-09-10T21:56:19Z

FWIW: This is somewhat undefined behavior and is inconsistent with other decoding APIs in BinaryReader. Using [0xC2], ReadChar() throws EndOfStreamException and ReadChars(1) returns an empty array.

cc @adamsitnik @GrabYourPitchforks @jeffhandley @teo-tsirpanis
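
To check the inconsistency described in the comment above on a given runtime, something like the following sketch could be used. `TruncatedUtf8Probe` and `Describe` are hypothetical names introduced for illustration. The helper feeds the single byte 0xC2 to whichever BinaryReader call is passed in and reports either the result or the EndOfStreamException.

```csharp
using System;
using System.IO;
using System.Text;

static class TruncatedUtf8Probe
{
    // Hypothetical helper (not a .NET API): runs one BinaryReader call
    // against a stream holding only 0xC2, a UTF-8 lead byte with no
    // continuation byte, and describes what happened.
    public static string Describe(Func<BinaryReader, string> read)
    {
        using var reader = new BinaryReader(
            new MemoryStream(new byte[] { 0xC2 }), Encoding.UTF8);
        try
        {
            return read(reader);
        }
        catch (EndOfStreamException)
        {
            return "EndOfStreamException";
        }
    }
}
```

For example, `TruncatedUtf8Probe.Describe(r => ((int)r.ReadChar()).ToString("X4"))` probes ReadChar(), and `TruncatedUtf8Probe.Describe(r => r.ReadChars(1).Length.ToString())` reports the length of the array returned by ReadChars(1).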

jozkee added the doc-idea, breaking-change, and Pri1 labels Sep 10, 2024

jozkee assigned gewarren Sep 10, 2024

dotnet-bot added the ⌚ Not Triaged label Sep 10, 2024

jozkee mentioned this issue Sep 11, 2024

Fix regression or document breaking change in BinaryReader dotnet/runtime#93500

Closed
