Change default encoding to utf-8 in `normalize_header_key` and `normalize_header_value` functions #3238

ZM25XC · 2024-07-11T14:41:34Z

Description

I have encountered encoding errors (UnicodeEncodeError) with requests whose headers contain non-ASCII characters, because these functions fall back to ASCII when no encoding is given. Changing the default encoding to UTF-8 resolves these errors. I propose updating the normalize_header_key and normalize_header_value functions in _utils.py to use UTF-8 as the default encoding.

Steps to Reproduce

1. Call normalize_header_key or normalize_header_value with a non-ASCII string and no encoding specified.
2. Observe the UnicodeEncodeError raised by the default ASCII encoding.
3. Pass encoding="utf-8" instead and observe that the error is resolved.

Example Code

```python
from httpx._utils import normalize_header_key  # private helper in httpx/_utils.py

header_key_unicode = "内容类型"
try:
    normalize_header_key(header_key_unicode, lower=True)
except UnicodeEncodeError as exc:
    # The default ASCII encoding cannot represent these characters.
    print(exc)

normalized_key_unicode_utf8 = normalize_header_key(header_key_unicode, lower=True, encoding="utf-8")
print(normalized_key_unicode_utf8)  # Works correctly with UTF-8 encoding.
```

Proposed Solution

Modify the _utils.py file to use UTF-8 as the default encoding:

```python
from __future__ import annotations  # allows `str | bytes` annotations on Python < 3.10


def normalize_header_key(
    value: str | bytes,
    lower: bool,
    encoding: str | None = None,
) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header key.
    """
    if isinstance(value, bytes):
        bytes_value = value
    else:
        # Fall back to UTF-8 (previously ASCII) when no encoding is given.
        bytes_value = value.encode(encoding or "utf-8")

    return bytes_value.lower() if lower else bytes_value


def normalize_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header value.
    """
    if isinstance(value, bytes):
        return value
    # Fall back to UTF-8 (previously ASCII) when no encoding is given.
    return value.encode(encoding or "utf-8")
```

Rationale

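Using UTF-8 as the default lets these functions handle a much wider range of input without raising an error: UTF-8 can encode every Unicode character, whereas ASCII covers only 128. Because UTF-8 is backward compatible with ASCII, any header that encodes successfully today produces exactly the same bytes after the change.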
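A quick check of the compatibility argument in plain Python, with no httpx involved. `encodes_identically` is a hypothetical helper written only for this illustration:

```python
def encodes_identically(text: str) -> bool:
    """Return True if `text` produces the same bytes under ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # ASCII cannot represent the text at all.
        return False


# Typical ASCII header names and values: the default change is a no-op for these.
for sample in ["Content-Type", "application/json", "max-age=3600"]:
    assert encodes_identically(sample)

# Non-ASCII text: ASCII fails, while UTF-8 yields a multi-byte sequence.
assert not encodes_identically("内容类型")
print("内容类型".encode("utf-8"))
```

The assertions only pass because ASCII is a strict subset of UTF-8, so existing callers that rely on the ASCII default would see identical output.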


iamjatinyadav · 2024-07-13T19:04:58Z

hey guys, can I work on this?

ZM25XC · 2024-07-14T13:14:44Z

> hey guys, can I work on this?
Thank you for your interest in this issue! I am currently working on a Pull Request to address this problem. If you have any suggestions or would like to review the changes, your input would be greatly appreciated.

tomchristie · 2024-07-23T13:44:02Z

Heya, thanks for the consideration... I think this may be valid, tho could you re-frame it so that you're describing the issue against public API. For example describe this using httpx.Headers(...). We can then unpick exactly what should and should not be valid.

Edit: Okay, I see PR #3241 now. We don't want to support utf-8 in header keys which have a constrained set of allowed characters. I do think it's a sensible default for header values tho.
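One possible reading of that split, sketched in plain Python under stated assumptions: header names are validated against the RFC 9110 `tchar` set and kept strictly ASCII, while header values default to UTF-8. `TCHAR`, `encode_header_key` and `encode_header_value` are hypothetical names for illustration, not httpx API:

```python
from __future__ import annotations

import string

# RFC 9110 "tchar": the only characters allowed in a header field name.
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def encode_header_key(value: str | bytes, lower: bool) -> bytes:
    """Hypothetical helper: keys must be non-empty ASCII tokens."""
    if isinstance(value, bytes):
        value = value.decode("ascii")  # raises UnicodeDecodeError for non-ASCII bytes
    if not value or any(ch not in TCHAR for ch in value):
        raise ValueError(f"invalid header name: {value!r}")
    key = value.encode("ascii")
    return key.lower() if lower else key


def encode_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    """Hypothetical helper: values fall back to UTF-8 when no encoding is given."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding or "utf-8")
```

With this split, a key such as "内容类型" is rejected outright instead of being silently encoded, while a value such as "テスト" is accepted.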

ZM25XC · 2024-07-24T12:42:59Z

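> Heya, thanks for the consideration... I think this may be valid, tho could you re-frame it so that you're describing the issue against public API. For example describe this using httpx.Headers(...). We can then unpick exactly what should and should not be valid.
>
> Edit: Okay, I see PR #3241 now. We don't want to support utf-8 in header keys which have a constrained set of allowed characters. I do think it's a sensible default for header values tho.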

If a request's headers contain non-ASCII characters and no encoding is specified, an error is raised. For example:

```python
from httpx import Headers

# With the current ASCII default, building the headers raises UnicodeEncodeError.
x = Headers(
    {
        "Referer": "https://www.google.com/search?q=テスト",
    }
)
print(x)
```

In this example, the Referer value contains the Japanese word テスト ("test"), so constructing the headers raises a UnicodeEncodeError and the request is never sent.
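To make the difference concrete, here is what each default does with this Referer value, shown with plain `str.encode` rather than httpx itself (the variable names are illustrative only):

```python
referer = "https://www.google.com/search?q=テスト"

# Current default: ASCII cannot represent the katakana characters.
try:
    referer.encode("ascii")
    first_bad = None
except UnicodeEncodeError as exc:
    first_bad = exc.start  # index of the first non-ASCII character
    print(f"ascii: cannot encode {referer[first_bad]!r} at index {first_bad}")

# Proposed default: each katakana character becomes a 3-byte UTF-8 sequence.
wire_value = referer.encode("utf-8")
print(wire_value)
```

The ASCII prefix of the URL is byte-for-byte unchanged; only the three non-ASCII characters expand to nine bytes.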

The same issue can occur in responses. If the server's response does not specify the encoding, httpx will use ASCII encoding to decode テスト, resulting in an error.

Therefore, I suggest changing the default encoding from ASCII to UTF-8 to better handle such cases.
