New error: DXGI_ERROR_DEVICE_HUNG (0x887A0006) #489

Open
time2bot opened this issue Jun 24, 2024 · 0 comments


This happens to me with both "Phi-3-mini-4k-instruct-q4f32_1-MLC-1k" and "gemma-2b-it-q4f32_1-MLC-1k", after updating my GPU drivers to:

Intel(R) UHD Graphics 630

Driver version: 31.0.101.2115
Driver date: 16/11/2022
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 0, device 2, function 0

Utilization 12%
Dedicated GPU memory
Shared GPU memory 0.6/7.9 GB
GPU Memory 0.6/7.9 GB

Here is my WebGPU report:

#1 high-performance

adapter info:

architecture gen-9
description
device
vendor intel

flags:

isFallbackAdapter false

limits:

maxBindGroups 4
maxBindGroupsPlusVertexBuffers 24
maxBindingsPerBindGroup 1000
maxBufferSize 2147483648 (2gb)
maxColorAttachmentBytesPerSample 128
maxColorAttachments 8
maxComputeInvocationsPerWorkgroup 1024
maxComputeWorkgroupSizeX 1024
maxComputeWorkgroupSizeY 1024
maxComputeWorkgroupSizeZ 64
maxComputeWorkgroupStorageSize 32768 (32k)
maxComputeWorkgroupsPerDimension 65535 (64k)
maxDynamicStorageBuffersPerPipelineLayout 8
maxDynamicUniformBuffersPerPipelineLayout 10
maxInterStageShaderComponents 112
maxInterStageShaderVariables 28
maxSampledTexturesPerShaderStage 16
maxSamplersPerShaderStage 16
maxStorageBufferBindingSize 2147483644 (2gb)
maxStorageBuffersPerShaderStage 10
maxStorageTexturesPerShaderStage 8
maxTextureArrayLayers 2048 (2k)
maxTextureDimension1D 16384 (16k)
maxTextureDimension2D 16384 (16k)
maxTextureDimension3D 2048 (2k)
maxUniformBufferBindingSize 65536 (64k)
maxUniformBuffersPerShaderStage 12
maxVertexAttributes 30
maxVertexBufferArrayStride 2048 (2k)
maxVertexBuffers 8
minStorageBufferOffsetAlignment 256
minUniformBufferOffsetAlignment 256

features:

bgra8unorm-storage
depth-clip-control
depth32float-stencil8
float32-filterable
indirect-first-instance
rg11b10ufloat-renderable
shader-f16
texture-compression-bc
timestamp-query

WGSL language features:

packed_4x8_integer_dot_product
pointer_composite_access
readonly_and_readwrite_storage_textures
unrestricted_pointer_parameters

misc:

fallback adapter not supported
getPreferredCanvasFormat bgra8unorm

dedicated workers:

webgpu API exists
requestAdapter successful
requestDevice successful
getContext("webgpu") successful
requestAnimationFrame successful
transferControlToOffscreen successful
OffscreenCanvas successful
CanvasRenderingContext2D successful

shared workers:

webgpu API exists
requestAdapter successful
requestDevice successful
getContext("webgpu") successful
transferControlToOffscreen successful
OffscreenCanvas successful
CanvasRenderingContext2D successful

service workers:

webgpu API exists
requestAdapter successful
requestDevice successful
getContext("webgpu") successful
transferControlToOffscreen successful
OffscreenCanvas successful
CanvasRenderingContext2D successful

The first inference works, but on subsequent chat messages I get a new error:

ID3D12Device::GetDeviceRemovedReason failed with DXGI_ERROR_DEVICE_HUNG (0x887A0006)

  • While handling unexpected error type Internal when allowed errors are (Validation|DeviceLost).
    at CheckHRESULTImpl (....\third_party\dawn\src\dawn\native\d3d\D3DError.cpp:119)

Backend messages:

  • Device removed reason: DXGI_ERROR_DEVICE_HUNG (0x887A0006)
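
For anyone trying to capture this when it happens: the WebGPU API exposes a `lost` promise on the device, which resolves with a `reason` and a `message` once the device is removed. Below is a minimal sketch of logging that, not something taken from this project. The helper names `describeDeviceLoss` and `watchDeviceLoss` are hypothetical, and the two interfaces are stand-ins for the real `GPUDevice` and `GPUDeviceLostInfo` types so the snippet compiles without extra type packages.

```typescript
// Minimal structural stand-ins for GPUDeviceLostInfo and GPUDevice
// (assumption: only the fields used here are declared).
interface DeviceLostInfoLike {
  reason: string | undefined; // "unknown" or "destroyed" in current browsers
  message: string;
}
interface DeviceLike {
  lost: Promise<DeviceLostInfoLike>;
}

// Hypothetical helper: format the loss info as a single log line.
function describeDeviceLoss(info: DeviceLostInfoLike): string {
  const reason = info.reason ?? "unknown";
  return `WebGPU device lost (reason: ${reason}): ${info.message}`;
}

// Hypothetical helper: call onLost once the device.lost promise resolves.
function watchDeviceLoss(
  device: DeviceLike,
  onLost: (line: string) => void
): Promise<void> {
  return device.lost.then((info) => onLost(describeDeviceLoss(info)));
}
```

In a page, something like `watchDeviceLoss(device, console.error)` right after `requestDevice()` should record the loss message alongside the chat message that triggered it.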

After that, the WebGPU report also changes:

webgpu appears to be disabled

WGSL language features:

packed_4x8_integer_dot_product
pointer_composite_access
readonly_and_readwrite_storage_textures
unrestricted_pointer_parameters

misc:

fallback adapter not supported
getPreferredCanvasFormat bgra8unorm

dedicated workers:

webgpu API exists
requestAdapter(compat) failed
requestAnimationFrame successful
transferControlToOffscreen successful
OffscreenCanvas successful
CanvasRenderingContext2D successful

shared workers:

webgpu API exists
requestAdapter(compat) failed
transferControlToOffscreen successful
OffscreenCanvas successful
CanvasRenderingContext2D successful

service workers:

webgpu API exists
requestAdapter(compat) failed
transferControlToOffscreen successful
OffscreenCanvas successful
CanvasRenderingContext2D successful

So it seems that inference causes WebGPU to become disabled, possibly when too much input is passed to the model?
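
To check the "disabled" state from code rather than from the report page, a small probe like the one below might help. It is only a sketch: `probeWebGPU`, `GpuLike`, and `ProbeResult` are hypothetical names, and `GpuLike` is a stand-in for the object normally found at `navigator.gpu`. It relies on `requestAdapter()` resolving to `null` when no adapter is available.

```typescript
// Stand-in for the object at navigator.gpu (assumption: only
// requestAdapter is needed for this check).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

type ProbeResult = "no-webgpu-api" | "no-adapter" | "adapter-available";

// Hypothetical helper: report whether an adapter can currently be obtained.
async function probeWebGPU(gpu: GpuLike | undefined): Promise<ProbeResult> {
  if (!gpu) return "no-webgpu-api";
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "adapter-available" : "no-adapter";
  } catch {
    // Treat a rejected request the same as a missing adapter.
    return "no-adapter";
  }
}
```

Calling this with `navigator.gpu` before the first message and again after the error would show whether the adapter really goes away, which matches the "requestAdapter(compat) failed" lines in the second report.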
