Compute Shader crashes when buffer is too big

I am trying to manipulate arbitrary data by taking advantage of the parallelism offered by compute shaders. I've set up an example scenario to run some experiments. In particular, what I'm trying to do is the following:

- I have a black and white picture and its corresponding JSON serialization, in which the pixels are stored in a float array. A white pixel is stored as 1.0, any other pixel as 0.0
- I feed the float array to the compute shader using a compute buffer
- The shader dispatches one thread for every cell in the array (so one for every pixel in the image)
- Each thread reads the value of its cell/pixel:
    - If the value is "1.0" then it has to re-iterate over the whole array, count all the "1.0"s, and store the counter in its cell of the output buffer

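To make the steps above concrete, here is a minimal CPU sketch in plain C# of what the shader is meant to compute. The class and method names (`CpuReference`, `CountWhitePerPixel`) are made up for illustration and are not part of the project; it only assumes the row-major layout described above.

```csharp
using System;

// Hypothetical helper, not part of the Unity project: a CPU reference
// for the per-pixel counting that the compute shader performs.
public static class CpuReference
{
    // For every white pixel (1.0), count all white pixels in the whole
    // image and store that count in the matching output cell.
    // Black pixels keep an output of 0.
    public static float[] CountWhitePerPixel(float[] input, int width, int height)
    {
        if (input.Length != width * height)
            throw new ArgumentException("input length must equal width * height");

        var output = new float[input.Length];
        for (int index = 0; index < input.Length; index++)
        {
            if (input[index] != 1f)
                continue;

            // Same column-by-column traversal as the shader's nested loops.
            float count = 0f;
            for (int i = 0; i < width; i++)
                for (int j = 0; j < height; j++)
                    if (input[j * width + i] == 1f)
                        count += 1f;

            output[index] = count;
        }
        return output;
    }
}
```

For a 2x2 input of `{ 1, 0, 1, 1 }` this should return `{ 3, 0, 3, 3 }`, which can be used to sanity-check the GPU output on small images.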
The algorithm works fine up to a certain input size: image dimensions of 400x400 (array size of 160000). Beyond that, it crashes.
The specs of my system are:

- CPU: Intel Core i7-4700MQ CPU @ 2.40GHz
- GPU: NVIDIA GeForce GT 750M with GDDR5 2GB VRAM
- RAM: 8GB DDR3
- HDD: 256GB SSD
- OS: Windows 10
- DirectX11

I've tried running the code on a beefier desktop PC with a GTX 970. It can handle bigger inputs, up to image dimensions of 500x500 (array size of 250000), but it crashes as well beyond that.
I've looked in Unity's log file, and when it crashes the log is full of these error messages:

- d3d11: Failed to create 2D texture in GfxDeviceD3D11
- d3d11: failed to create buffer (target 0x1 mode 0 size 960) [0x887A0005]
- Assertion failed on expression: 'SUCCEEDED(hr)'

I've also tried using RenderDoc to launch the compiled version of the Unity project running the example scene and capture the frame in which the dispatch call is executed, but it gives me the following error: "renderdoc failed to open capture for replay: replaying the capture failed at API level." I think it is not able to capture that frame because DirectX11 crashes.
This is the relevant part of the C# script that dispatches the compute shader:

    using System.Collections;
    using System.Collections.Generic;
    using System.IO;
    using UnityEngine;

    public class ComputeShaderTest1 : MonoBehaviour
    {
        public TextAsset inputTextureData;
        private SerializableTextureData deserializedInputTextureData;
        private ComputeShader computeShader;
        private ComputeBuffer inputDataBuffer;
        private float[] outputValuesData;
        private ComputeBuffer outputDataBuffer;

        // Use this for initialization
        void Start()
        {
            deserializedInputTextureData = JsonUtility.FromJson<SerializableTextureData>(inputTextureData.text);
            computeShader = Resources.Load<ComputeShader>("Shaders/ComputeShader1");
            if (computeShader == null)
                Debug.LogError("computeShader not found in the specified path");
            else
                compute();
        }

        private void compute()
        {
            int inputDataSize = deserializedInputTextureData.width * deserializedInputTextureData.height;

            int csMain = computeShader.FindKernel("CSMain");
            if (csMain < 0)
            {
                Debug.Log("Initialization failed.");
                return;
            }

            uint threadGroupSizeX, threadGroupSizeY, threadGroupSizeZ;
            int offsetX, offsetY;
            int groupsX, groupsY, groupsZ;

            // Round the group counts up so that every pixel is covered by a thread
            computeShader.GetKernelThreadGroupSizes(csMain, out threadGroupSizeX, out threadGroupSizeY, out threadGroupSizeZ);
            offsetX = (int)threadGroupSizeX - 1;
            offsetY = (int)threadGroupSizeY - 1;
            groupsX = (deserializedInputTextureData.width + offsetX) / (int)threadGroupSizeX;
            groupsY = (deserializedInputTextureData.height + offsetY) / (int)threadGroupSizeY;
            groupsZ = 1;

            // One float per pixel, for both the input and the output buffer
            inputDataBuffer = new ComputeBuffer(inputDataSize, sizeof(float));
            inputDataBuffer.SetData(deserializedInputTextureData.data);
            computeShader.SetBuffer(csMain, "InputDataBuffer", inputDataBuffer);
            computeShader.SetInt("InputDataWidth", deserializedInputTextureData.width);
            computeShader.SetInt("InputDataHeight", deserializedInputTextureData.height);

            outputDataBuffer = new ComputeBuffer(inputDataSize, sizeof(float));
            computeShader.SetBuffer(csMain, "OutputDataBuffer", outputDataBuffer);

            Debug.Log("Dispatching [" + groupsX + "," + groupsY + "," + groupsZ + "] groups");

            var watch = System.Diagnostics.Stopwatch.StartNew();
            computeShader.Dispatch(csMain, groupsX, groupsY, groupsZ);
            watch.Stop();

            // GetData blocks until the GPU has finished the dispatch
            outputValuesData = new float[inputDataSize];
            outputDataBuffer.GetData(outputValuesData);

            Debug.Log("Compute Shader Execution Completed. Time elapsed (ns): " + watch.Elapsed.TotalMilliseconds * 1000000);

            saveOutDataAsJSON();
            saveOutDataAsTexture2D();
        }

        void OnDestroy()
        {
            if (inputDataBuffer != null)
                inputDataBuffer.Dispose();
            if (outputDataBuffer != null)
                outputDataBuffer.Dispose();
        }
    }

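The group counts in the script come from a ceiling division of the image size by the thread group size. A minimal standalone sketch of that arithmetic, assuming the 32x32 group size declared in the shader (`DispatchMath` and `GroupCount` are hypothetical names used only here):

```csharp
// Hypothetical helper, not part of the Unity project.
public static class DispatchMath
{
    // Ceiling division: the smallest number of groups of `groupSize`
    // threads that covers `size` items.
    public static int GroupCount(int size, int groupSize)
    {
        return (size + groupSize - 1) / groupSize;
    }
}
```

With these assumptions, a 400x400 image needs `GroupCount(400, 32)` = 13 groups per axis, so 13 x 13 = 169 groups of 1024 threads (173056 threads, of which 13056 fall outside the image and are rejected by the bounds check). A 500x500 image needs 16 x 16 = 256 groups.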
And this is the compute shader itself:

    #pragma enable_d3d11_debug_symbols
    #pragma kernel CSMain

    StructuredBuffer<float> InputDataBuffer;
    uint InputDataWidth;
    uint InputDataHeight;
    RWStructuredBuffer<float> OutputDataBuffer;

    [numthreads(32, 32, 1)]
    void CSMain(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID, uint groupIndex : SV_GroupIndex, uint3 id : SV_DispatchThreadID)
    {
        uint navMeshRes = InputDataWidth * InputDataHeight;

        // Each thread is mapped to a single "pixel" of the input
        uint index = id.y * InputDataWidth + id.x;

        // Check that we are inside the boundaries of the input
        if (id.x < InputDataWidth && id.y < InputDataHeight)
        {
            OutputDataBuffer[index] = 0;
            float val = InputDataBuffer[index];
            uint i = 0, j = 0;
            float v;

            if (val == 1)
            {
                // Re-read the whole input and count every white pixel
                for (i = 0; i < InputDataWidth; i++)
                {
                    for (j = 0; j < InputDataHeight; j++)
                    {
                        v = InputDataBuffer[j * InputDataWidth + i];
                        if (v == 1)
                        {
                            OutputDataBuffer[index] += 1;
                        }
                    }
                }
            }
        }
    }

I know that the code is really unoptimized, redundant and unnecessary, but it is only intended as a test case for understanding why it crashes in this manner when the input grows.
I would expect performance to degrade quadratically with the input size, since every thread has to go through the whole array, plus some overhead for the resources required to manage the threads themselves.
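As a rough worked estimate of that quadratic growth (an upper bound under the assumption that every pixel is white, not a measurement): with $N = W \cdot H$ pixels, of which $k$ are white, each white thread reads the whole input once, so the total number of buffer reads $R$ is

$$
R = k \cdot N \le N^2
$$

$$
400 \times 400:\quad N = 1.6 \times 10^{5},\quad N^2 = 2.56 \times 10^{10}
$$

$$
500 \times 500:\quad N = 2.5 \times 10^{5},\quad N^2 = 6.25 \times 10^{10}
$$

So going from 400x400 to 500x500 multiplies the worst-case work by about 2.4, while the input itself only grows by about 1.6.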
What I don't understand is why it is crashing, and why only when the input reaches a certain size.
Do Compute Buffers have size limits? I would assume that, even if that were the case, they would be able to store far more data than what I am trying to manage.
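For scale, the buffer sizes involved can be worked out from the `sizeof(float)` stride of 4 bytes used in the script (simple arithmetic on the numbers above, not a statement about any driver limit):

$$
160000 \times 4\ \text{B} = 640000\ \text{B} \approx 625\ \text{KiB}
$$

$$
250000 \times 4\ \text{B} = 1000000\ \text{B} \approx 977\ \text{KiB}
$$

Even counting both the input and the output buffer, the larger case stays under 2 MiB, which is tiny compared with the 2 GB of VRAM on the GT 750M.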
If you're interested in looking at the complete code, I've made an open git repo: [link][1]

[1]: https://github.com/MichelangeloDiamanti/Compute-Shader-Tests
