Receiving GPU fence timeouts and GPU crashes on Android 5.1.1. Mali-400 MP OpenGL ES 2.0 Android Rockchip 3126 processor. EGL implementation 1.4 Linux-r6p0-01rel1

Good Day,

I know this question is about older technology, and this may be the wrong forum entirely, but I have nowhere else to turn. Hopefully one of the experts here can point me in the right direction. We have developed a kiosk-like app for point-of-sale operations that runs on a generic Android tablet. The application works without issue on an existing tablet running Android 4.1. However, because parts have been discontinued, we have been forced to move to a new version of the tablet and, with it, a new version of Android. On the new tablet, the graphics subsystem crashes after the existing application has been running for some time. This period can be as short as 20 minutes or longer than a week. The only way to recover is to power cycle the unit.

When the crash happens, the application is not doing anything that should tax the system. After startup, the app defaults to its basic mode of operation: it displays a text marketing message or a PNG logo splash screen, alternating every 15 seconds; a digital clock at the bottom center of the screen that updates once per second; and static location information (name of establishment, etc.) in the bottom right-hand corner. No animation, no streaming, just a very basic kiosk-like application.

What we see in the log (see snippet below), every 30 seconds or so, are fence timeout messages, then a listing of objects, then a message that the GPU could not be reset. Once this happens, the entire graphics subsystem stops functioning, although we can still connect to the unit via the Android Debug Bridge (adb), which is how we retrieve the logs. If anyone could point us in the right direction on how to resolve this issue, it would be greatly appreciated. Thank you in advance for any assistance.

Cordially,

Dale

========

<<< KERNEL LOG SNIPPET >>>>

<6>[259628.544422] fence timeout on [d8534880] after 500ms

.

.

<4>[259568.463295] objs:

<4>[259568.463295] --------------

<4>[259568.463295] fb-timeline sw_sync: 286483

<4>[259568.463295]   pt signaled@24.203036: 255

<4>[259568.463295]   pt signaled@2925.669034: 3516

<4>[259568.463295]   pt signaled@243109.259577: 264000

<4>[259568.463295]   pt signaled@263766.516201: 286460

<4>[259568.463295]   pt signaled@263787.033239: 286483

<4>[259568.463295]   pt signaled@263787.033240: 286483

<4>[259568.463295]   pt active: 286484

<4>[259568.463295]   pt active: 286484

<4>[259568.463295]

<4>[259568.463295] mali-170-gp Mali: oldest (286484) next (286484)

<4>[259568.463295]

<4>[259568.463295]

<4>[259568.463295] mali-170-pp Mali: oldest (286484) next (286484)

.

.

.

<4>[259627.137394] Mali: Executor GP: Job 1146132 Timeout on Mali_GP

<4>[259627.137485] Mali: Dump Group Mali_GP
<4>[259627.137535] Mali: 0x0000: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.137608] Mali: 0x0010: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.137681] Mali: 0x0020: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.137753] Mali: 0x0030: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.137825] Mali: 0x0040: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.137897] Mali: 0x0050: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.137969] Mali: 0x0060: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138041] Mali: 0x0070: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138113] Mali: 0x0080: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138184] Mali: 0x0090: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138257] Mali: 0x00a0: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138329] Mali: Dump Group MMU
<4>[259627.138375] Mali: 0x0000: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138448] Mali: 0x0010: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.138520] Mali: 0x0020: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
<4>[259627.324242]
<4>[259627.325415]
<4>[259627.325486]
<4>[259627.362487] Mali: ERR: drivers/gpu/arm/mali400/mali/common/mali_gp.c
<4>[259627.362580] mali_gp_hard_reset() 140
<4>[259627.362580] Mali GP: The hard reset loop didn't work, unable to recover
<4>[259627.362660]
<4>[259627.521234]
<4>[259627.522242]
<4>[259627.522313]
<4>[259627.551750] Mali: ERR: drivers/gpu/arm/mali400/mali/common/mali_mmu.c
<4>[259627.551782] mali_mmu_raw_reset() 279
<4>[259627.551782] Reset request failed, MMU status is 0xFFFFFFFF
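
For anyone who wants to triage a longer capture of this log, here is a minimal Python sketch that counts the three failure signatures seen above and checks whether every dumped register word reads back as 0xffffffff. The regular expressions are written only from the message formats in this snippet, and the names `PATTERNS`, `DUMP_ROW`, and `summarize` are made up for illustration; other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived only from the message formats in the snippet above.
PATTERNS = {
    "fence_timeout": re.compile(r"fence timeout on \[[0-9a-f]+\] after \d+ms"),
    "job_timeout": re.compile(r"Mali: Executor \w+: Job \d+ Timeout on \w+"),
    "reset_failed": re.compile(r"hard reset loop didn't work|Reset request failed"),
}

# One register dump row: a 4-digit hex offset followed by four 32-bit words.
DUMP_ROW = re.compile(r"Mali: 0x[0-9a-f]{4}:((?: 0x[0-9a-f]{8}){4})")


def summarize(log_text):
    """Return (signature counts, True if every dumped word is 0xffffffff)."""
    counts = Counter()
    dump_words = []
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        row = DUMP_ROW.search(line)
        if row:
            dump_words.extend(row.group(1).split())
    # An empty dump is reported as False rather than vacuously True.
    all_ones = bool(dump_words) and all(w == "0xffffffff" for w in dump_words)
    return dict(counts), all_ones
```

Calling `summarize(open("kernel.log").read())` on a saved log (the file name is just an example) gives a quick view of how often each message appears and whether the register dumps are uniformly all ones, as they are in the snippet above.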

  • Hi Dale,
    I think there are two possible reasons for this issue:
    1) The GPU is in an abnormal state, which may be caused by an incorrectly customized GPU power switch. In your log, the Dump Group registers all read back as 0xFFFFFFFF, which is why I suggest keeping the GPU power always_on and then checking whether the issue goes away. If it does not, I suggest reporting the issue to Rockchip, who can help check whether this is the root cause.

    2) The issue may also be caused by a DDK feature, the Dirty Bit Optimization. It removes the readback command by recording the location of that command and then outputting only the modified pixels in each GPU tile, in place of the traditional readback. In some corner cases it can corrupt the GPU commands, and the resulting bad command hangs the GPU. It is hard to prevent an app from triggering the Dirty Bit Optimization, but this bug has been fixed as of r7p0, so you could ask Rockchip whether the Mali driver can be updated to r7p0.
    I do not know whether these will help. Please feel free to let me know if you have any other questions.

    Brs,
    Luffy
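
To check a reported driver string against the r7p0 threshold mentioned in this reply, something like the following Python sketch may help. It assumes the release token has the form `rNpM`, as in the `Linux-r6p0-01rel1` string from the thread title, and that releases order by the `r` number first and the `p` number second; the function names `parse_mali_release` and `has_fix` are hypothetical.

```python
import re


def parse_mali_release(version):
    """Extract (r, p) as integers from a string containing an rNpM token."""
    match = re.search(r"r(\d+)p(\d+)", version)
    if match is None:
        raise ValueError(f"no rNpM release token in {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_fix(version, fixed_in="r7p0"):
    """True if the release is at or above fixed_in (assumed ordering: r, then p)."""
    return parse_mali_release(version) >= parse_mali_release(fixed_in)
```

With the string from the title, `has_fix("Linux-r6p0-01rel1")` evaluates to `False`, which matches the suggestion to ask the vendor about an update.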

  • Luffy,
    Thank you again for your response. I can test item 1 here without issue: I can set the system up to keep the GPU always on and run the test. Confirming the result will take a few days, maybe a week, because the crash happens sporadically. For item 2, I have already contacted the vendor to see how we can include the updated driver. With the information you have provided, I can make sure that we receive at least version r7p0. I will keep you posted on our progress over the next couple of weeks. Thank you again for your team's support; it has been invaluable.

    Best regards,
    Dale
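
Since a test like this can run for days, one way to avoid checking the unit by hand is a small polling script on a connected PC. The Python sketch below is only an illustration under several assumptions: `adb` is on the PATH, `adb shell dmesg` is permitted on the device, and the kernel ring buffer still holds the failure message when it is polled. The marker text is taken from the `mali_gp_hard_reset()` message in the log; the function names are hypothetical.

```python
import subprocess
import time

# Text from the mali_gp_hard_reset() error in the kernel log above.
FAILURE_MARKER = "unable to recover"


def read_kernel_log():
    """Fetch the kernel log over adb (assumes adb is on PATH and dmesg is readable)."""
    result = subprocess.run(
        ["adb", "shell", "dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_for_failure(read_log=read_kernel_log, interval_s=60.0, max_polls=None,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the marker appears; return elapsed seconds, or None if polls run out."""
    start = clock()
    polls = 0
    while max_polls is None or polls < max_polls:
        if FAILURE_MARKER in read_log():
            return clock() - start
        polls += 1
        sleep(interval_s)
    return None
```

Running `wait_for_failure()` once with the default power settings and once with the GPU kept always on would give a rough time-to-failure for each configuration, although with a crash this sporadic a single run of either is not conclusive.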
    =====