Midgard r20p0 kernel driver errors

Hi,

I am using the Midgard r20p0 drivers with kernel 4.14 (rc4) on an ODROID-XU4 board.

I have enabled DEVFREQ, with the simple_ondemand and performance governors available (performance is the default).

I get this kernel error when the Mali device is probed:

[    4.492991] mali 11800000.mali: Continuing without Mali regulator control
[    4.503602] mali 11800000.mali: GPU identified as 0x0620 r0p1 status 0
[    4.511482] mali 11800000.mali: Protected mode not available
[    4.518109] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[    4.525520] in_atomic(): 0, irqs_disabled(): 0, pid: 1, name: swapper/0
[    4.532000] 3 locks held by swapper/0/1:
[    4.535675]  #0:  (&dev->mutex){....}, at: [<c04b69f8>] __driver_attach+0x78/0x120
[    4.543424]  #1:  (&dev->mutex){....}, at: [<c04b6a08>] __driver_attach+0x88/0x120
[    4.550960]  #2:  (rcu_read_lock){....}, at: [<c04af8a8>] kbase_devfreq_init+0x18/0x6e0
[    4.558938] Preemption disabled at:
[    4.559007] [<c0122e58>] irq_enter+0x44/0x88
[    4.566504] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc4-02 #2
[    4.572971] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[    4.579044] [<c01102c8>] (unwind_backtrace) from [<c010cabc>] (show_stack+0x10/0x14)
[    4.586781] [<c010cabc>] (show_stack) from [<c0821484>] (dump_stack+0x98/0xc4)
[    4.593969] [<c0821484>] (dump_stack) from [<c0144bcc>] (___might_sleep+0x264/0x2cc)
[    4.601690] [<c0144bcc>] (___might_sleep) from [<c0837200>] (__mutex_lock+0x2c/0xa38)
[    4.609484] [<c0837200>] (__mutex_lock) from [<c0837c28>] (mutex_lock_nested+0x1c/0x24)
[    4.617459] [<c0837c28>] (mutex_lock_nested) from [<c04c6a88>] (_find_opp_table+0x20/0x5c)
[    4.625692] [<c04c6a88>] (_find_opp_table) from [<c04c6cac>] (dev_pm_opp_get_opp_count+0xc/0x90)
[    4.634451] [<c04c6cac>] (dev_pm_opp_get_opp_count) from [<c04af934>] (kbase_devfreq_init+0xa4/0x6e0)
[    4.643648] [<c04af934>] (kbase_devfreq_init) from [<c049b008>] (kbase_platform_device_probe+0x5ec/0xc98)
[    4.653182] [<c049b008>] (kbase_platform_device_probe) from [<c04b83f4>] (platform_drv_probe+0x4c/0xb0)
[    4.662538] [<c04b83f4>] (platform_drv_probe) from [<c04b67d8>] (driver_probe_device+0x2d0/0x478)
[    4.671370] [<c04b67d8>] (driver_probe_device) from [<c04b6a84>] (__driver_attach+0x104/0x120)
[    4.679944] [<c04b6a84>] (__driver_attach) from [<c04b48a4>] (bus_for_each_dev+0x68/0x9c)
[    4.688083] [<c04b48a4>] (bus_for_each_dev) from [<c04b5a84>] (bus_add_driver+0x1cc/0x264)
[    4.696315] [<c04b5a84>] (bus_add_driver) from [<c04b7400>] (driver_register+0x78/0xf8)
[    4.704284] [<c04b7400>] (driver_register) from [<c0101b64>] (do_one_initcall+0x44/0x170)
[    4.712435] [<c0101b64>] (do_one_initcall) from [<c0c00df4>] (kernel_init_freeable+0x144/0x1d0)
[    4.721105] [<c0c00df4>] (kernel_init_freeable) from [<c0834dd0>] (kernel_init+0x8/0x110)
[    4.729247] [<c0834dd0>] (kernel_init) from [<c01088c8>] (ret_from_fork+0x14/0x2c)
[    4.740536] devfreq devfreq0: Couldn't update frequency transition information.
[    4.752374] mali 11800000.mali: Probed as mali0

Afterwards, if I change the devfreq governor to simple_ondemand, the same error appears in syslog every second:

[ 1021.940152] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[ 1021.947111] in_atomic(): 0, irqs_disabled(): 0, pid: 120, name: kworker/u16:1
[ 1021.954278] INFO: lockdep is turned off.
[ 1021.958110] Preemption disabled at:
[ 1021.958119] [<  (null)>]   (null)
[ 1021.964879] CPU: 3 PID: 120 Comm: kworker/u16:1 Tainted: G        W       4.14.0-rc4-02 #2
[ 1021.973101] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 1021.979175] Workqueue: devfreq_wq devfreq_monitor
[ 1021.983863] [<c01102c8>] (unwind_backtrace) from [<c010cabc>] (show_stack+0x10/0x14)
[ 1021.991568] [<c010cabc>] (show_stack) from [<c0821484>] (dump_stack+0x98/0xc4)
[ 1021.998762] [<c0821484>] (dump_stack) from [<c0144bcc>] (___might_sleep+0x264/0x2cc)
[ 1022.006475] [<c0144bcc>] (___might_sleep) from [<c0837200>] (__mutex_lock+0x2c/0xa38)
[ 1022.014269] [<c0837200>] (__mutex_lock) from [<c0837c28>] (mutex_lock_nested+0x1c/0x24)
[ 1022.022241] [<c0837c28>] (mutex_lock_nested) from [<c04c6a88>] (_find_opp_table+0x20/0x5c)
[ 1022.030472] [<c04c6a88>] (_find_opp_table) from [<c04c6e10>] (dev_pm_opp_find_freq_ceil+0x18/0x64)
[ 1022.039398] [<c04c6e10>] (dev_pm_opp_find_freq_ceil) from [<c0672b28>] (devfreq_recommended_opp+0x34/0x4c)
[ 1022.049018] [<c0672b28>] (devfreq_recommended_opp) from [<c04af4ec>] (kbase_devfreq_target+0x7c/0x408)
[ 1022.058287] [<c04af4ec>] (kbase_devfreq_target) from [<c0671214>] (update_devfreq+0xd4/0x1c4)
[ 1022.066778] [<c0671214>] (update_devfreq) from [<c0671400>] (devfreq_monitor+0x24/0x78)
[ 1022.074752] [<c0671400>] (devfreq_monitor) from [<c013827c>] (process_one_work+0x19c/0x504)
[ 1022.083070] [<c013827c>] (process_one_work) from [<c013861c>] (worker_thread+0x38/0x568)
[ 1022.091132] [<c013861c>] (worker_thread) from [<c013ed04>] (kthread+0x160/0x19c)
[ 1022.098500] [<c013ed04>] (kthread) from [<c01088c8>] (ret_from_fork+0x14/0x2c)
[ 1037.896456] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[ 1037.903447] in_atomic(): 0, irqs_disabled(): 0, pid: 818, name: bash
[ 1037.909732] INFO: lockdep is turned off.
[ 1037.913667] Preemption disabled at:
[ 1037.913683] [<c0837208>] __mutex_lock+0x34/0xa38
[ 1037.921730] CPU: 4 PID: 818 Comm: bash Tainted: G        W       4.14.0-rc4-02 #2
[ 1037.929143] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 1037.935217] [<c01102c8>] (unwind_backtrace) from [<c010cabc>] (show_stack+0x10/0x14)
[ 1037.942925] [<c010cabc>] (show_stack) from [<c0821484>] (dump_stack+0x98/0xc4)
[ 1037.950118] [<c0821484>] (dump_stack) from [<c0144bcc>] (___might_sleep+0x264/0x2cc)
[ 1037.957829] [<c0144bcc>] (___might_sleep) from [<c0837200>] (__mutex_lock+0x2c/0xa38)
[ 1037.965627] [<c0837200>] (__mutex_lock) from [<c0837c28>] (mutex_lock_nested+0x1c/0x24)
[ 1037.973600] [<c0837c28>] (mutex_lock_nested) from [<c04c6a88>] (_find_opp_table+0x20/0x5c)
[ 1037.981830] [<c04c6a88>] (_find_opp_table) from [<c04c6e10>] (dev_pm_opp_find_freq_ceil+0x18/0x64)
[ 1037.990758] [<c04c6e10>] (dev_pm_opp_find_freq_ceil) from [<c0672b28>] (devfreq_recommended_opp+0x34/0x4c)
[ 1038.000377] [<c0672b28>] (devfreq_recommended_opp) from [<c04af4ec>] (kbase_devfreq_target+0x7c/0x408)
[ 1038.009646] [<c04af4ec>] (kbase_devfreq_target) from [<c0671214>] (update_devfreq+0xd4/0x1c4)
[ 1038.018138] [<c0671214>] (update_devfreq) from [<c067357c>] (devfreq_performance_handler+0x34/0x48)
[ 1038.027150] [<c067357c>] (devfreq_performance_handler) from [<c0672578>] (governor_store+0xe0/0x168)
[ 1038.036250] [<c0672578>] (governor_store) from [<c0284254>] (kernfs_fop_write+0x104/0x208)
[ 1038.044482] [<c0284254>] (kernfs_fop_write) from [<c02144ec>] (__vfs_write+0x1c/0x128)
[ 1038.052367] [<c02144ec>] (__vfs_write) from [<c021476c>] (vfs_write+0xa4/0x168)
[ 1038.059646] [<c021476c>] (vfs_write) from [<c0214930>] (SyS_write+0x3c/0x90)
[ 1038.066666] [<c0214930>] (SyS_write) from [<c0108820>] (ret_fast_syscall+0x0/0x28)

After switching back to the performance governor, the error stops appearing.

Is there a known fix?

Thanks.

  • Here's the patch for this issue, for those who want to add r20p0 to kernel 4.14 or later:

    Author: memeka <mihailescu2m@gmail.com>
    Date:   Fri Oct 13 10:25:00 2017 +1030
    
        mali/midgard devfreq: fix for double locking _find_opp_table()
    
        After commit 5b650b388844f26c61c70564865598836d05dcb3, _find_opp_table()
        increments the reference under the opp_table_lock.
        So now there is no need to take the opp_table_lock or rcu_read_lock().
        This patch drops the rcu_read_lock() around _find_opp_table() in the
        mali midgard r20p0 drivers.
    
    diff --git a/drivers/gpu/arm/midgard/backend/gpu/mali_kbase_devfreq.c b/drivers/gpu/arm/midgard/backend/gpu/mali_kbase_devfreq.c
    index d3e800e..2ba96f2 100644
    --- a/drivers/gpu/arm/midgard/backend/gpu/mali_kbase_devfreq.c
    +++ b/drivers/gpu/arm/midgard/backend/gpu/mali_kbase_devfreq.c
    @@ -89,10 +89,8 @@ kbase_devfreq_target(struct device *dev, unsigned long *target_freq, u32 flags)
    
            freq = *target_freq;
    
    -       rcu_read_lock();
            opp = devfreq_recommended_opp(dev, &freq, flags);
            voltage = dev_pm_opp_get_voltage(opp);
    -       rcu_read_unlock();
            if (IS_ERR_OR_NULL(opp)) {
                    dev_err(dev, "Failed to get opp (%ld)\n", PTR_ERR(opp));
                    return PTR_ERR(opp);
    @@ -215,20 +213,16 @@ static int kbase_devfreq_init_freq_table(struct kbase_device *kbdev,
            unsigned long freq;
            struct dev_pm_opp *opp;
    
    -       rcu_read_lock();
            count = dev_pm_opp_get_opp_count(kbdev->dev);
            if (count < 0) {
    -               rcu_read_unlock();
                    return count;
            }
    -       rcu_read_unlock();
    
            dp->freq_table = kmalloc_array(count, sizeof(dp->freq_table[0]),
                                    GFP_KERNEL);
            if (!dp->freq_table)
                    return -ENOMEM;
    
    -       rcu_read_lock();
            for (i = 0, freq = ULONG_MAX; i < count; i++, freq--) {
                    opp = dev_pm_opp_find_freq_floor(kbdev->dev, &freq);
                    if (IS_ERR(opp))
    @@ -236,7 +230,6 @@ static int kbase_devfreq_init_freq_table(struct kbase_device *kbdev,
    
                    dp->freq_table[i] = freq;
            }
    -       rcu_read_unlock();
    
            if (count != i)
                    dev_warn(kbdev->dev, "Unable to enumerate all OPPs (%d!=%d\n",
    

  • Hi,

    Thanks for the report. I will ask someone from our driver team to have a look at the issue.

    Regards,

    Daniele
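
For readers following the patch in the first reply, here is a minimal user-space sketch of why the warning appears and why dropping `rcu_read_lock()` silences it. It is a toy model, not driver code: every `model_*` name, both `lookup_*` helpers, and the three table frequencies are hypothetical and exist only for illustration. The assumption it encodes is the one the traces suggest: the OPP lookup takes a mutex (which may sleep), so calling it inside an RCU read-side section trips the "sleeping function called from invalid context" check. It also models the lookup handing back a reference that the caller releases, which in the kernel would be `dev_pm_opp_put()` on versions where OPPs are reference-counted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these are the real kernel functions. */

static int model_rcu_depth;   /* nesting depth of model_rcu_read_lock() */
static int model_bug_reports; /* count of "sleeping in invalid context" reports */

struct model_opp {
    unsigned long freq_hz;
    int refcount;
};

/* Illustrative frequencies, not taken from any real device tree. */
static struct model_opp model_table[] = {
    { 100000000UL, 0 },
    { 200000000UL, 0 },
    { 400000000UL, 0 },
};

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

/* Stand-in for might_sleep(): complain if inside a read-side section. */
static void model_might_sleep(void)
{
    if (model_rcu_depth > 0)
        model_bug_reports++;
}

/* Stand-in for a mutex-protected lookup: lowest entry with freq >= *freq.
 * On success it takes a reference and writes the matched frequency back. */
static struct model_opp *model_find_freq_ceil(unsigned long *freq)
{
    size_t i;

    model_might_sleep(); /* acquiring a mutex may sleep */
    for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
        if (model_table[i].freq_hz >= *freq) {
            model_table[i].refcount++;
            *freq = model_table[i].freq_hz;
            return &model_table[i];
        }
    }
    return NULL;
}

/* Stand-in for releasing the reference taken by the lookup. */
static void model_opp_put(struct model_opp *opp)
{
    assert(opp != NULL && opp->refcount > 0);
    opp->refcount--;
}

/* Pre-patch shape: lookup wrapped in an RCU read-side section. */
static unsigned long lookup_with_rcu(unsigned long freq)
{
    struct model_opp *opp;

    model_rcu_read_lock();
    opp = model_find_freq_ceil(&freq);
    model_rcu_read_unlock();
    if (opp == NULL)
        return 0;
    model_opp_put(opp);
    return freq;
}

/* Post-patch shape: no RCU section; the lookup does its own locking. */
static unsigned long lookup_without_rcu(unsigned long freq)
{
    struct model_opp *opp = model_find_freq_ceil(&freq);

    if (opp == NULL)
        return 0;
    model_opp_put(opp);
    return freq;
}
```

Calling `lookup_with_rcu()` bumps `model_bug_reports` once per call, mirroring the repeated traces under simple_ondemand, while `lookup_without_rcu()` returns the same frequency without a report. Note that the patch as posted does not show a matching release of the OPP after each lookup, so that may be worth checking against the OPP API of the kernel in use.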