RTX lockup

Hello,

I'd like to report a problem with RTX that has been present in the kernel for a long time - see here:

http://www.keil.com/forum/15211/

I am facing the exact same problem, but the exact series of events required to induce it seems to be rather complex, even though it happens rather often in our application. In essence, RTX 4.12 locks up in rt_List.c, in this loop:

void os_put_prio (P_XCB p_CB, P_TCB p_task) {
  /* Put task identified with "p_task" into list ordered by priority.       */
  /* "p_CB" points to head of list; list has always an element at end with  */
  /* a priority less than "p_task->prio".                                   */
  P_TCB p_CB2;
  U32 prio;
  BOOL sem_mbx = __FALSE;

  if (p_CB->cb_type == SCB || p_CB->cb_type == MCB || p_CB->cb_type == MUCB) {
    sem_mbx = __TRUE;
  }
  prio = p_task->prio;
  p_CB2 = p_CB->p_lnk;
  /* Search for an entry in the list */
  while (p_CB2 != NULL && prio <= p_CB2->prio) {
    p_CB = (P_XCB)p_CB2;
    p_CB2 = p_CB2->p_lnk;
  }

because a linked list element somehow points to itself, so the while loop never terminates.
Suspecting a timing problem, I added a 10[ms]-50[ms] delay before an operation that seems to induce this (the problem does not occur without it), which seems to move the problem to

void os_put_dly (P_TCB p_task, U16 delay) {
  /* Put a task identified with "p_task" into chained delay wait list using */
  /* a delay value of "delay".                                              */
  P_TCB p;
  U32 delta,idelay = delay;

  p = (P_TCB)&os_dly;
  if (p->p_dlnk == NULL) {
    /* Delay list empty */
    delta = 0;
    goto last;
  }
  delta = os_dly.delta_time;
  while (delta < idelay) {
    if (p->p_dlnk == NULL) {
      /* End of list found */
last: p_task->p_dlnk = NULL;
      p->p_dlnk      = p_task;
      p_task->p_blnk = p;
      p->delta_time  = (U16)(idelay - delta);
      p_task->delta_time = 0;
      return;
    }
    p = p->p_dlnk;
    delta += p->delta_time;
  }

I find it hard to believe that this is caused by data corruption from my code, as the problem report above is exactly the same, and there is never garbage in the list, only an element that points to itself. I don't think I can reproduce this on an evaluation board, since in my circumstances the problem is triggered by a peripheral that is missing from the Keil boards (an internal SPI bus). I can try to create a test program with similar behavior, but I suspect that is going to be very difficult.
FYI: The problem reported in the link above disappeared spontaneously, so the underlying bug is presumably still there.
I would very much appreciate it if somebody from Keil looked at possible scenarios that could induce this kind of behavior. I will also open a case at ARM.
Has anybody encountered this lately?
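
For anyone who wants to catch the corruption closer to where it happens, here is a minimal sketch of a debug check that walks a singly linked list and reports whether it loops back on itself, which includes the case of an element pointing to itself. The names node_t and list_has_cycle are hypothetical and not part of RTX; the real task control block has many more fields, and only the forward link (p_lnk in the excerpts above) matters here. The check uses Floyd's two-pointer cycle detection, so it needs no extra memory.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel list node (NOT the real RTX TCB).
   Only the forward link is modelled. */
typedef struct node {
    struct node *p_lnk;
} node_t;

/* Returns 1 if the list starting at head loops back on itself
   (including a node whose p_lnk points to itself), or 0 if the
   list terminates in NULL. */
int list_has_cycle(const node_t *head)
{
    const node_t *slow = head;   /* advances one node per step  */
    const node_t *fast = head;   /* advances two nodes per step */

    while (fast != NULL && fast->p_lnk != NULL) {
        slow = slow->p_lnk;
        fast = fast->p_lnk->p_lnk;
        if (slow == fast) {
            return 1;            /* the pointers met: list is cyclic */
        }
    }
    return 0;                    /* reached NULL: list is well formed */
}
```

In a debug build such a check could be run on the ready list or the delay list before and after the suspect operation, to narrow down when the self-reference first appears. Whether it can be called safely at those points in a given RTX configuration is an assumption that would need to be verified.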

0 Tamir Michael over 15 years ago

I have a possible work-around, but I vehemently believe this is indeed a timing problem in the kernel:

while (l_file_size)
{
        l_data_offset += l_bytes_read ; // the offset of the next data to acquire

        // this call returns once the requested page has been fully received
        if ( spi_request_app_data(a_command,
                                                          l_data_offset,
                                                          SPI_FRAME_TIMEOUT) != NO_ERROR)
        {
                g_update_rtc = 0x62 ;

                l_result = ERR_SPI_BAD_STATUS ;

                break ;
        }

        g_update_rtc = 0x62 ;

        // data is appended to the file "R:protocol_data.map"

        lp_spi_data = get_spi_rx_buffer(&l_bytes_read) ;

        l_bytes_read -= 4 ; // file size bytes are not real data, just an indication. ignore.

        // write data to RAM drive
        safe_fwrite(lp_spi_data + 4, // skip file size
                                1,
                                l_bytes_read,
                                lp_handle) ;

        if (safe_ferror(lp_handle) )
        {
                g_update_rtc = 0x62 ;

                l_result = ERR_FILE_WRITE_FAILED ;

                break ;
        }

        l_file_size -= l_bytes_read ; // remaining bytes to read

        os_dly_wait(25) ; // 250[ms] delay
}

I have noticed that when I request only one block via the SPI bus from the outside world, the data pattern on my scope is clean and shows a time delta of 250[ms], and I never experience lockups. So I've added the same delay between blocks when reading an entire file over the SPI bus (not just a single block), and it seems to work so far.

0 Tamir Michael over 15 years ago in reply to Tamir Michael

But this seems to cause the controller to enter "undefined" mode from time to time (hanging at "SWI dead")...