Cortex A9 dual core - How to achieve an AMP system without an RTOS?

One of my customers is considering a dual-core Cortex-A9 device for a computationally intensive task (for the sake of discussion, let's assume a high-end image analysis task). Because of cost or other overhead concerns, the customer would prefer not to use an RTOS. Instead, they intend to use the APIs provided by the device vendor to access the essential hardware blocks and peripherals of the device. This is a sort of AMP system, in the sense that some core tasks are "hard glued" to specific CA9 cores to take full advantage of the dual-core architecture.

I understand that using a lightweight RTOS with multicore support would be the more elegant approach. But if the customer wants to go with a non-OS approach, what benefits and pains are they going to experience in this project?

I expect inter-core communication without an RTOS to be a painful experience for them. Is there anything else you can share with me on this?
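
To make the inter-core concern concrete, here is a rough sketch of the kind of mailbox I imagine they would have to write by hand: a single-producer, single-consumer ring buffer using C11 atomics. All names (mbox_t, mbox_send, mbox_recv, MBOX_SLOTS) are hypothetical and not taken from any vendor API. It also assumes the structure sits in memory that both cores see coherently, and that exactly one core sends while the other receives.

```c
#include <assert.h>     /* static_assert */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u   /* hypothetical depth; must be a power of two */
static_assert((MBOX_SLOTS & (MBOX_SLOTS - 1u)) == 0u,
              "MBOX_SLOTS must be a power of two");

/* Hypothetical mailbox; assumed to live in memory shared by both cores. */
typedef struct {
    _Atomic uint32_t head;              /* written only by the producer core */
    _Atomic uint32_t tail;              /* written only by the consumer core */
    uint32_t         slot[MBOX_SLOTS];  /* message payloads */
} mbox_t;

static void mbox_init(mbox_t *m)
{
    atomic_init(&m->head, 0u);
    atomic_init(&m->tail, 0u);
}

/* Producer core only. Returns false when the mailbox is full. */
static bool mbox_send(mbox_t *m, uint32_t msg)
{
    uint32_t head = atomic_load_explicit(&m->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_acquire);

    if (head - tail == MBOX_SLOTS)
        return false;

    m->slot[head & (MBOX_SLOTS - 1u)] = msg;
    /* Release: the slot write becomes visible before the new head value. */
    atomic_store_explicit(&m->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer core only. Returns false when the mailbox is empty. */
static bool mbox_recv(mbox_t *m, uint32_t *out)
{
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&m->head, memory_order_acquire);

    if (head == tail)
        return false;

    *out = m->slot[tail & (MBOX_SLOTS - 1u)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&m->tail, tail + 1u, memory_order_release);
    return true;
}
```

Even this small piece leaves open questions that an RTOS or a vendor IPC layer would normally answer: where the shared region is placed, whether it is cached and kept coherent between the cores, and how the receiving core gets notified (polling as above, or an inter-processor interrupt).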

Cheers,

Senthil