The OS configures the MMU for you early in its boot process. It adjusts the configuration often while it is running, but if you're writing a typical application you won't need to know what it is doing.
However, if you're hoping to use DMA, you will need to know the physical address of your memory, and typically only drivers can obtain that (depending on the OS).
Any mode other than user mode is considered privileged, so in theory you can configure the MMU from any of them. However, system mode is probably the most suitable mode for normal OS execution.
Yes, there will be an overhead. If your memory block is 128 bytes long, I'd use memcpy. If your memory block is 40 MB, DMA will be faster. However, there are many factors involved, so it really depends on what you're doing and how you're doing it. If in doubt (and you have the time), try both and run some benchmarks.
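For the memcpy half of such a benchmark, something like the following rough sketch might do. It assumes a POSIX-like system that provides `clock_gettime` with `CLOCK_MONOTONIC`; the function name `memcpy_avg_ns` is made up for illustration. The DMA half depends entirely on your controller and driver interface, so it isn't shown here.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average time in nanoseconds for one memcpy of
 * `len` bytes, measured over `iters` copies.
 * Returns -1.0 on invalid arguments or allocation failure. */
double memcpy_avg_ns(size_t len, unsigned iters)
{
    if (len == 0 || iters == 0)
        return -1.0;

    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, len);
    memset(dst, 0, len);

    volatile unsigned char sink = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        sink ^= dst[len - 1]; /* read the result to discourage eliding the copy */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    free(src);
    free(dst);

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iters;
}
```

You could call it as `memcpy_avg_ns(128, 100000)` and `memcpy_avg_ns(40u * 1024 * 1024, 10)` to cover the two sizes mentioned above, then time the equivalent DMA transfers (including setup and completion handling) the same way and compare. Caches will flatter the small case, so treat the numbers as a rough guide rather than anything precise.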