    vmovl.u8 Qn, Dn      @ convert bytes to halfwords
    vmul.u16 Qn, Qn, Qx  @ Qx holds 257 in each of its 8 lanes
You might be able to avoid the multiply, and use a transient register rather than one holding the constant, by shifting the widened value left by 8 and ORing it with the unshifted version. This might be faster (untested theory), but it is technically one instruction longer, so it depends on the pipeline: the shift and the OR should be "simple" operations compared with a MUL, but YMMV.
    vmovl.u8 q1, Dn      @ q1 = 0x00XX in each lane
    vshll.u8 q2, Dn, #8  @ q2 = 0xXX00 in each lane
    vorr     q1, q1, q2  @ q1 = 0xXXXX in each lane
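To sanity-check that the two sequences agree, here is a minimal scalar sketch in portable C that models a single 16-bit lane. It is not NEON code and says nothing about speed. The function names are hypothetical, chosen only for illustration. It also shows why the combining step has to be an OR: the widened value (0x00XX) and the shifted value (0xXX00) share no set bits, so an AND of the two is always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of: vmovl.u8 followed by vmul.u16 by 257. */
uint16_t widen_by_multiply(uint8_t x)
{
    return (uint16_t)((uint16_t)x * 257u);   /* 255 * 257 = 65535, still fits */
}

/* Scalar model of one lane of: vmovl.u8, vshll.u8 #8, then a bitwise OR. */
uint16_t widen_by_shift_or(uint8_t x)
{
    uint16_t widened = x;                    /* 0x00XX */
    uint16_t shifted = (uint16_t)(x << 8);   /* 0xXX00 */
    return (uint16_t)(widened | shifted);    /* 0xXXXX */
}

/* Exhaustively compare both models over every possible byte value. */
void verify_all_bytes(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t x = (uint8_t)v;
        assert(widen_by_multiply(x) == widen_by_shift_or(x));
        /* The two halves never overlap, so ANDing them gives zero. */
        assert(((uint16_t)x & (uint16_t)(x << 8)) == 0);
    }
}
```

Calling `verify_all_bytes()` from a test harness walks all 256 inputs. For example, the byte 0xAB becomes 0xABAB under either model.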