The DVB-T2 standard for digital terrestrial broadcasting supports the use of quadrature amplitude modulation constellations where the constellation points are rotated in the I-Q plane. This combined with a cyclic delay of the Q component provides improved performance in some fading channels. The complexity of the optimal demapping process for rotated constellations is however significantly higher than for non-rotated constellations. This makes the DVB-T2 demapper one of the most computationally complex parts of a receiver. In this article, we examine possible simplifications of the demapping process suitable for implementation on a general purpose computer containing a modern graphics processing unit (GPU). Furthermore, we measure the performance in terms of throughput, as well as accuracy, of the implemented algorithms. The implementations are designed to interface efficiently to a previously implemented real-time capable GPU-based low-density parity-check channel decoder.