Graphics Processing Units (GPUs) offer tremendous computational and processing power. The architecture requires high communication bandwidth and lower latency between computation units and caches. 3D die-stacking technology is a promising approach to meet such requirements. To the best of our knowledge no other study has investigated the implementation of 3D technology in GPUs. In this paper, we study the impact of stacking caches using the 3D technology on GPU performance. We also investigate the benefits of using 3D stacked MRAM on GPUs. Our work includes cost, power, and thermal analysis of the proposed architectural designs. Our results show a 53% geometric mean performance speedup for iso-cycle time architectures and about 19% for iso-cost architectures.