For massive multiple-input multiple-output (MIMO) systems, linear minimum mean-square error (MMSE) detection has been shown to achieve near-optimal performance, but it suffers from excessively high complexity due to the large-scale matrix inversion it requires. Because they avoid matrix inversion, detection algorithms based on the Gauss-Seidel (GS) method have been shown to be more efficient than conventional ones based on Neumann series expansion (NSE). In this paper, we propose an efficient GS-based soft-output data detector for massive MIMO and a corresponding VLSI architecture. To accelerate the convergence of the GS method, we propose a new initial solution. We also propose several architecture-level optimizations that further reduce processing latency and area. Reference implementation results on a Xilinx Virtex-7 XC7VX690T FPGA for a massive MIMO system with 128 base-station antennas and 8 users show that our GS-based data detector achieves a throughput of 732 Mb/s with close-to-MMSE error-rate performance. These results demonstrate that the proposed solution has advantages over existing designs in terms of complexity and efficiency, especially under challenging propagation conditions.
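To illustrate how the GS method avoids the explicit matrix inversion in MMSE detection, the following is a minimal sketch in plain Python, not the proposed detector or its VLSI architecture. It assumes unit-energy transmit symbols, so the MMSE equalization matrix is A = H^H H + N0 I, and it solves A x = H^H y iteratively. All function names are hypothetical. The starting point used in the usage note is a simple diagonal approximation chosen for illustration, not the new initial solution proposed in this paper, and the soft-output (log-likelihood ratio) computation is omitted.

```python
import random


def simulate_uplink(num_antennas=128, num_users=8, noise_var=0.1, seed=0):
    """Draw an i.i.d. Rayleigh channel, QPSK symbols, and the received vector.

    Illustrative assumption: unit-variance channel entries and unit-energy symbols.
    """
    rng = random.Random(seed)
    scale = 0.5 ** 0.5
    H = [[complex(rng.gauss(0, scale), rng.gauss(0, scale))
          for _ in range(num_users)] for _ in range(num_antennas)]
    s = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) * scale
         for _ in range(num_users)]
    noise_scale = (noise_var / 2) ** 0.5
    y = [sum(H[k][u] * s[u] for u in range(num_users))
         + complex(rng.gauss(0, noise_scale), rng.gauss(0, noise_scale))
         for k in range(num_antennas)]
    return H, y, s


def gram_and_matched_filter(H, y, noise_var):
    """Form A = H^H H + noise_var * I and b = H^H y for a B x U channel H."""
    B, U = len(H), len(H[0])
    A = [[sum(H[k][i].conjugate() * H[k][j] for k in range(B))
          for j in range(U)] for i in range(U)]
    for i in range(U):
        A[i][i] += noise_var
    b = [sum(H[k][i].conjugate() * y[k] for k in range(B)) for i in range(U)]
    return A, b


def gauss_seidel(A, b, x0, iterations):
    """Run Gauss-Seidel sweeps on A x = b, starting from x0.

    Each entry is updated in place, so later entries in a sweep already
    use the newest values of the earlier ones. No matrix inverse is formed;
    only divisions by the diagonal entries are needed.
    """
    x = list(x0)
    n = len(b)
    for _ in range(iterations):
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Euclidean norm of b - A x, used here only to monitor convergence."""
    n = len(b)
    return sum(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2
               for i in range(n)) ** 0.5
```

A possible usage is `H, y, s = simulate_uplink()`, then `A, b = gram_and_matched_filter(H, y, 0.1)`, then `x = gauss_seidel(A, b, [b[i] / A[i][i] for i in range(len(b))], 3)`. Because A is Hermitian positive definite, the GS iteration converges for any starting point; the choice of starting point only affects how many sweeps are needed to approach the exact MMSE estimate.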