During exploratory performance testing, software testers evaluate the performance of a software system with different input combinations in order to identify combinations that cause performance problems in the system under test. Performance problems such as low throughput, high response times, hangs, or crashes in software applications have an adverse effect on the customer’s satisfaction. Since many of today’s large-scale, complex software systems (e.g., eCommerce applications, databases, web servers) exhibit very large multi-dimensional input spaces with many input parameters and large ranges, it has become costly and inefficient to explore all possible combinations of inputs in order to detect performance problems. In order to address this issue, we introduce a method for identifying input combinations that trigger performance problems in the software system under test. Our method, under the name of iPerfXRL, employs deep reinforcement learning in order to explore a given large multi-dimensional input space efficiently. The main benefit of the approach is that, during the exploration process, it learns and recognizes the problematic regions of the input space that have a higher chance of triggering performance problems. It concentrates the search in those problematic regions to find as many input combinations as possible that can trigger performance problems while executing a limited number of input combinations against the system. In addition, our approach does not require prior domain knowledge or access to the source code of the system. Therefore, it can be applied to any software system where we can interactively execute different input combinations while monitoring their performance impact on the system. We implement iPerfXRL on top of the Soft Actor-Critic algorithm. We evaluate empirically the efficiency and effectiveness of our approach against alternative state-of-the-art approaches. Our results show that iPerfXRL accurately identifies the problematic regions of the input space and finds up to 9 times more input combinations that trigger performance problems on the system under test than the alternative approaches.