In recent years, object detection in high-resolution remote sensing images has attracted increasing interest and become an important research field of computer vision, owing to its wide civil and military applications, such as environmental monitoring, urban planning, precision agriculture, and land mapping. Deep-learning-based object detection frameworks designed for natural scenes have made breakthrough progress and perform well on open natural-scene data sets. However, although these algorithms have greatly improved detection accuracy and speed, they do not achieve the expected results when applied directly to remote sensing images. Because of the large variations in object size and the high inter-class similarity in such images, most conventional object detection algorithms designed for natural scenes still face considerable challenges. To address these challenges, we propose an end-to-end multi-resolution feature fusion framework for object detection in remote sensing images, which effectively improves detection accuracy. Specifically, we use a Feature Pyramid Network (FPN) to extract multi-scale feature maps. Then, a Multi-resolution Feature Extraction (MFE) module is inserted into the feature layers at different scales; it encourages the network to learn feature representations of objects at different resolutions and narrows the semantic gap between scales. Next, to fuse the multi-resolution features effectively, we use an Adaptive Feature Fusion (AFF) module to obtain more discriminative multi-resolution feature representations. Finally, a Dual-scale Feature Deep Fusion (DFDF) module fuses each pair of adjacent-scale features output by the AFF module.
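The internals of the AFF module are not specified in this summary. As a minimal illustrative sketch, assuming the module softmax-normalizes one learned scalar per resolution branch and takes the weighted sum of the resampled (same-sized) feature maps, the fusion step could look like the following; the function names, the fixed branch scores, and the toy feature vectors are all hypothetical stand-ins for learned quantities:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_fuse(features, scores):
    """Fuse same-sized feature vectors with per-branch softmax weights.

    features: list of equal-length 1-D feature vectors, one per resolution
              branch (already resampled to a common size).
    scores:   one scalar logit per branch; in the real network these would be
              produced by a small learned sub-network, here they are constants.
    """
    weights = softmax(scores)
    fused = [0.0] * len(features[0])
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v  # convex combination of the branches
    return fused

# Three toy "resolution branches" of the same length after resampling.
branches = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 1.0, 0.0]]
fused = adaptive_fuse(branches, scores=[1.0, 0.0, -1.0])
```

Because the weights are a softmax, the fused feature is a convex combination of the branch features, so the network can learn to emphasize whichever resolution is most informative at each pyramid level.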
In the experiments, to demonstrate the effectiveness of each component of the proposed method, we first conducted extensive ablation studies on the large-scale remote sensing image data set DIOR. The results show that the proposed MFE, AFF, and DFDF modules improve the average detection accuracy by 1.4%, 0.5%, and 1.3%, respectively, over the baseline. Furthermore, we evaluated our method on two publicly available remote sensing object detection data sets, DIOR and DOTA, and obtained mAP improvements of 2.5% and 2.2%, respectively, over Faster R-CNN with FPN. The ablation studies and comparison experiments indicate that our method extracts more discriminative and powerful feature representations than Faster R-CNN with FPN, which significantly boosts detection accuracy; the method also works well for densely arranged and multi-scale objects. Nevertheless, some limitations remain. For example, our method performs poorly on objects with large aspect ratios, such as bridges, likely because most anchor-based methods have difficulty guaranteeing a sufficiently high intersection-over-union (IoU) with such ground-truth objects. Our future work will address these problems by exploring the advantages of anchor-free methods.
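The large-aspect-ratio difficulty can be made concrete: for an elongated ground-truth box, a small localization offset across the short axis removes a far larger fraction of the overlap than the same offset does for a square box, so far fewer anchors clear the positive-match IoU threshold. A minimal sketch (the box coordinates and the 8-pixel offset are illustrative, not from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A square object vs. a bridge-like 16:1 object of the same area, each
# matched by an identically shaped anchor offset by 8 pixels.
square_gt = (0, 0, 40, 40)        # 40 x 40
square_anchor = (8, 0, 48, 40)    # shifted 8 px
bridge_gt = (0, 0, 160, 10)       # 160 x 10
bridge_anchor = (0, 8, 160, 18)   # shifted 8 px across the short axis

iou_square = iou(square_gt, square_anchor)  # 1280 / 1920 ≈ 0.667
iou_bridge = iou(bridge_gt, bridge_anchor)  # 320 / 2880 ≈ 0.111
```

With a typical positive-match threshold of 0.5, the offset square anchor still counts as a positive sample while the offset bridge anchor does not, which illustrates why anchor-based detectors struggle on elongated objects.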