In the last few years, deep learning has attracted wide interest and achieved great success in many computer vision applications, such as image classification, object detection, object tracking, pose estimation, and action recognition. One application that can benefit greatly from these advances is vision-based obstacle avoidance for robots. Vision-based obstacle avoidance systems mostly rely on classification algorithms, and most of these algorithms use either color images or depth images as their main source of information. In this paper, we investigate whether using information extracted from both types of images simultaneously gives better performance than using each type separately. To do this, we chose a convolutional neural network (CNN) as the classifier and an HSV-based method to perform the fusion. We tested this approach with two widely used pre-trained CNN architectures, ResNet-50 and GoogLeNet, on a locally collected dataset. The results indicate that the classification algorithm based on fused images achieves a higher accuracy (91.3%) than the one based on depth images (80.4%) but a lower accuracy than the one based on color images (93.7%). These results can be partly explained by the fact that the classifiers were pre-trained on color image datasets.
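To make the idea of HSV-based fusion concrete, the following is a minimal sketch of one possible scheme, not necessarily the one used in this work: the abstract does not state how the color and depth channels are combined. The sketch assumes that hue and saturation are taken from the color image and that the value channel is replaced by the normalized depth reading. The names `fuse_pixel`, `fuse_image`, and `max_depth` are hypothetical and introduced only for illustration.

```python
import colorsys


def fuse_pixel(rgb, depth, max_depth=1.0):
    """Fuse one color pixel with one depth reading through HSV space.

    ASSUMPTION (not taken from the paper): hue and saturation come from
    the color pixel, and the value channel is replaced by the depth
    reading normalized to [0, 1].
    rgb is an (r, g, b) tuple of floats in [0, 1].
    """
    r, g, b = rgb
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)
    # Normalize depth and clamp it so that it is a valid V channel.
    v = min(max(depth / max_depth, 0.0), 1.0)
    # Convert back to RGB so the result fits a CNN that expects 3 channels.
    return colorsys.hsv_to_rgb(h, s, v)


def fuse_image(color, depth, max_depth=1.0):
    """Apply fuse_pixel to aligned color and depth images (nested lists)."""
    return [
        [fuse_pixel(c, d, max_depth) for c, d in zip(color_row, depth_row)]
        for color_row, depth_row in zip(color, depth)
    ]
```

Under this assumed scheme, the fused output is still a three-channel image, so it could be passed to a color-pre-trained network such as ResNet-50 or GoogLeNet without changing the input layer.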