1. Research Purpose and Significance (Literature Review with References)
In terms of visual feature representation, deep convolutional neural networks are more robust than basic convolutional neural networks. As a result, target detection based on deep convolutional neural networks, including two-stage and one-stage models [23], continues to develop. Two-stage target detection is represented by the R-CNN (Region-based Convolutional Neural Network) series [24-27], introduced by Girshick et al. As the name implies, it is a method based on candidate regions. The two-stage approach has two steps: the first generates a set of candidate boxes, and the second classifies the targets and refines their positions based on the selected candidate boxes in order to produce more precise results.

Figure 1.1 R-CNN algorithm flow

As shown in Figure 1.1, the R-CNN algorithm [24] first selects approximately 2K candidate regions from the input image using Selective Search [28], then extracts CNN features from each region and uses an SVM (Support Vector Machine) classifier to obtain the target category, and finally adjusts the size of the target box using bounding box regression. Although R-CNN achieves good detection results and can refine the size of the target box, it takes a long time to detect and uses a large amount of storage space. This has motivated a large number of improved algorithms. For example, drawing on the idea of SPP-Net (Spatial Pyramid Pooling Network) [29], Girshick proposed Fast R-CNN [25], and Ren et al. subsequently proposed Faster R-CNN [26].
When extracting the features of a region of interest from the full-image features, the Faster R-CNN algorithm [26] uses the region of interest pooling layer (ROI Pooling Layer) [30], and it generates the regions to be detected with a region proposal network (RPN) instead of Selective Search. In 2017, He Kaiming et al. presented Mask R-CNN [27], which performs a variety of tasks such as target detection, semantic segmentation, and instance segmentation. To correct the misalignment introduced during ROI pooling, which arises from rounding coordinates to integers, it uses ROI Align, which applies bilinear interpolation to compute feature values at non-integer locations.

Figure 1.2 YOLO algorithm flow

The two-stage model achieves satisfactory detection accuracy, but its real-time performance is poor. Some researchers therefore proposed eliminating the region proposal step and performing target localization and recognition directly. In 2016, Redmon et al. presented YOLO (You Only Look Once), a one-stage deep learning model. As shown in Figure 1.2, the main idea is to divide the image into K x K grid cells and to use these cells as the basis for target detection and localization. If the center of a target falls within a grid cell, that cell is responsible for detecting the target. Each grid cell predicts several bounding boxes with confidence scores, as well as C conditional class probabilities, where C is the number of categories. Each bounding box has five parameters: the box center (x, y) relative to the grid cell, the box width and height (w, h), and the confidence.
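The grid assignment described above can be illustrated with a minimal sketch. The function names `responsible_cell` and `output_size` are hypothetical, and the values K = 7, two boxes per cell, and C = 20 in the usage lines are illustrative settings rather than values taken from this proposal.

```python
def responsible_cell(cx, cy, img_w, img_h, k):
    """Find the grid cell that contains a box center.

    Returns (row, col, x_off, y_off), where the offsets are the center
    position measured relative to the top-left corner of that cell, in
    units of one cell (so both lie in [0, 1)).
    """
    gx = cx / img_w * k          # center x in grid units
    gy = cy / img_h * k          # center y in grid units
    col = min(int(gx), k - 1)    # clamp centers that sit on the right edge
    row = min(int(gy), k - 1)    # clamp centers that sit on the bottom edge
    return row, col, gx - col, gy - row


def output_size(k, boxes_per_cell, num_classes):
    """Number of predicted values: each cell outputs 5 numbers per box
    (x, y, w, h, confidence) plus one set of class probabilities."""
    return k * k * (boxes_per_cell * 5 + num_classes)


# Illustrative values (assumed for this example only).
print(responsible_cell(224, 112, 448, 448, 7))  # (1, 3, 0.5, 0.75)
print(output_size(7, 2, 20))                    # 1470
```

Because each cell contributes only one set of class probabilities, two targets whose centers fall in the same cell compete for that cell's prediction, which is one way to see why dense small objects are easily missed.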
However, because the first-generation YOLO algorithm predicts only two bounding boxes and one set of class probabilities per grid cell, its missed-detection rate for dense small objects is quite high. The SSD (Single Shot MultiBox Detector) [34] technique, proposed by Liu et al. in the same year, uses the VGG16 [35] network as the backbone and performs detection on multi-scale feature maps via convolution operations.

Redmon et al. developed the YOLOv2 algorithm to improve detection performance. It uses the Darknet-19 network as the backbone, with a BN (Batch Normalization) layer added to each convolutional layer, which both speeds up convergence and helps suppress overfitting. Compared with YOLOv2, YOLOv3 divides the feature map into 13 x 13 grid cells, with each cell predicting three bounding boxes. At the same time, the Softmax function is no longer used to classify the target; instead, independent logistic classifiers are used, each of which only judges whether the target in the box belongs to the current label. This improves detection efficiency and also makes multi-label classification possible, as shown in Figure 1. Later algorithms include YOLOv4 and YOLOv5 [4]. To date, the YOLO family has seen many improvements and applications, including low-altitude UAV detection and small-target detection.

With the continuing development of deep-learning-based target detection algorithms and their widespread use across industries, target detection technology also faces a number of new challenges, including the following:

(1) How to further improve target detection accuracy on the basis of mainstream methods.
Practical applications place high demands on detection accuracy. Only when accuracy reaches a sufficient level can target detection be widely applied in real-world settings.

(2) How to balance detection accuracy and detection speed. In the target detection task, the metrics for accuracy and speed tend to pull against each other. For example, the two-stage R-CNN series has high detection accuracy, whereas the one-stage YOLO series has high detection speed. Striking a balance between these two metrics is also an important task.

(3) How to improve the smoothness of real-time detection built on conventional target detection methods. Conventional target detection methods work well on still images but perform poorly when applied directly to video.

Based on the third- and fourth-generation YOLO algorithms, this paper addresses the above three problems by improving the network structure to raise the detection accuracy of the detector, and by using the rich context information in video to improve the detection mechanism and optimize real-time object detection performance.

References

[1] Peng Jishen, Sun Lixin, Wang Kai, et al. ED-YOLO power inspection UAV obstacle avoidance target detection algorithm based on model compression [J]. Journal of Instrumentation, 2021, 42(10):10.
[2] Liu Xinrou, Li Yang, Song Wenjun. Object detection algorithm in industrial scene based on SlimYOLOv3 [J]. Computer Application Research, 2021.
[3] Tang Yue, Wu Ge, Pu Yan. Improved GDT-YOLOV3 target detection algorithm [J]. Liquid Crystal and Display, 2020, 35(8):9.
[4] Ma Linlin, Ma Jianxin, Han Jiafang, et al. Research on target detection algorithm based on YOLOv5s [J]. Computer Knowledge and Technology: Academic Edition, 2021, 17(23):4.
[5] Jiang Wenzhi, Li Bingzhen, Gu Jiaojiao, et al. Ship target detection algorithm based on improved YOLO V3 [J]. Electro-Optics and Control, 2021, 28(6):6.
[6] Liang Qinjia, Liu Huai, Lu Fei. Research on traffic video target detection algorithm based on improved YOLOv3 model [J]. Journal of Nanjing Normal University: Engineering Technology Edition, 2021, 21(2):7.
[7] Sheng Mingwei, Li Jun, Qin Hongde, et al. Ship target detection algorithm based on improved YOLOv3 [J]. Navigation and Control, 2021, 20(2):15.
[8] Zhang Taoning. Research on fast target detection algorithm based on improved YOLOv3 model.
[9] Chen Jun. Research and implementation of target detection based on YOLOv3 algorithm [D]. University of Electronic Science and Technology of China.
[10] Yang Fan. Research on remote sensing image target detection algorithm based on YOLO [D]. Chengdu University of Technology.
[11] Tang Songyan. Research and application of aerial target detection algorithm based on YOLOv3 [D]. Huazhong University of Science and Technology.
[12] Xu Rong. Research on small target detection algorithm based on YOLOv3 [D]. Nanjing University of Posts and Telecommunications.
[13] Sun Jia. Real-time target detection based on improved YOLO algorithm [D]. Shanxi University.
[14] Zheng Jiahui. Pedestrian video target detection method based on YOLOv3 [D]. Xidian University, 2019.
[15] Chen Jun. Research on fusion detection algorithm of multi-source remote sensing image sea surface target based on R-YOLO [D]. Huazhong University of Science and Technology.
2. Basic Research Content, Problem-Solving Measures, and Approach
1. Project Problem

This paper focuses on a convolutional neural network model based on the YOLO algorithm that improves on the original YOLO algorithm in both training speed and detection accuracy. The main work of this paper is as follows:

To improve detection accuracy, this paper improves the confidence strategy for overlapping boxes, using two strategies: a linear decay function and a Gaussian decay function. Experiments show that this significantly improves detection accuracy. A dynamic threshold design is also adopted, in which the output of a density-detection convolutional neural network is used to adjust the threshold dynamically. Setting a relatively small threshold in high-density image regions and a relatively large threshold in low-density regions effectively improves the generalization of the algorithm.

To improve training speed, this paper optimizes the loss function of the original YOLO algorithm. Based on an analysis of the objective loss function during training, the loss term for the width and height of the predicted bounding box is modified so that the rate of change of the loss replaces the original change value. At the same time, the network structure is improved: a batch normalization layer is added at the input of each convolutional layer so that every layer of the network receives inputs with the same distribution. The Dropout operation originally used to prevent overfitting during training is removed, and a fully convolutional design is adopted, in which convolutional layers replace the fully connected layer and its redundant parameters. This makes feature classification smoother and more efficient and reduces the number of weight parameters in the network.
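The overlapping-box confidence strategy described above can be sketched as follows. This is a minimal illustration in the style of Soft-NMS, which is an assumption about the intended method; the function names, the `(x1, y1, x2, y2)` box format, and the default values of `iou_thresh`, `sigma`, and `score_thresh` are hypothetical. The density-driven dynamic threshold is not modeled here; `iou_thresh` is simply a fixed parameter.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, method="gaussian",
             iou_thresh=0.3, sigma=0.5, score_thresh=0.001):
    """Decay the confidence of overlapping boxes instead of deleting them.

    Returns a list of (box_index, final_score) in selection order.
    """
    remaining = [(s, i) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the box with the highest current confidence.
        pos = max(range(len(remaining)), key=lambda p: remaining[p][0])
        best_score, best = remaining.pop(pos)
        kept.append((best, best_score))
        survivors = []
        for score, i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if method == "linear":
                # Linear decay, applied only above the overlap threshold.
                if overlap > iou_thresh:
                    score *= 1.0 - overlap
            else:
                # Gaussian decay, applied continuously to every box.
                score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:  # drop boxes whose score has vanished
                survivors.append((score, i))
        remaining = survivors
    return kept


# Two heavily overlapping boxes and one distant box (made-up data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for name in ("linear", "gaussian"):
    print(name, soft_nms(boxes, scores, method=name))
```

In both variants the second box is kept with a reduced score rather than discarded, which is the property that matters in dense scenes where neighbouring boxes may belong to different objects.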
Finally, the improved target detection algorithm is modularized and applied to a practical object detection system.

2. Research Methods

Literature research: relevant literature is organized on a factual basis, using rigorous and realistic research methods. This project collects literature on the topic through multiple channels and consults a large body of domestic and international research on the design of YOLO-based target detection algorithms and related theory. Nearly a hundred papers and data sources have been analyzed and sorted. On the basis of a comprehensive understanding of domestic and international academic research in this field, the design of YOLO-based target detection algorithms is analyzed.

Regression training: to address the low training efficiency of the YOLO algorithm, its regression training process is analyzed together with the vanishing-gradient and overfitting problems that can arise when training deep neural networks, and an optimization method that improves training efficiency is proposed. Conclusions are drawn through experimental verification.

3. Expected Results

Research on the YOLO network structure reveals its existing defects: vanishing gradients and fluctuations in training performance sometimes occur during training, and the missed-detection rate is relatively high in scenes with dense and small objects.

Within regression-based deep learning target detection, and building on the existing YOLO algorithm, these defects can be significantly reduced by improving the network structure and training strategy and by adjusting and improving the overlapping-box suppression strategy.