BIT Has Achieved New Progress in Object Tracking Based on Deep Semantic Features

Recently, Professor Xu Tingfa's research team at the School of Optics and Photonics, Beijing Institute of Technology (BIT), made new progress in Siamese-network target tracking based on deep semantic features. The results, entitled "SiamATL: Online Update of Siamese Tracking Network via Attentional Transfer Learning", were published in IEEE Transactions on Cybernetics (IEEE TCYB, impact factor 11.079), one of the most influential international academic journals in the field of artificial intelligence. In 2020, IEEE TCYB ranked at the forefront of more than 120 JCR journals in this field and was classified in JCR Area I. The journal mainly reports the latest research progress and technology in computational intelligence, artificial intelligence, data science and neural networks. The first author of this work is Huang Bo, a doctoral student at Beijing Institute of Technology, and the corresponding author is Professor Xu Tingfa of Beijing Institute of Technology.

With the development of artificial intelligence, visual target tracking with deep semantic features has attracted wide attention in computer vision. In particular, Siamese networks, which make tracking decisions by assessing similarity, are widely used in this field. However, updating a Siamese tracking network online has a limitation: it is difficult to balance model adaptation against model degradation.

To address this problem, Professor Xu Tingfa's team at Beijing Institute of Technology was the first to propose a Siamese tracking model based on attentional transfer learning.


Figure 1 Siamese tracking model with attentional transfer learning

To make full use of previous information, the model transfers three kinds of knowledge into the current template update: feature representations, learned filters, and spatiotemporal attention. Transferring feature representations from historical tracking tasks is intended to address the lack of high-quality training data in the current tracking task. An instance-transfer discriminative correlation filter is introduced to strengthen the decision-making ability of the Siamese network. A Gaussian-like matrix based on spatiotemporal relationships is predefined to control the learning weights of different spatial positions, and an L2 loss function is used to compute the updated target template.
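One way to read the last step is as a per-position weighted least-squares problem. The following is only an illustrative sketch, not the formulation from the paper: the symbols T, X and the exact form of the objective are assumptions introduced here, and only the weight matrix G and the use of an L2 loss come from the description above.

```latex
% Assumed notation (not from the article): T_{t-1} is the previous template,
% X_t the features of the current frame, G(p) in [0,1] the weight at position p.
\hat{T}_t = \arg\min_{T} \sum_{p} \Big[ G(p)\,\big(T(p) - X_t(p)\big)^2
          + \big(1 - G(p)\big)\,\big(T(p) - T_{t-1}(p)\big)^2 \Big]

% Setting the derivative with respect to T(p) to zero gives, for each p:
\hat{T}_t(p) = G(p)\,X_t(p) + \big(1 - G(p)\big)\,T_{t-1}(p)
```

Under these assumptions, the L2-optimal template is a blend of the old template and the new features, with a blending rate that varies across spatial positions instead of being a single global constant.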


Figure 2 Comparison of traditional update process and attention update process

In the Basketball sequence, the traditional method uses a low learning rate: a "ghost" of the original target and background remains in the updated target appearance and seriously degrades detection accuracy in the current frame. In the Lemming sequence, the traditional method uses a high learning rate: severe occlusion of the target gradually degrades the model and eventually causes the updated template to fail completely. With a single learning rate, therefore, it is difficult to strike a balance between model adaptation and degradation, whereas the attention learning method handles this problem well.
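The trade-off can be reproduced with a toy example. The sketch below assumes that the "traditional" update is the common linear interpolation between the old template and the new observation with one global rate; the article does not spell out the formula, and the function names and numeric values are hypothetical.

```python
def linear_update(template, observation, rate):
    """Blend every template value toward the observation with one global rate.

    Assumed form of the traditional update: T <- (1 - rate) * T + rate * X.
    """
    return [(1.0 - rate) * t + rate * x for t, x in zip(template, observation)]


def run(rate, frames, start):
    """Apply the linear update once per frame and return the final template."""
    template = list(start)
    for observation in frames:
        template = linear_update(template, observation, rate)
    return template


# Toy 1-D "appearances" (hypothetical values, for illustration only).
old_appearance = [0.0, 0.0, 0.0]
new_appearance = [1.0, 1.0, 1.0]
occluder = [5.0, 5.0, 5.0]

# Low rate while the target really changes: the old appearance lingers.
slow = run(0.1, [new_appearance] * 5, old_appearance)

# High rate while the target is occluded: the occluder takes over the template.
fast = run(0.5, [occluder] * 5, new_appearance)

print([round(v, 3) for v in slow])
print([round(v, 3) for v in fast])
```

After five frames the low-rate template is still closer to the old appearance than to the new one, while the high-rate template has drifted almost entirely to the occluder. No single global rate avoids both failure modes in this toy setting.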


Figure 3 Visualization analysis of spatial weight G

In the G matrix, the central target area is given a higher weight, while the boundary background area is given a lower weight. This attention learning strategy can introduce more background information into the updated template branch of the Siamese network without polluting the central target area.
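A minimal sketch of such a weight matrix follows. It assumes an isotropic Gaussian centered on the template and assumes that G acts as a per-position blending rate; the grid size, the sigma value and both function names are hypothetical, and the paper's actual definition of G may differ.

```python
import math


def gaussian_like_weights(height, width, sigma):
    """Return a height x width grid that peaks at 1.0 in the center."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def weighted_update(old, new, weights):
    """Per-position blend (assumed form): weight g moves a cell toward `new`."""
    return [
        [(1.0 - g) * o + g * n for o, n, g in zip(old_row, new_row, g_row)]
        for old_row, new_row, g_row in zip(old, new, weights)
    ]


G = gaussian_like_weights(5, 5, sigma=1.0)
old_template = [[0.0] * 5 for _ in range(5)]
new_features = [[1.0] * 5 for _ in range(5)]
updated = weighted_update(old_template, new_features, G)

# Center weight versus corner weight.
print(round(G[2][2], 3), round(G[0][0], 3))
```

In this sketch the center cell fully adopts the new features, while corner cells barely move from the old template, which mirrors the high-center, low-boundary pattern described for G.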


Figure 4 Tracking results of the Siamese tracker with attentional transfer learning

This study provides a new idea for designing the update mechanism of Siamese networks. The proposed attentional transfer learning strategy can serve as a general module in most Siamese trackers to improve their performance.

A brief introduction to the first author:

Huang Bo is a combined master's and doctoral candidate (class of 2016) at the School of Optics and Photonics, Beijing Institute of Technology, supervised by Professor Xu Tingfa. His research focuses on computer vision and deep learning. He has published 17 academic papers, including 10 SCI papers, with a cumulative impact factor of 57.221. Nine of these papers were published as first author, six of them in IEEE TCYB, IEEE TMM, PR, Neurocomputing and other high-level SCI journals, with a cumulative impact factor of 36.152. He has applied for one Chinese invention patent and three software copyrights. He has also served many times as a reviewer for IEEE TCSVT, Neurocomputing and other SCI journals. His honors include the second prize of the optoelectronic design competition, the third prize of the postgraduate mathematical modeling competition, the first prize of the capital "Challenge Cup", the special prize of the "Century Cup", the first prize of the "Baike Rongchuang Cup" electronic design competition, the second prize of the Beijing Electronic Design Competition, and the titles of Outstanding Postgraduate and Outstanding League Cadre.

A brief introduction to the corresponding author:

Xu Tingfa is a professor and doctoral supervisor, the responsible professor of the national first-class key discipline "Optical Engineering", and deputy director of the Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education. In recent years, he has led his research team in deepening research on photoelectric imaging detection and recognition, hyperspectral computational imaging and processing, and related directions. He has presided over and undertaken more than 30 major scientific research instrument development projects of the National Natural Science Foundation of China. He has published more than 120 academic papers in international and domestic journals, more than 80 of which are indexed in SCI/EI. He has applied for 40 national invention patents as first inventor, 15 of which have been granted and published.

Details of the paper: Bo Huang, Tingfa Xu, Ziyi Shen, Shenwang Jiang, Bingqing Zhao, and Ziyang Bian. SiamATL: Online Update of Siamese Tracking Network via Attentional Transfer Learning. IEEE Transactions on Cybernetics, 2020, DOI: 10.1109/TCYB.2020.3043520
