Attention-Based Image Captioning Using DenseNet Features

Md Zakir Hossain^*, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga, Mohammed Bennamoun

^*Corresponding author for this work

doi:10.1007/978-3-030-36802-9_13

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information from the whole scene to generate captions. Such methods often fail to capture information about salient objects and so cannot generate semantically correct captions. We consider an attention mechanism that focuses on relevant parts of the image to generate a fine-grained description of that image. We use image features from DenseNet and conduct our experiments on the MSCOCO dataset. Our proposed method achieved 53.6, 39.8, and 29.5 on the BLEU-2, BLEU-3, and BLEU-4 metrics, respectively, which are superior to those of state-of-the-art methods.
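The abstract does not say which attention formulation the method uses. As a rough illustration only, the sketch below shows generic additive soft attention over image-region features. The function name `soft_attention`, the projection matrices `W_f`, `W_h`, the vector `v`, and all sizes are hypothetical and are not taken from the paper; the random features merely stand in for a flattened CNN feature map such as one produced by DenseNet.

```python
import math
import random


def project(vec, W):
    """Multiply a vector of length n by an n x m matrix (list of rows)."""
    return [sum(vec[i] * W[i][j] for i in range(len(vec))) for j in range(len(W[0]))]


def soft_attention(features, hidden, W_f, W_h, v):
    """Additive soft attention (illustrative, not the paper's exact model).

    features: one feature vector per image region
    hidden:   current decoder state
    Returns (context vector, attention weights over regions).
    """
    h_proj = project(hidden, W_h)
    scores = []
    for region in features:
        f_proj = project(region, W_f)
        # Score each region against the decoder state.
        scores.append(sum(v_k * math.tanh(f + h) for v_k, f, h in zip(v, f_proj, h_proj)))
    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, features))
        for d in range(len(features[0]))
    ]
    return context, weights


# Hypothetical sizes: a 7x7 feature map flattened to 49 regions.
random.seed(0)
num_regions, feat_dim, hidden_dim, attn_dim = 49, 8, 6, 4


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


features = [[random.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(num_regions)]
hidden = [random.gauss(0.0, 1.0) for _ in range(hidden_dim)]
W_f = rand_matrix(feat_dim, attn_dim)
W_h = rand_matrix(hidden_dim, attn_dim)
v = [random.gauss(0.0, 0.1) for _ in range(attn_dim)]

context, weights = soft_attention(features, hidden, W_f, W_h, v)
```

In a captioning decoder of this general kind, the context vector would be recomputed at every word step, so that different regions can dominate as the caption is generated.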

Original language	English
Title of host publication	Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
Editors	Tom Gedeon, Kok Wai Wong, Minho Lee
Publisher	Springer
Pages	109-117
Number of pages	9
ISBN (Print)	9783030368012
DOIs	https://doi.org/10.1007/978-3-030-36802-9_13
Publication status	Published - 2019
Externally published	Yes
Event	26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia
Duration	Dec 12 2019 → Dec 15 2019

Publication series

Name	Communications in Computer and Information Science
Volume	1143 CCIS
ISSN (Print)	1865-0929
ISSN (Electronic)	1865-0937

Conference

Conference	26th International Conference on Neural Information Processing, ICONIP 2019
Country/Territory	Australia
City	Sydney
Period	12 Dec 2019 → 15 Dec 2019

Keywords

Attention
DenseNet
Image captioning

ASJC Scopus subject areas

General Computer Science
General Mathematics

Cite this

Hossain, M. Z., Sohel, F., Shiratuddin, M. F., Laga, H., & Bennamoun, M. (2019). Attention-Based Image Captioning Using DenseNet Features. In T. Gedeon, K. W. Wong, & M. Lee (Eds.), Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings (pp. 109-117). (Communications in Computer and Information Science; Vol. 1143 CCIS). Springer. https://doi.org/10.1007/978-3-030-36802-9_13

Attention-Based Image Captioning Using DenseNet Features. / Hossain, Md Zakir; Sohel, Ferdous; Shiratuddin, Mohd Fairuz et al.
Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings. ed. / Tom Gedeon; Kok Wai Wong; Minho Lee. Springer, 2019. p. 109-117 (Communications in Computer and Information Science; Vol. 1143 CCIS).

Hossain, MZ, Sohel, F, Shiratuddin, MF, Laga, H & Bennamoun, M 2019, Attention-Based Image Captioning Using DenseNet Features. in T Gedeon, KW Wong & M Lee (eds), Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings. Communications in Computer and Information Science, vol. 1143 CCIS, Springer, pp. 109-117, 26th International Conference on Neural Information Processing, ICONIP 2019, Sydney, Australia, 12/12/19. https://doi.org/10.1007/978-3-030-36802-9_13

@inproceedings{45ac980ef64a4e86b8a94c8d0d30829f,
  title = "Attention-Based Image Captioning Using DenseNet Features",
  abstract = "We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information of the whole scene to generate image captions. Such a mechanism often fails to get the information of salient objects and cannot generate semantically correct captions. We consider an attention mechanism that can focus on relevant parts of the image to generate fine-grained description of that image. We use image features from DenseNet. We conduct our experiments on the MSCOCO dataset. Our proposed method achieved 53.6, 39.8, and 29.5 on BLEU-2, 3, and 4 metrics, respectively, which are superior to the state-of-the-art methods.",
  keywords = "Attention, DenseNet, Image captioning",
  author = "Hossain, {Md Zakir} and Ferdous Sohel and Shiratuddin, {Mohd Fairuz} and Hamid Laga and Mohammed Bennamoun",
  note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2019.; 26th International Conference on Neural Information Processing, ICONIP 2019 ; Conference date: 12-12-2019 Through 15-12-2019",
  year = "2019",
  doi = "10.1007/978-3-030-36802-9_13",
  language = "English",
  isbn = "9783030368012",
  series = "Communications in Computer and Information Science",
  publisher = "Springer",
  pages = "109--117",
  editor = "Tom Gedeon and Wong, {Kok Wai} and Minho Lee",
  booktitle = "Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings",
}

TY - GEN
T1 - Attention-Based Image Captioning Using DenseNet Features
AU - Hossain, Md Zakir
AU - Sohel, Ferdous
AU - Shiratuddin, Mohd Fairuz
AU - Laga, Hamid
AU - Bennamoun, Mohammed
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information of the whole scene to generate image captions. Such a mechanism often fails to get the information of salient objects and cannot generate semantically correct captions. We consider an attention mechanism that can focus on relevant parts of the image to generate fine-grained description of that image. We use image features from DenseNet. We conduct our experiments on the MSCOCO dataset. Our proposed method achieved 53.6, 39.8, and 29.5 on BLEU-2, 3, and 4 metrics, respectively, which are superior to the state-of-the-art methods.
AB - We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information of the whole scene to generate image captions. Such a mechanism often fails to get the information of salient objects and cannot generate semantically correct captions. We consider an attention mechanism that can focus on relevant parts of the image to generate fine-grained description of that image. We use image features from DenseNet. We conduct our experiments on the MSCOCO dataset. Our proposed method achieved 53.6, 39.8, and 29.5 on BLEU-2, 3, and 4 metrics, respectively, which are superior to the state-of-the-art methods.
KW - Attention
KW - DenseNet
KW - Image captioning
UR - http://www.scopus.com/inward/record.url?scp=85078455156&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078455156&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-36802-9_13
DO - 10.1007/978-3-030-36802-9_13
M3 - Conference contribution
AN - SCOPUS:85078455156
SN - 9783030368012
T3 - Communications in Computer and Information Science
SP - 109
EP - 117
BT - Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
A2 - Gedeon, Tom
A2 - Wong, Kok Wai
A2 - Lee, Minho
PB - Springer
T2 - 26th International Conference on Neural Information Processing, ICONIP 2019
Y2 - 12 December 2019 through 15 December 2019
ER -
