Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Pages

Posts

Older Blog Posts

less than 1 minute read

Published:

For the older blogs, please visit my page at Viblo. I had written all of those blogs while I had been working at Sun Asterisk.

activities

HUST - Volunteer Teaching 2015-2017

Published:

Since I was a freshman, I spent a lot of time running volunteer classes in basic science subject like Advanced Math, Linear Algebra, and Physics for freshmen at Hanoi University of Science and Technology.

Natural Language Processing Course

Published:

I conducted a short-term training program for the “10 Institute” under the Ministry of National Defense. Over one month, I assisted officers in building AI systems tailored to their unique challenges.

honors

CMC AI Contest: Face recognition

AI challenge, CMC Global Company Limited., 2019

At CMC Contest 2019, my team won 2nd and 3rd place on the final leaderboard. Here is a short of my presentation :D.

KSE 2021: Best Paper Award

Conference, KSE: Knowledge and Systems Engineering, 2019

At KSE 2020, we achieved the Best Paper Award Certificate with paper “From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection”.

Emotion Recognition Challenge 2019 (ERC2019)

AI challenge, HCMUS and VinTech City, 2019

Emotion Recognition Challenge 2019 is a contest on audio classification and sentiment analysis based on human voices. ERC2019 is co-organized by Vintech City and Vietnam National University Ho Chi Minh City - University of Science at Hochiminh city.

portfolio

publications

Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm

Published in The Global Conference on Consumer Electronics (GCCE 2019), 2019

Steganography is the science of concealing secret information inside usual forms of data. In this paper, the use of deep learning techniques to hide secret audio into the digital images is proposed. Extensive experiments are carried out with a set of 24K images and an audio dataset named VIVOS Corpus. Through experimental results, it has been confirmed that our method is more effective than traditional approaches. The integrity of both image and audio is well preserved while the length of the hidden audio is significantly improved.

Recommended citation: Q. P. Huu, T. H. Dinh, N. N. Tran, T. P. Van and T. T. Minh, "Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm," 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), 2019, pp. 423-427, doi: 10.1109/GCCE46687.2019.9015498. https://ieeexplore.ieee.org/document/9015498

From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection

Published in The International Conference on Knowledge and Systems Engineering (KSE), 2020

Natural language processing (NLP) is a fast-growing field of artificial intelligence. Since the Transformer was introduced by Google in 2017, a large number of language models such as BERT, GPT, and ELMo have been inspired by this architecture. In this paper, we propose a pipeline to adapt the general-purpose RoBERTa language model to a specific text classification task: Vietnamese Hate Speech Detection. Our experiments proved that our proposed pipeline boosts the performance significantly, achieving a new state-of-the-art on Vietnamese Hate Speech Detection (HSD) campaign.

Recommended citation: Q. H. Pham, V. Anh Nguyen, L. B. Doan, N. N. Tran and T. M. Thanh, "From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection," 2020 12th International Conference on Knowledge and Systems Engineering (KSE), 2020, pp. 37-42, doi: 10.1109/KSE50997.2020.9287406. https://ieeexplore.ieee.org/document/9287406

Improving BERT-Based Noisy Text Classification with Knowledge of the Data domain

Published in The 6th Workshop on Noisy User-generated Text (WNUT 2020), 2020

This paper proposes an improved custom model for WNUT task 2: Identification of Informative COVID-19 English Tweet. We improve experiment with the effectiveness of fine-tuning methodologies for state-of-the-art language model RoBERTa. We make a preliminary instantiation of this formal model for the text classification approaches. With appropriate training techniques, our model is able to achieve 0.9218 F1-score on public validation set and the ensemble version settles at top 9 F1-score (0.9005) and top 2 Recall (0.9301) on private test set.

Recommended citation: @inproceedings{doan-bao-etal-2020-sunbear, title = "{S}un{B}ear at {WNUT}-2020 Task 2: Improving {BERT}-Based Noisy Text Classification with Knowledge of the Data domain", author = "Doan Bao, Linh and Nguyen, Viet Anh and Pham Huu, Quang", booktitle = "Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2020.wnut-1.73", doi = "10.18653/v1/2020.wnut-1.73", pages = "485--490", } https://aclanthology.org/2020.wnut-1.73/

S-NLP at SemEval-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging

Published in The 15th International Workshop on Semantic Evaluation (SemEval 2021), 2021

In this paper, we propose a system to resolve the task 5 in The 15th International Workshop on Semantic Evaluation (SemEva 2021): Toxic Spans Detection. Our method utilizes a pre-trained language model in toxic-domain and combines two approaches Self-training and Feature-based Learning to achieve a high F1-score of 70.77. Finally, we provide insights into the failure of the system and the task’s potential falsely-negative annotations issue with careful error analysis.

Recommended citation: @inproceedings{nguyen-etal-2021-nlp, title = "{S}-{NLP} at {S}em{E}val-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging", author = "Nguyen, Viet Anh and Nguyen, Tam Minh and Quang Dao, Huy and Huu Pham, Quang", booktitle = "Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)", month = aug, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.semeval-1.120", doi = "10.18653/v1/2021.semeval-1.120", pages = "888--897", } https://aclanthology.org/2021.semeval-1.120/

Exploiting Transfer Learning Models for Reliable Intelligence Identification on Vietnamese Social Network Sites

Published in The 7th International Workshop on Vietnamese Language and Speech Processing, 2021

The rapid growth of social media and misinformation posed challenges for authorities. As a result, determining the reliability of an article has become a crucial task. After various ablation studies, we propose a multi-input model that can effectively leverage both tabular metadata and post content for the task. Applying state-of-the-art fine-tuning techniques for the pre-trained component and training strategies for our complete model, we have achieved a 0.9462 ROC-score on the VLSP private test set.

Recommended citation: @inproceedings{thanh-van-2020-reintel, title = "{R}e{INTEL} Challenge 2020: Exploiting Transfer Learning Models for Reliable Intelligence Identification on {V}ietnamese Social Network Sites", author = "Thanh, Kim Nguyen Thi and Van, Kiet Nguyen", booktitle = "Proceedings of the 7th International Workshop on Vietnamese Language and Speech Processing", month = dec, year = "2020", address = "Hanoi, Vietnam", publisher = "Association for Computational Lingustics", url = "https://aclanthology.org/2020.vlsp-1.9", pages = "45--48", } https://aclanthology.org/2020.vlsp-1.9/

S-NLP at AI City Challenge Task 5 2021: Contrastive Learning for Natural Language-Based Retrieval

Published in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021

AI City Challenge 2021 Task 5: The Natural Language-Based Vehicle Tracking is a Natural Language-based Vehicle Retrieval task, which requires retrieving a single-camera track using a set of three natural language descriptions of the specific targets. In this paper, we present our methods to tackle the difficulties of the provided task. Experiments with our approaches on the competitive dataset from AICity Challenge 2021 show that our techniques achieve Mean Reciprocal Rank score of 0.1701 on the public test dataset and 0.1571 on the private test dataset.

Recommended citation: @InProceedings{Nguyen_2021_CVPR, author = {Nguyen, Tam Minh and Pham, Quang Huu and Doan, Linh Bao and Trinh, Hoang Viet and Nguyen, Viet-Anh and Phan, Viet-Hoang}, title = {Contrastive Learning for Natural Language-Based Vehicle Retrieval}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2021}, pages = {4245-4252} } https://openaccess.thecvf.com/content/CVPR2021W/AICity/html/Nguyen_Contrastive_Learning_for_Natural_Language-Based_Vehicle_Retrieval_CVPRW_2021_paper.html