https://ph01.tci-thaijo.org/index.php/ecticit/issue/feed ECTI Transactions on Computer and Information Technology (ECTI-CIT) 2026-04-23T09:41:58+07:00 Prof.Dr.Prabhas Chongstitvattana and Prof.Dr.Chidchanok Lursinsap chief.editor.cit@gmail.com Open Journal Systems <p style="text-align: justify;">ECTI Transactions on Computer and Information Technology (ECTI-CIT) is published by the Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI) Association, a professional society that aims to promote communication among electrical engineers, computer scientists, and IT professionals. Contributed papers must be original and must advance the state of the art in applications of Computer and Information Technology. Both theoretical contributions (including new techniques, concepts, and analyses) and practical contributions (including system experiments and prototypes, and new applications) are encouraged. The submitted manuscript must not have been copyrighted, published, submitted, or accepted for publication elsewhere. This journal employs <em><strong>a double-blind review</strong></em>, meaning that the identities of reviewers and authors are concealed from each other throughout the review process. The manuscript text should not contain any commercial references, such as company names, university names, trademarks, commercial acronyms, or part numbers.
The manuscript must be between 8 and 10 pages in a two-column format.</p> <p style="text-align: justify;"><strong>Journal Abbreviation</strong>: ECTI-CIT</p> <p style="text-align: justify;"><strong>Since</strong>: 2005</p> <p style="text-align: justify;"><strong>ISSN</strong>: 2286-9131 (Online)</p> <p style="text-align: justify;"><strong>Language</strong>: English</p> <p style="text-align: justify;"><strong>Review Method</strong>: Double Blind</p> <p style="text-align: justify;"><strong>Issues Per Year</strong>: 2 Issues (from 2005-2020), 3 Issues (in 2021), and 4 Issues (from 2022).</p> <p style="text-align: justify;"><strong>Publication Fee</strong>: Free of charge.</p> <p style="text-align: justify;"><strong>Published Articles</strong>: Review Article / Research Article / Invited Article (by editorial invitation only)</p> <p style="text-align: justify;"><strong>Scopus preview:</strong> https://www.scopus.com/sourceid/21100899864</p> <p style="text-align: justify;"><strong>DOI prefix for the ECTI Transactions</strong> is: 10.37936/ (https://doi.org/)</p> https://ph01.tci-thaijo.org/index.php/ecticit/article/view/262099 Intelligent Honeypot for Web Applications: Leveraging Seq2Seq and Reinforcement Learning 2026-02-05T15:45:34+07:00 Ananya Varadarajan pes2202100749@pesu.pes.edu Ashwin Chandrasekaran pes2202100832@pesu.pes.edu Rachana Binumohan pes2202100742@pesu.pes.edu Rahul Huliyar Ravishankar pes2202100378@pesu.pes.edu Gokul Kannan Sadasivam gokul@pes.edu <p><span style="font-weight: 400;">This paper presents an intelligent honeypot system designed to mimic legitimate websites using Sequence-to-Sequence (Seq2Seq) learning and Deep Q-Learning. The system generates realistic, contextually appropriate responses to attacker queries, prolonging interactions and providing insights into malicious behaviors while safeguarding actual systems.
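The custom reward in this abstract balances response realism against continued attacker engagement. A minimal sketch of one such reward is shown below; the weights, component names, and linear form are illustrative assumptions, not the authors' actual formulation:

```python
# Hypothetical per-step reward for a honeypot agent: combine a realism
# score for the generated response with an engagement signal indicating
# whether the attacker kept interacting. Weights are illustrative.

def honeypot_reward(realism: float, session_continued: bool,
                    w_realism: float = 0.6, w_engage: float = 0.4) -> float:
    """realism: score in [0, 1] for how server-like the response looks;
    session_continued: True if the attacker issued another request."""
    engagement = 1.0 if session_continued else -1.0
    return w_realism * realism + w_engage * engagement

# A convincing response that keeps the session alive scores high:
r = honeypot_reward(realism=0.9, session_continued=True)
```

In a full Deep Q-Learning loop, a scalar of this kind would serve as the per-step reward used to update Q-values over candidate responses, which is how session length gets maximized.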
The Seq2Seq model, trained on HTTP request-response pairs, enables the honeypot to produce responses that closely resemble those of real servers, enhancing its ability to deceive attackers. Deep Q-Learning optimizes engagement by selecting the most effective responses through a custom reward function, balancing realism and interactivity to maximize session length. Performance was evaluated using metrics such as Response Realism Rate (RRR), Semantic Consistency Accuracy (SCA), and Average Session Length (ASL). The honeypot achieved an RRR of 92.3%, an SCA of 89.7%, and a 94.5% Optimal Response Selection Rate (ORSR). These advancements increased ASL by 143.5%, from 3.2 to 7.8 exchanges, reflecting prolonged attacker engagement. By integrating Seq2Seq and Deep Q-Learning, this honeypot demonstrates significant improvements in generating convincing responses and sustaining interactions. These results contribute to modern cybersecurity by providing a practical and theoretical framework for developing next-generation honeypots capable of deceiving attackers and gathering actionable intelligence.</span></p> 2026-02-28T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/263972 YUV-based Deep Learning Super-Resolution for Bitrate Reduction and ROI Preservation in Modern Video Codecs 2026-02-12T13:34:51+07:00 Lertluck Leela-amornsin l.lertluck@gmail.com Nuttapon Vanakittistien nuttapon.vana@gmail.com Nattee Niparnan nattee@gmail.com Pitchaya Sitthi-amorn pitchaya@cp.eng.chula.ac.th Attawith Sudsang attawith@cp.eng.chula.ac.th <p><span style="font-weight: 400;">High Efficiency Video Coding (HEVC) and its successors, such as Versatile Video Coding (VVC), offer substantial bitrate reductions, yet challenges remain in preserving visual fidelity under bandwidth and computational constraints. 
This paper proposes a deep learning-based super-resolution (SR) framework that operates natively in the YUV color space, eliminating costly RGB-YUV conversions and integrating seamlessly with modern video compression pipelines. We develop two convolutional network architectures trained on YUV-formatted video data: a full 3-channel model and a lightweight two-stream variant that separately processes luminance (Y) and chrominance (UV) channels using compact subnetworks. The proposed method enhances both full-frame and region-of-interest (ROI) quality, outperforming conventional HEVC baselines in terms of rate-distortion efficiency. Evaluations on diverse video sequences demonstrate significant bitrate savings and effective ROI preservation, with the lightweight model offering a practical solution for AI-driven applications in resource-constrained environments.</span></p> 2026-03-07T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/264201 Hybrid WangchanBERTa Architectures for Multi-Class Thai Sentiment Analysis 2026-03-12T10:15:14+07:00 Panida Songram panida.s@msu.ac.th Suchart Khummanee suchart.k@msu.ac.th Khanabhorn Kawattikul khanabhorn_ka@rmutto.ac.th Nittaya Muangnak nittaya.mu@ku.th <p class="Bodytext"><span style="font-weight: 400;">The rapid growth of the restaurant industry in Thailand has intensified the importance of online reviews, which significantly shape customer perceptions and influence business performance. Sentiment analysis has emerged as an effective computational approach for extracting customer opinions from such reviews; however, multi-class sentiment classification in Thai remains challenging due to the language's non-segmented structure and the issue of class imbalance. 
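This abstract notes that class imbalance was handled with SMOTE before training. A toy, stdlib-only sketch of SMOTE-style oversampling (interpolating between a minority sample and one of its nearest minority neighbours) follows; real pipelines would use imbalanced-learn's `SMOTE` on embedded text features rather than this simplified version:

```python
import random

# Minimal SMOTE-style oversampling sketch (illustrative only): each
# synthetic minority sample lies on the line segment between a real
# minority point and one of its k nearest minority-class neighbours.

def smote(minority, n_new, k=3, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_new=5)
```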
This study investigates three hybrid deep learning models (WangchanBERTa-MLP, WangchanBERTa-CNN, and WangchanBERTa-BiLSTM) by integrating WangchanBERTa, a Thai-specific pre-trained language model, with different neural architectures. Using a balanced dataset of restaurant reviews obtained through SMOTE, the models were evaluated based on accuracy, precision, recall, and F1-score. The experimental results show that WangchanBERTa-BiLSTM performed the best overall, achieving an accuracy of 85.22% and significantly improving the classification of neutral and positive sentiments compared to the BERT-based models and other hybrid methods.</span></p> 2026-03-21T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/264238 Comparison of CNN Architectures for Thai Medicinal Plant Classification 2026-03-12T11:07:51+07:00 Sompong Valuvanathorn sompong.v@ubu.ac.th Chanchai Supaartagorn chanchai.s@ubu.ac.th <p><span style="font-weight: 400;">Thai medicinal plants are essential to traditional healthcare and local livelihoods. However, many Thai medicinal plants have similar morphological characteristics such as shape, colour, and texture. This problem leads to misidentification and misclassification. Image classifiers utilizing convolutional neural networks (CNNs), which are a class of deep learning models, provide a scalable substitute for manual classification. This study aims to evaluate and compare the performance of three CNN architectures (DenseNet-121, EfficientNet-B3, and MobileNetV2) for classifying 10 species of Thai medicinal plants. The dataset comprises 5,000 leaf images representing 10 species (500 images per species). This study partitioned the dataset into an 80% training set and a 20% test set. To enhance model generalization, we applied data augmentation techniques, specifically rotation, flipping, and colour manipulation.
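The augmentation step named above can be sketched on a toy nested-list RGB image; the flip and brightness-style colour manipulation below are illustrative stand-ins (rotation omitted for brevity), and production code would use TensorFlow/Keras preprocessing layers instead:

```python
import random

# Toy augmentation sketch: horizontal flip and a simple global colour
# (brightness) jitter on an image stored as rows of (R, G, B) tuples.

def horizontal_flip(img):
    return [row[::-1] for row in img]

def colour_jitter(img, scale=0.1, seed=0):
    rng = random.Random(seed)
    factor = 1.0 + rng.uniform(-scale, scale)   # one brightness factor
    return [[tuple(min(255, int(c * factor)) for c in px) for px in row]
            for row in img]

img = [[(10, 20, 30), (40, 50, 60)],
       [(70, 80, 90), (100, 110, 120)]]
aug = colour_jitter(horizontal_flip(img))
```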
Furthermore, we utilized TensorFlow and Keras on Google Colab with GPU acceleration to train the models. Evaluation metrics include accuracy, precision, recall, F1 score, model size, inference time, and CPU utilization. The results highlight a trade-off between accuracy and efficiency: DenseNet-121 achieved the highest accuracy at 96.0% and a Matthews Correlation Coefficient (MCC) of 0.9558. Statistical analysis confirmed that DenseNet-121 significantly outperformed the other architectures (p &lt; 0.05), albeit with a higher inference time (579.22 s). Notably, EfficientNet-B3 and MobileNetV2 both achieved an accuracy of 93.4%, with MobileNetV2 performing best in terms of model size (11.07 MB) and inference time (3.86 s). In conclusion, DenseNet-121 is the most accurate model, while MobileNetV2 is best suited for real-time applications due to its small footprint and rapid inference. EfficientNet-B3 offers an optimal balance between accuracy and computational efficiency.</span></p> 2026-03-28T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/265498 Enhancing LTE Handover Decision using Optimised Extreme Gradient Boosting and Rule-Based Decision-Support 2026-02-12T13:42:03+07:00 Noormadinah Allias noormadinah@yahoo.com Megat Norulazmi Megat Mohamed Noor megatnorulazmi@unikl.edu.my Mohd Nazri Ismail m.nazri@upnm.edu.my Mohd Taha Ismail mtaha@unikl.edu.my <p>Long-Term Evolution (LTE) provides low-latency, high-data-rate services, which are essential for delay-sensitive applications such as video streaming and online gaming. Despite this, user mobility among cells can degrade network performance, so efficient handover management is crucial to maintain Quality of Service (QoS).
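The rule-based decision support described in this entry adapts handover control parameters rather than keeping them static. A hypothetical if-then sketch of that idea follows; the angle thresholds, hysteresis values, and time-to-trigger values are invented for illustration and are not taken from the paper:

```python
# Hypothetical if-then rules that adapt LTE handover control parameters
# (hysteresis margin in dB, time-to-trigger in ms) to the user's
# movement angle. All thresholds and values are illustrative.

def adapt_handover_params(movement_angle_deg: float):
    """Return (hysteresis_dB, time_to_trigger_ms) for a movement angle."""
    if movement_angle_deg < 30:        # near-straight trajectory
        return 2.0, 480
    elif movement_angle_deg < 90:      # moderate turning
        return 3.0, 320
    else:                              # sharp turns: react faster
        return 1.0, 160

hyst, ttt = adapt_handover_params(45.0)
```

In the paper's pipeline, rules of this shape would be derived from what the trained XGBoost model learned, rather than hand-written.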
Traditional handover mechanisms use static control parameters, such as hysteresis margin and time-to-trigger, that cannot adapt to users' dynamic mobility or diverse trajectories. In this paper, we present a learning-based, optimised, data-driven approach for LTE handover decision support. An XGBoost model, tuned with Hyperopt, is trained to learn the relationship between user movement angle and handover performance parameters. Interpretable if-then rules are developed to modify the handover control parameters adaptively. Experimental results further show that the performance of fixed-parameter solutions varies across the maximum handover delay, the mean time to handover, and the minimum handover rate, indicating that a single configuration is unlikely to provide the best performance across all mobility scenarios. The solution offers an efficient, scalable, and interpretable decision-support system to improve LTE handover efficiency in dynamic wireless networks.</p> 2026-03-28T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/264970 VhAR-Net: A Cross-Modal Representation Learning Framework for Text-Based Vehicle Retrieval 2026-03-05T05:58:44+07:00 Lin Qian 6772100069@stu.pim.ac.th JIAN QU jianqu@pim.ac.th <p><span style="font-weight: 400;">With the advancement of intelligent transportation systems and large-scale urban video surveillance technologies, vehicle image retrieval based on textual descriptions has become increasingly important. Although MCANet exhibits effectiveness in multi-scale feature alignment, significant limitations persist in accurate fine-grained semantic matching. To address this, we present VhAR-Net, a modular cross-modal retrieval architecture that enables independent design and flexible combination of feature interaction mechanisms, enhancement strategies, and supervisory signals.
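The cross-modal attention unit this abstract describes associates text queries with image-region features. A pure-Python, single-head scaled dot-product sketch is given below; real systems operate on learned ResNet-50/BERT projections, so the tiny hand-built vectors here are illustrative only:

```python
import math

# Minimal single-head cross-modal attention sketch: a text-token query
# attends over image-region features (keys/values) and returns a
# weighted combination of the region features.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(query, keys, values):
    """query: d-dim text feature; keys/values: image-region features."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query aligned with region 0 pulls the output toward value 0:
attended = cross_attention([1.0, 0.0],
                           keys=[[1.0, 0.0], [0.0, 1.0]],
                           values=[[10.0, 0.0], [0.0, 10.0]])
```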
This framework employs ResNet-50 as the visual representation extractor and BERT as the textual semantic encoder, and innovatively introduces a cross-modal attention unit that establishes explicit associations between image representations and linguistic descriptions. Concurrently, the system establishes small-object aware auxiliary supervision through binary classification tasks targeting discriminative fine-grained vocabulary, thereby directing the network toward distinctive microscopic semantic units. Empirical evaluations on the public benchmark dataset T2I-VeRi indicate that the optimal configuration achieves 75% Top-1 retrieval accuracy, with Top-10 recall covering 85% of relevant samples. After introducing the feature enhancement module and small-object supervision mechanism, the cumulative matching rates for Top-5 and Top-10 both reached 85%, demonstrating improved retrieval robustness.</span></p> 2026-04-04T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/265040 A Dynamic Memory Routing Framework for Multimodal Conversational Question Answering in Large Language Models 2026-03-26T13:33:38+07:00 Walisa Romsaiyud walisar@gmail.com <p><span style="font-weight: 400;">Multimodal Question Answering (MQA) in large language models (LLMs) requires adaptive modeling of modality relevance across conversational turns. However, existing approaches rely on static fusion strategies that treat modalities uniformly and fail to capture dynamic modality importance. To address this limitation, we propose DynaRoute, a dynamic memory routing framework for LLM-based MQA. DynaRoute integrates a Bi-LSTM-based conversational memory to model evolving dialogue context and a query-conditioned routing mechanism that dynamically assigns modality relevance at each interaction step.
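Query-conditioned routing of the kind described here can be sketched as a softmax gate over per-modality relevance scores, followed by a weighted fusion. This is an illustrative stand-in for DynaRoute's mechanism; the dot-product scoring and names are assumptions:

```python
import math

# Sketch of query-conditioned modality routing: score each modality's
# feature against the query, softmax the scores into relevance weights,
# and fuse the modality features as a weighted sum.

def route(query, modality_feats):
    """query: d-dim vector; modality_feats: {name: d-dim feature}."""
    names = list(modality_feats)
    scores = [sum(q * f for q, f in zip(query, modality_feats[n]))
              for n in names]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    fused = [sum(w * modality_feats[n][i] for w, n in zip(weights, names))
             for i in range(len(query))]
    return dict(zip(names, weights)), fused

# A vision-aligned query routes most of its weight to the vision modality:
weights, fused = route([1.0, 0.0],
                       {"vision": [2.0, 0.0], "text": [0.0, 2.0]})
```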
The resulting representations are processed by an LLM-based decoder to generate context-aware responses. Experiments on four benchmarks (VQA-v2, GQA, VisDial, and A-OKVQA) demonstrate consistent improvements over unimodal, static fusion, and mixture-of-experts baselines. DynaRoute achieves an improvement of up to 8.7% under noisy conditions and 6.5% under clean settings, while also obtaining the highest multi-turn consistency (68.9) and robustness (81.3) scores. These results highlight the effectiveness of memory-aware dynamic routing and establish DynaRoute as a principled framework for conversational multimodal question answering.</span></p> 2026-04-25T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/264823 Flip-Robust Neural Image Assessment (FR-NIMA) for Spatially Consistent IQA 2026-03-12T21:28:25+07:00 Rawesak Tanawongsuwan rawesak.tan@mahidol.ac.th Sukanya Phongsuphap sukanya.pho@mahidol.ac.th <p><span style="font-weight: 400;">Neural Image Assessment (NIMA) has become a widely adopted approach for blind image quality assessment (BIQA), yet it remains sensitive to simple spatial transformations such as horizontal flips. Such variation can lead to inconsistent predictions, even when the perceived visual content remains largely unchanged. To address this issue, we introduce Flip-Robust Neural Image Assessment (FR-NIMA), a training strategy that enhances the spatial robustness of BIQA models. Instead of modifying network architectures, FR-NIMA incorporates a flip-consistency regularization term that penalizes discrepancies between the predicted quality distributions of an image and its horizontally flipped counterpart. Two variants, one-branch and two-branch formulations, are explored, both introducing no additional model parameters.
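The flip-consistency regularizer described above can be sketched as an extra loss term that compares the quality distributions predicted for an image and its horizontal flip. The mean-squared form and the weight `lam` below are illustrative assumptions (FR-NIMA's exact penalty may differ):

```python
# Sketch of a flip-consistency regularizer: penalize the squared
# difference between the quality-score distributions predicted for an
# image and for its horizontally flipped counterpart, added to the
# base (e.g. EMD) loss with a weight lam.

def flip_consistency_penalty(p_img, p_flip):
    """Mean squared difference between two predicted distributions."""
    return sum((a - b) ** 2 for a, b in zip(p_img, p_flip)) / len(p_img)

def total_loss(base_loss, p_img, p_flip, lam=0.5):
    return base_loss + lam * flip_consistency_penalty(p_img, p_flip)

# Identical predictions for the image and its flip add no penalty:
loss = total_loss(0.20, [0.1, 0.7, 0.2], [0.1, 0.7, 0.2])
```

Because the penalty depends only on model outputs, it adds no parameters, matching the abstract's claim for both variants.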
FR-NIMA is evaluated across four CNN backbones (MobileNetV2, VGG19, Xception, InceptionV3) and one Vision Transformer (ViT-Small) using the LIVE dataset and two additional test sets representing distinct scene types. Performance is assessed using complementary metrics, including the Test Loss (EMD2), Absolute Flip Gap (|FlipGap|), Flip-Consistency Win Rate (FCWR), Average Flip-Gap Delta (AFGD), and Average Flip-Gap Ratio (AFGR). Experimental results demonstrate that FR-NIMA effectively reduces flip-gap magnitude and variability while maintaining comparable test accuracy across all backbones. These findings establish FR-NIMA as a simple yet effective framework for enhancing the stability, spatial consistency, and trustworthiness of deep IQA models.</span></p> 2026-04-25T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/263898 A Kernel-Aware Framework for Energy-Efficient FPGA Edge Detection Using XNOR-popcount and Selective Approximate Arithmetic 2026-04-23T09:21:42+07:00 Van-Khoa Pham khoapv@hcmute.edu.vn <p><span style="font-weight: 400;">Edge-vision systems on resource-constrained platforms require low-latency and energy-efficient front-end processing. Conventional gradient-based edge operators continue to rely on multiply-accumulate (MAC) operations, which can increase logic utilization, switching activity, and power consumption in field-programmable gate array (FPGA) implementations. This study introduces a kernel-aware arithmetic-selection framework, supported by synthesized evidence, for energy-efficient FPGA-based edge detection. The framework combines XNOR-popcount-based arithmetic mapping with selective approximate computation.
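The identity behind XNOR-popcount mapping is that for vectors over {-1, +1} packed as bit vectors (1 encoding +1), the dot product of length n equals 2·popcount(XNOR(a, b)) − n, so the multiply-accumulate chain reduces to bitwise logic. A Python sketch verifying the identity (the actual hardware datapath differs, of course):

```python
# XNOR-popcount trick: dot product over {-1,+1} values equals
# 2 * popcount(XNOR(a, b)) - n on the packed bit representations,
# eliminating multipliers entirely.

def binary_dot_mac(a_bits, b_bits):
    """Reference MAC over {-1,+1} values encoded as bits (1 -> +1)."""
    to_pm1 = lambda bit: 1 if bit else -1
    return sum(to_pm1(a) * to_pm1(b) for a, b in zip(a_bits, b_bits))

def binary_dot_xnor_popcount(a_bits, b_bits):
    n = len(a_bits)
    a = int("".join(map(str, a_bits)), 2)
    b = int("".join(map(str, b_bits)), 2)
    xnor = ~(a ^ b) & ((1 << n) - 1)      # XNOR restricted to n bits
    return 2 * bin(xnor).count("1") - n   # popcount, then rescale

a, b = [1, 0, 1, 1], [1, 1, 0, 1]
assert binary_dot_mac(a, b) == binary_dot_xnor_popcount(a, b)
```

This is why the abstract reports multiplier removal as the dominant source of hardware savings: the mapping applies whenever the kernel's active taps are binary-friendly.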
Its central design principle is to replace MAC operations with exclusive-NOR population count (XNOR-popcount) when the kernel structure is suitable for binary-friendly arithmetic, and to apply approximation only to the remaining adder-dominated stages when a complete XNOR-popcount mapping is not practical. Under this rule, Prewitt-like operators are mapped to an XNOR-popcount datapath over their active nonzero taps. In contrast, Sobel-like operators are realized using a hybrid datapath that combines binary matching, shift-add processing, and approximate accumulation. The resulting framework shows that multiplier removal is the dominant source of hardware savings, while approximate arithmetic provides a controlled secondary optimization. Overall, the proposed approach establishes a structured design methodology for low-power FPGA edge-detection architectures on embedded platforms.</span></p> 2026-04-30T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/265497 Machine Learning-based Estimation of Foot Parameters for Custom Insole Production: A Comprehensive Analysis of Structural Measurements 2026-03-12T10:08:28+07:00 Wudhichart Sawangphol wudhichart.saw@mahidol.ac.th Jidapa Kraisangka jidapa.kra@mahidol.ac.th Pisit Praiwattana pisit.pra@mahidol.ac.th Pilailuck Panphattarasap pilailuck.pan@mahidol.ac.th <p><span style="font-weight: 400;">Precise quantification of foot morphology is critical for clinical diagnostics, biomechanics, and personalized orthotic design. Conventional anthropometric methods often remain labor-intensive or dependent on specialized hardware, necessitating more efficient predictive frameworks. This study develops and validates a series of machine learning (ML) models to predict nine essential foot anthropometric parameters.
The models leverage a dataset of 544 independent foot samples from 272 participants, encompassing high, normal, and flat arch types, and were evaluated using a 10-fold cross-validation strategy to ensure robust generalizability. Our pipeline integrates correlation-based feature selection with hyperparameter-optimized regression algorithms, including XGBoost, Random Forest, Support Vector Regressors, Neural Networks, and Linear Regression. The results demonstrate high predictive fidelity, with Mean Absolute Errors (MAE) consistently remaining below 0.5 cm. This level of precision meets the 0.5 cm clinical tolerance threshold established through expert consultation for insole production, while simultaneously aligning with international footwear sizing increments, thereby confirming the framework's practical utility in real-world manufacturing. Although parameters such as the length from heel to midfoot area and the length from heel to distal metatarsal head achieved exceptional precision (MAE of 0.012 cm and 0.026 cm, respectively), predicting arch height remains a notable challenge. This research underscores the necessity of optimal feature engineering and algorithm selection in automating foot morphometric assessment.</span></p> 2026-04-30T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/264177 FEDIS: Graph-Based Social Media Evidence Collection with Correlation-Aware Forensic Analysis 2026-04-23T09:41:58+07:00 Somchart Fugkeaw somchart@siit.tu.ac.th Pratuck Khubpatiwitthayakul pratuck.khub@gmail.com Pongsapon Ateetanan pongsapon.ateeta@gmail.com Diyorbek Bakhtiyor Ugli Okyulov diyorbekokyulov@gmail.com <p><span style="font-weight: 400;">Online Social Networks (OSNs) have emerged as a major source of digital evidence in cybercrime investigations, abuse detection, and online incident analysis.
Publicly available data, such as posts, comments, reactions, and user interactions, provide critical insights into suspicious activities and behavioral patterns. However, extracting and analyzing such data in a forensic context remains challenging. Existing social media data acquisition approaches primarily rely on web scraping or API-based techniques that produce unstructured outputs (e.g., raw text and images) without preserving the relationships among entities such as users, posts, and interactions. This results in the loss of contextual information essential for forensic analysis. Furthermore, data collection and analysis are often performed separately, resulting in delayed investigations and limited real-time insight. To address these limitations, this paper focuses on two key forensic requirements: (i) efficient and structured acquisition of social media evidence and (ii) correlation-aware analysis of interactions. We propose FEDIS, a unified forensic data acquisition and analysis system that integrates a hybrid DOM-TAO data model, graph-based representation, and parallel keyword-based search. 
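The graph-based representation this abstract argues for, in contrast to flat scraped text, can be sketched as a toy evidence store where users, posts, and interactions are nodes and edges that survive acquisition and remain searchable. The class, field names, and search behaviour below are illustrative assumptions, not FEDIS's actual data model:

```python
# Toy graph-based evidence store: nodes carry attributes, edges carry
# typed relationships (user AUTHORED post, user LIKED post, ...), and a
# keyword search runs over node text while relationships stay intact.

class EvidenceGraph:
    def __init__(self):
        self.nodes = {}          # node_id -> attribute dict
        self.edges = []          # (src_id, relation, dst_id) triples

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def keyword_search(self, keyword):
        """Return ids of nodes whose text mentions the keyword."""
        kw = keyword.lower()
        return [nid for nid, attrs in self.nodes.items()
                if kw in attrs.get("text", "").lower()]

g = EvidenceGraph()
g.add_node("u1", kind="user", text="alice")
g.add_node("p1", kind="post", text="Selling leaked credentials")
g.add_edge("u1", "AUTHORED", "p1")
hits = g.keyword_search("credentials")
```

Even after the keyword search isolates a post, the `AUTHORED` edge still links it to its author, which is the contextual information the abstract says flat scraping loses.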
Experimental results demonstrate that FEDIS achieves complete relationship preservation, improves data collection completeness, and significantly reduces search latency compared to traditional approaches, making it practical for real-world social media forensic investigations.</span></p> 2026-04-30T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/265851 Multi-Speaker Thai Speech Synthesis Using Transfer Learning 2026-04-23T09:38:24+07:00 Kittikan Charoenrattana ckittikan@kkumail.com Pusadee Seresangtakul pusadee@kku.ac.th Pongsathon Janyoi pongsathon@kku.ac.th <p>This paper investigates transfer learning for developing a multi-speaker Thai text-to-speech (TTS) system under low-resource conditions, with a focus on cross-lingual knowledge transfer from source languages with different phonological and prosodic characteristics. The model is pre-trained on speech data from Thai, Mandarin Chinese, and English, and subsequently fine-tuned using Thai speech from three target speakers, each with only 15-30 minutes of data, covering both male and female speakers. Both objective and subjective evaluations consistently demonstrate that Thai pre-training achieves the best overall performance. Among the cross-lingual models, transfer learning from Mandarin Chinese outperforms transfer from English, yielding lower F0 RMSE (20.79 vs. 21.65) and higher MOS scores (3.55 vs. 3.44). In addition, speaker-dependent analysis indicates that speaker gender has a noticeable influence on synthesis quality under limited data conditions, suggesting that acoustic similarity between pre-training data and target speakers can affect the effectiveness of knowledge transfer. However, this factor is secondary to linguistic and prosodic similarity.
Natural speech achieves a MOS of 4.93, while the Thai, Mandarin Chinese, and English-pre-trained models obtain MOS scores of 3.65, 3.55, and 3.44, respectively. These results highlight the importance of linguistic proximity in cross-lingual multi-speaker TTS, particularly tonal and prosodic similarity between source and target languages. Overall, the study confirms that transfer learning is an effective approach for low-resource Thai TTS and that tonal source languages provide more beneficial knowledge transfer than non-tonal, accent-based languages.</p> 2026-04-30T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT) https://ph01.tci-thaijo.org/index.php/ecticit/article/view/264386 LLM-Driven Annotation: A Scalable Framework for Automated Multi-Label Coffee Flavor Classification 2026-03-19T08:11:40+07:00 Prasara Jakkaew prasara.jak@mfu.ac.th Nacha Chondamrongkul nacha.cho@mfu.ac.th Paweena Suebsombut paweena.sue@mfu.ac.th <p><span style="font-weight: 400;">The subjective and unstructured nature of coffee tasting notes creates a significant data annotation bottleneck, limiting the application of computational methods in sensory science. This study introduces a novel end-to-end framework for automating multi-label coffee flavor classification, integrating LLM-driven annotation with transformer-based classification and local interpretability analysis. Google's Gemini-1.5-Flash was employed to perform evidence-based annotation on 8,327 expert evaluations authored by certified Q Graders, generating a high-quality training dataset across 17 flavor categories. Critically, the Q Grader provenance of the source texts enables a reverse-mapping validation framework: statistically significant and directionally coherent correlations between LLM-derived labels and expert quantitative scores (Floral r = 0.32, Roasted r = −0.25, all p &lt;0.001) provide implicit ground truth without requiring separate annotation effort. 
Reliability analysis revealed substantial consistency for concrete physical descriptors (mean κ = 0.68) but notably lower agreement for abstract sensory concepts such as Mouthfeel (κ = 0.10), identifying two distinct reliability regimes that define the framework's operational boundaries. A fine-tuned BERT model trained on these annotations outperformed a TF-IDF baseline, achieving a Micro F1-score of 0.9164 versus 0.8763 and a Hamming Loss of 0.0640 versus 0.0969. To address severe class imbalance (up to 125×), Focal Loss (γ = 2.0) combined with per-class threshold optimization successfully recovered detection of rare defect categories, improving Macro-F1 from 0.656 to 0.719. Local interpretability analysis via LIME further confirmed that model predictions align with domain-expert sensory reasoning. These results demonstrate that an LLM-driven annotation pipeline offers a scalable, transparent, and effective solution to the data bottleneck in sensory science, establishing a robust methodological foundation for interpretable classification across other sensory-driven domains.</span></p> 2026-04-30T00:00:00+07:00 Copyright (c) 2026 ECTI Transactions on Computer and Information Technology (ECTI-CIT)
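The binary focal loss with γ = 2.0 used in the last abstract above can be sketched directly: the (1 − p_t)^γ factor shrinks the loss on easy, well-classified examples so training gradient concentrates on hard or rare categories. A minimal per-example sketch (the production version would be vectorized inside the training framework):

```python
import math

# Binary focal loss sketch: relative to plain cross-entropy, the
# (1 - p_t)**gamma factor down-weights easy examples, focusing training
# on hard/rare ones (here gamma = 2.0, matching the abstract).

def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """p: predicted probability of the positive class; y: 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# An easy positive (p = 0.9) contributes far less loss than a hard
# positive (p = 0.1):
easy, hard = focal_loss(0.9, 1), focal_loss(0.1, 1)
```

With γ = 0 the expression reduces to ordinary cross-entropy, which is a handy sanity check; per-class decision thresholds, as in the abstract, are then tuned separately on validation data.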