Kateryna Kuzma1 and Oleksandr Melnyk2, 1Department of Computer Science, Mykolaiv V. O. Sukhomlynskyi National University, Mykolaiv, Ukraine, 2Department of Economics and Information Technology, Mykolaiv Institute of Human Development of the Higher Education Establishment «Open International University of Human Development «Ukraine», Mykolaiv, Ukraine
The aim is to find a way to assess answers to open-ended questions given as natural-looking text in Ukrainian, using machine learning methods. The problem was treated as a binary text classification task. The following results were obtained: the application of machine learning models to assessing detailed responses in natural-looking text format was studied; the use of the logistic regression model for this problem was substantiated; the calculation of the model's parameters using the scikit-learn library was considered; a text normalization procedure was proposed; and a method for checking whether a word is an abbreviation was developed. The use of two approaches to text vectorization (text normalization with a Bag of Words and TF-IDF model; character n-grams with TF-IDF) was argued. The proposed approach can be used for pre-processing answers to open-ended questions in testing systems in order to determine the relevance of an answer to the content of the discipline.
Natural-Language Processing, Machine Learning, Answer Assessment, Open-Ended Questions, Natural-Looking Text.
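The character n-gram and TF-IDF vectorization named in the abstract above can be illustrated with a minimal sketch. The function names, the n-gram length of 3, and the smoothed IDF formula are assumptions chosen for illustration; the abstract does not state the authors' exact settings, and the paper itself relies on scikit-learn rather than hand-written code.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Map each document to a dict {ngram: tf-idf weight}.

    Assumed weighting: relative term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, where N is the number of documents.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each n-gram counted once per document
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            g: (k / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1)
            for g, k in c.items()
        })
    return vectors
```

Character n-grams need no language-specific normalization, which is one plausible reason to pair them with a normalized Bag of Words model for an inflected language such as Ukrainian.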
Pushya Chaparala1, Satya Sri Pothula2, Ramya Bellamkonda2, 1Assistant Professor, CSE Department, Vignan’s Foundation for Science, Technology and Research (Deemed to be University), Vadlamudi, Guntur, India, 2UG Scholar, CSE Department, Vignan’s Foundation for Science, Technology and Research (Deemed to be University), Vadlamudi, Guntur, India
Network intrusion detection is a real-world challenge: intrusions are malicious attacks carried out on a network, and multiple algorithms exist to prevent them. With the growing use of networking, it has become difficult to distinguish malicious attacks from normal network traffic. Several machine learning algorithms have been introduced for this purpose; among them we need to find the best-performing technique and, to reduce complexity, choose the feature selection method best suited to that algorithm. We use a supervised machine learning technique to identify these attacks on the network.
Machine Learning, Networking, Network Intrusion Detection, Intrusion Detection System (IDS), Support Vector Machine (SVM), Artificial Neural Network (ANN), Coefficient correlation, Feature Selection.
Shivam Pant1 and Dr. Narayan Panigrahi2, 1Intern, DRDO, Centre for Artificial Intelligence and Robotics, postgraduate student, Gautam Buddha University, India, 2Scientist-G, DRDO, Centre for Artificial Intelligence and Robotics, Bangalore, India
Voice-based command activation has gained momentum as a feature replacing mouse- or pointing-device-based menu selection. A number of systems, such as software applications, IoT devices, and search engines, use the voice command interface as the primary interaction between the user and the system, keeping the conventional user interface as secondary. The voice command interface has many payoffs and advantages over conventional menu-driven or mouse-driven user interfaces. This paper presents the research landscape on voice command interfaces and proposes a design for a voice-based interface for a GIS system. We further compare our voice command interface with open-source and COTS (commercial off-the-shelf) voice command interface models and systems. As a result, the enhanced model attains a significant reduction in the word error rate of 3.49%.
Virtual Assistant, VOS viewer, ASR, computer science, IOT
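The word error rate reported in the abstract above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the example commands are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, recognizing the hypothetical command "zoom in on the map" as "zoom in the map" drops one of five reference words, giving a word error rate of 0.2.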
Tynysbay D. and Serbin V.V, Eurasian Technological University
This article proposes fitting clothes and accessories on the basis of augmented reality. A datalogical model has been developed that takes into account a decision-making module (colors, style, type, material, popularity, etc.) based on personal data (age, gender, weight, height, leg size, hoist length, geolocation, photogrammetry, number of purchases of certain types of clothing, etc.) and statistical data from the purchase history (number of items, price, size, color, style, etc.). In addition, to provide information to the user, it is planned to develop an augmented reality system using a QR code. This system for selecting and fitting clothing and accessories based on augmented reality will be used in stores to reduce the time a buyer needs to decide on a choice of clothes.
augmented reality, online store, decision-making module, QR code, clothing store.
Rohan Mistry, Neel Shah, Ravi Patel and Dr. Ramchandra Mangrulkar, Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
The COVID-19 pandemic has rapidly affected our day-to-day life, disrupting world trade and movement. Wearing a protective face mask has become the new normal. In the near future, many public service providers will ask customers to wear masks correctly in order to use their services. Therefore, face mask detection has become a crucial task to help global society. This paper presents a simplified approach to achieve this purpose using some basic Machine Learning packages such as TensorFlow, Keras, OpenCV and Scikit-Learn. The proposed method detects the face in the image and then identifies whether it is wearing a mask. As a surveillance tool, it can also detect a masked face in motion. The method attains accuracy of up to 95.77% and 94.58%, respectively, on two different datasets. We explore optimized parameter values using a Sequential Convolutional Neural Network model to detect the presence of masks correctly without causing over-fitting.
Image Processing, Deep Learning, Convolutional Neural Networks, Face Detection.
Mohd Suaib*1 and Dr. M Shahid Husain2, 1Dept. of Computer Science Integral University, Lucknow, 2Dept. of Information Technology UTAS, Oman
Ranking web pages is an important task, as it helps the user find highly ranked pages that are relevant to the query. Different metrics have been proposed to rank web pages according to their quality. With the help of web usage analysis, we can effectively improve the ranking of web pages according to the user's requirements. The objective of the proposed work is to provide an efficient framework for personalized web page ranking in search engines based on web usage analysis. The proposed framework consists of two modules. In the first module, association rules are generated using frequent patterns (access sequences of pages) for ranking the web pages. In the second module, the rules discovered in the first module are optimized using the Bat Algorithm, a nature-inspired optimization technique.
Personalized page ranking, Search ranking, web mining, web usage mining, association rule mining, web log analysis.
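The first module described in the abstract above generates association rules from page access patterns. A minimal sketch of the general idea follows, using the standard support and confidence measures over pairs of pages. It is an assumption-laden simplification: sessions are treated as unordered sets of pages, only single-page rules are mined, the thresholds and page names are hypothetical, and the Bat Algorithm optimization step is not shown.

```python
from collections import Counter
from itertools import permutations


def association_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    support    = fraction of sessions containing both pages
    confidence = sessions containing both / sessions containing a
    """
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    page_count = Counter(p for s in page_sets for p in s)
    pair_count = Counter(
        pair for s in page_sets for pair in permutations(sorted(s), 2)
    )
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        confidence = c / page_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

A ranking step could then boost pages that appear on the right-hand side of high-confidence rules whose left-hand side the user has already visited.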
Mohammad Naveed Hossain, Sheikh Fahim Uz Zaman, Tazria Zerin Khan, Sumiaya Azad Katha, Tawhid Anwar and Dr. Muhammad Iqbal Hossain
When it comes to technology, we live in a time when it has the potential to improve or degrade our lives. We cannot imagine a day without technology in today's digital world, and for security we rely heavily on single-factor or two-factor authentication (2FA). Even if we use 2FA, our data may still be hacked. 2FA has several features. Our password contains flaws and, as a result, may be easily compromised by hackers, even if our OTP is not available to them. To address this vulnerability and enhance the security and dependability of our data, we use three-factor authentication to prevent any unauthorized user from accessing our data. Each of the five authentication steps requires three authentications. The first is the most commonly used: login and password. If both the password and the OTP are legitimate, the system prompts for another piece of information: biometric authentication, such as fingerprint or voice recognition, although not all devices support these features, so biometric identification is permitted only on specific devices. A graphical password is also available as an option. By encrypting our data and using these three authentication methods, we can ensure that it is secure and trustworthy for all of our users.
OTP, Authentication, 2FA, 3FA, Hacked, Bio-Metric, alphanumeric password, data protection, network security, three-factor authentication.
Awritrojit Banerjee1* and Aruna Chakraborty2, 1Department of Information Technology, St. Thomas’ College of Engineering & Technology, Kolkata, India, 2Department of Basic Science and Humanities, St. Thomas’ College of Engineering & Technology, Kolkata, India
The prevalent machine learning algorithms used for classification require extensive data pre-processing and a lot of training to find the values of the learnable parameters that maximize the prediction capability of the model. This makes such algorithms slow and often numerically unstable. We seldom find classifiers that can not only correctly classify points actually belonging to the dataset, but can also identify points that do not. In this paper, we propose a fuzzy algorithm to solve these problems. The proposed algorithm is highly scalable, is not affected by the curse of dimensionality, and can also identify points not belonging to the dataset with a certainty of 1. This fuzzy-logic-oriented approach utilizes the power of membership functions to build classifiers that achieved 100% accuracy on the Iris dataset and 93.85% accuracy on the high-dimensional Wisconsin Breast Cancer dataset at thresholds greater than 0.65.
Fuzzy Logic, Pattern Classification, High dimensional dataset, Unknown pattern identification, Gaussian membership function.
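The abstract above describes classification with membership functions, a threshold, and rejection of points that do not belong to the dataset. The sketch below shows one way such a scheme could look with Gaussian membership functions. It is not the authors' algorithm: fitting one Gaussian per feature and class, combining features with the minimum (a common fuzzy AND), and the function names are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def fit(samples_by_class):
    """samples_by_class: {label: [feature_vector, ...]}.

    Returns {label: [(mean, std), ...]} with one Gaussian per feature.
    """
    model = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))
        # A tiny fallback std avoids division by zero for constant features.
        model[label] = [(mean(c), pstdev(c) or 1e-9) for c in columns]
    return model


def membership(x, params):
    """Degree of membership in [0, 1]: minimum over per-feature Gaussians."""
    return min(
        math.exp(-((xi - mu) ** 2) / (2 * sigma ** 2))
        for xi, (mu, sigma) in zip(x, params)
    )


def classify(x, model, threshold=0.65):
    """Return the best-matching label, or None if no class reaches the threshold."""
    label, score = max(
        ((lab, membership(x, params)) for lab, params in model.items()),
        key=lambda item: item[1],
    )
    return label if score >= threshold else None
```

Returning None for low-membership points is what lets a classifier of this kind flag patterns that match none of the known classes, instead of forcing them into the nearest one.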
Gleb Kiselev1,2*, Daniil Weizenfeld1,2 and Yaroslava Gorbunova1,2, 1Department of Information Technology, Peoples’ Friendship University of Russia, Miklukho-Maklaya str. 6, Moscow, 117198, Russia, 2Artificial Intelligence Research Institute, FRC CSC RAS, Vavilova str. 44, Moscow, 119333, Russia
The paper considers the problem of automatically analyzing a user's natural language query from an image. The mechanism synthesizes a logically correct non-binary response. Synthesis is carried out by combining the results of convolutional and recurrent networks and projecting onto a set of valid answers. A three-dimensional data set has been developed to search for an answer in a complex environment using a robotic arm. Examples of similar systems and a comparison between them are given. The experimental results showed that our method achieves performance comparable with known models.
computer science, machine learning, computer vision, neural networks.
JIA Xiaoyun, WANG Kai, WANG Erhu and WU Jingyi, School of Electrical Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi'an, China
Building on the recognition of daily human activities, this paper proposes activity transition recognition. The influence of phone orientation is eliminated by calculating the vertical and horizontal components of the acceleration, and time-domain features such as mean, standard deviation, and zero-crossing rate are extracted from these components. Transition motion is detected with the DTW algorithm. Finally, Random Forest is used to identify nine behaviors on the WEKA platform, reaching an average accuracy of 96.70%; the average accuracy for transition actions reached 92.10%.
activity recognition, smartphone sensors, acceleration, transition motion, DTW algorithm.
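The DTW (dynamic time warping) algorithm used in the abstract above to detect transition motion can be sketched in its textbook form. The function below computes the classic DTW distance between two one-dimensional sequences, with absolute difference as the local cost; how the authors apply it to acceleration components (templates, windowing, thresholds) is not stated in the abstract, so none of that is shown.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match
                                 cost[i][j - 1],      # b[j-1] repeats a match
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]
```

Because DTW allows one sample to align with several samples of the other sequence, two executions of the same movement performed at different speeds can still yield a small distance.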
E.Vani and P.Prabhavathy, Department of CSE, SRM Institute of Science and Technology, Chennai, India
Information technology is growing rapidly in today's environment. Maintaining security and privacy during a cyber attack is a key concern. According to studies, the amount of new malware is increasing rapidly. Between attacks and defenses against hazardous software, it is a never-ending cycle. Antivirus vendors are constantly developing malware signatures, and attackers are constantly attempting to circumvent them. This paper examines the essential properties of sample deep learning architectures used in cybersecurity applications, as well as emerging deep learning trends and resources. It also identifies the limitations of the evaluated works and presents a picture of the area's current concerns, providing useful insights and best practices for researchers and developers working on similar issues.
Cybersecurity, Privacy, Malware, Deep Learning.
Qi Zhong, Shichang Gao and Bo Yi, Department of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning, China
With the advent of the era of big data, data is endowed with higher potential value. However, this also brings new challenges to data security, especially for sensitive data in industrial environments. With the development of the industrial internet, enterprises are beginning to connect with each other, and under these conditions slight carelessness may lead to the leakage of sensitive data, bringing inestimable losses to enterprises. Hence, sensitive data classification is required as a secure way to avoid such situations. This paper presents a sensitive data classification method based on an improved ID3 decision tree algorithm. First, we introduce the idea of attribute weighting to optimize the basic structure of traditional ID3. Second, we use the weighted information gain to select nodes during tree construction, which mitigates the multi-value bias defect of the traditional algorithm. Experimental results show that we can achieve branching accuracy of up to 97.38%.
Sensitive data, Data classification, ID3 decision tree, Industrial environment.
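The abstract above selects tree nodes by a weighted information gain in order to counter ID3's bias toward attributes with many distinct values. A minimal sketch of that idea follows. Entropy and information gain are the standard ID3 definitions; multiplying the gain by a per-attribute weight supplied by the caller is an assumption for illustration, since the abstract does not give the authors' weighting formula, and the attribute names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Standard ID3 gain from splitting rows (dicts) on attribute attr."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, weights):
    """Pick the attribute with the largest weight * information gain."""
    return max(weights,
               key=lambda a: weights[a] * information_gain(rows, labels, a))
```

With equal weights, an identifier-like attribute that takes a unique value in every row always wins, because each of its branches is trivially pure; lowering its weight lets a more general attribute be chosen instead.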
Carlo Petalver, Roderick Bandalan and Gregg Victor Gabison, Graduate School of Computer Studies, University of San Jose – Recoletos, Cebu City, Philippines
Categorizing books and other archaic paper sources by course reference or syllabus is a challenge in library science. Traditionally, categorization is done manually by professionals, and the process of seeking and retrieving information can be frustrating. It requires intellectual effort and conceptual analysis by a human to recognize similarities between items and assign each subject to the correct category. In contrast to the traditional process, the author implemented automatic document categorization for libraries using text mining. The project involves the creation of a web app and a mobile app. Categorization is accomplished with a supervised machine learning classification model based on the Support Vector Machine algorithm, which predicts the course syllabus to which data from a book or other archaic paper source belongs.
Text Mining, Document Categorization, Classification algorithm, Support Vector Machine, Library.
Mishahira.N1, Mohammad Talal Houkan1, Kishor Kumar Sadasivuni1*, Mithra Geetha1, Somaya Al-Maadeed2, Asiya Albusaidi3, Nandhini Subramanian2, Huseyin Cagatay Yalcin4, Hassen M. Ouakad3, Issam Bahadur5, 1Center for Advanced Materials Qatar University, Doha, Qatar, 2Department of Computer Science and Engineering, Qatar University, Qatar, 3Mechanical and Industrial Engineering Department, Sultan Qaboos University, Muscat, Oman, 4Biomedical Research Center, Qatar University, Qatar, 5College of Engineering, Sultan Qaboos University, Muscat, Oman
Globally, cardiovascular problems are the leading cause of death. The early identification of heart failure will help patients and healthcare practitioners take better measures to avoid risks. The purpose of this study is to identify a method that can accurately predict the risk of cardiovascular diseases. These predictions are made by deep learning algorithms, such as Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN), using the training data we provide. Insufficient medical data will decrease prediction accuracy. As part of our study, we analyzed DNN architectures for predicting heart failure. Existing deep learning algorithms were applied to the training data. Comparing the accuracy of the existing model with that of the proposed model leads to a new deep learning algorithm that can predict heart failure from RR interval measurements. The NSR-RR and CHF-RR databases from Physiobank were used to obtain the results. In experiments on these two open-source RR interval databases, the proposed model achieved 94% accuracy, compared with 93.1% for the existing model.
Heart Failure, Deep learning, Time series, Time-LeNet, Database.
Om Mane and Sarvanakumar Kandasamy, Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
The stock market is a network which provides a platform for almost all major economic transactions. While investing in the stock market is a good idea, investing in individual stocks may not be, especially for the casual investor. Smart stock-picking requires in-depth research and plenty of dedication. Predicting this stock value offers enormous arbitrage profit opportunities. This attractiveness of finding a solution has prompted researchers to find a way past problems like volatility, seasonality, and dependence on time. This paper surveys recent literature in the domain of natural language processing and machine learning techniques used to predict stock market movements. The main contributions of this paper include the sophisticated categorizations of many recent articles and the illustration of the recent trends of research in stock market prediction and its related areas.
Stock market Prediction, Sentiment Analysis, Opinion Mining, Natural Language Processing, Deep Learning.
Achraf Lassoued, University Paris 2, France
Given a stream of text, we associate a stream of edges in a graph G and study its large clusters by analysing the giant components of random subgraphs, obtained by sampling some edges with different distributions. For a stream of Tweets, we show that the large giant components of uniformly sampled edges of the Twitter graph reflect the large clusters of G. For a stream of text, uniform sampling is inefficient, but weighted sampling, where the weight is proportional to the Word2vec similarity, provides good results. High-degree nodes of the giant components define the central words and central sentences of the text.
Streaming algorithms, Clustering, Dynamic graphs.
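The abstract above studies the giant components of random subgraphs obtained by sampling edges from a stream. The sketch below illustrates the uniform-sampling case: each edge is kept independently with probability p, and the largest connected component of the kept edges is found with a union-find structure. The function names, the fixed seed, and keeping all sampled edges in memory are simplifying assumptions; the weighted Word2vec-based sampling is not shown.

```python
import random
from collections import Counter


def sample_edges(edge_stream, p, seed=0):
    """Keep each edge of the stream independently with probability p."""
    rng = random.Random(seed)
    return [e for e in edge_stream if rng.random() < p]


def largest_component(edges):
    """Return the node set of the largest connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    sizes = Counter(find(x) for x in parent)
    if not sizes:
        return set()
    root = sizes.most_common(1)[0][0]
    return {x for x in parent if find(x) == root}
```

Running largest_component on sample_edges(stream, p) for a small p gives a cheap estimate of where the dense clusters of the full graph lie, since sparse regions tend to fall apart under sampling.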
Ojaswi Binnani, Language Technologies Research Centre, International Institute of Information Technology-Hyderabad, India
The Internet has many useful resources, with bountiful information at our fingertips. However, this resource also has nefarious uses: it can be misused for cybercrime, fake emails, content theft, plagiarism, etc. In many cases, the text is written anonymously, and it is important to accurately find the author to bring the criminal to justice. Author identification helps with this task: from a set of suspect authors, the writer of a given text is determined. We aim to create a computationally simple model that finds the author of a given text. The model will not require as much data as deep learning methods. This paper focuses on the use of various stylometric and word-based features, as well as different machine learning models, to create a classifier that gives the best accuracy. We find that the XGBoost algorithm performs this task with good accuracy.
Author Identification, Forensic Linguistics, Machine Learning.
Ojaswi Binnani, Language Technologies Research Centre, International Institute of Information Technology-Hyderabad, India
Sentiment Analysis is the basis of many text analysis tasks such as detection of emotion, hate-speech or sarcasm. It can be useful in other areas such as racism/sexism detection and stock prediction. In these tasks, the sentiment of the text can be a powerful feature with which to improve accuracy. However, while training a model for another task, it is not ideal to train another model from scratch to compute the sentiment. Rather, using a pre-trained sentiment model such as VADER, Flair, Stanza, etc, can be much faster, requires no training data and gives an accurate output. In this paper, we aim to find the most accurate pre-trained sentiment model on a dataset of news articles. News articles are non-polarised texts, which are the opposite of what most of these pre-trained sentiment models are trained on. This will be a good indicator of how sensitive the model is and how accurate it can be for any type of text. We find that the rule-based model VADER performs better than a distilBERT model Flair and a lexical model TextBlob. We also find a correlation between the sentiment of the first sentence of an article and the sentiment of the whole document.
Sentiment Analysis, Pre-trained models, BERT.
Priya K P, Harikrishnan T P, Datahub Technologies and R&D, Kochi, Ernakulam, Kerala, India
Question Answering (QA) is a branch of the Natural Language Understanding (NLU) field (which falls under the NLP umbrella). It aims to implement systems that, when given a question in natural language, can extract relevant information from provided data and present it in the form of a natural language answer. Building a fully functional question answering system is a problem that has been quite popular among researchers. Information Extraction systems take text in natural language as input and produce structured information, specified by a certain criterion, that is relevant to the particular use case. This paper introduces Information Extraction technology and its various sub-tasks with a focus on question answering, and highlights state-of-the-art research in various IE subtasks, current challenges, and future research directions.
Natural language processing (NLP), Information Retrieval (IR), Information Extraction (IE), Question Answering (QA).
Sanusi Bashir Adewale1, Ogunshile Emmanuel1, Aydin Mehmet1, Olabiyisi Stephen Olatunde2 and Oyediran Mayowa3, 1Department of Computer Science and Creative Technologies, University of the West of England, Bristol, United Kingdom, 2Department of Computer Science, Ladoke Akintola University of Technology, Ogbomoso, Nigeria, 3Department of Computer Sciences, Ajayi Crowther University, Oyo, Nigeria
This paper builds on the existing formal method called Stream X-Machine, optimizing the theory and applying it in practice to a large-scale system. The optimized formal approach, called Communicating Stream X-Machine (CSXM), is applied to software testing of a distributed system based on its formal specification, and the advantages and limits of using existing formal methods at this level are pointed out. Despite the tremendous amount of work done in software testing research, finding the origin of bugs or defects in software is still costly and time-consuming. This paper shows that this state-of-the-art challenge is due to the lack of a formal specification of exactly what a software system is supposed to do. CSXM principles were used to develop an Automated Teller Machine (ATM) from a formal specification whose outputs conform with the implementation. Moreover, the Remote Method Invocation (RMI) network interface in Java was used to provide communication between the stand-alone systems, i.e., the client (ATM) and the server (Bank). The results of this paper have been proven and help software developers and researchers take early action on bugs or defects discovered by software testing.
Formal Method, Software Testing, Stream X-machine, Communicating Stream X-Machine, Software Testing, Distributed System, Formal Specification, Defects, Automated Teller Machine, Remote Method Invocation, Java Programming Language.
Bonil Shah, P. M. Jat and Kalyan Sasidhar, DAIICT, Gandhinagar, India
The growth of big-data sectors such as the Internet of Things (IoT) generates enormous volumes of data. As IoT devices generate a vast volume of time-series data, the popularity of the Time Series Database (TSDB) has grown alongside the rise of IoT. Time series databases are developed to manage and analyze huge amounts of time series data. However, it is not easy to choose the best one among them. The most popular benchmarks compare the performance of different databases with each other but use random or synthetic data that applies to only one domain. As a result, these benchmarks may not always accurately represent real-world performance. A comprehensive comparison of time series database performance on real datasets is therefore needed. The experiment shows significant performance differences in data injection time and query execution time between real and synthetic datasets. The results are reported and analyzed.
Timeseries database, benchmark, real-world application.
Sadarabalaji, Ambati Naga Praneetha Reddy, L Raghav Kalyan, Kuruba Madhu, Nikhath Tabassum, Ashwini P, Geetha D.D, School of Electronics and Communication Engineering, REVA University, Bangalore, India
The pandemic has brought a paradigm shift in wireless communication technology. Due to social distancing norms, wireless technologies have emerged much stronger and better in the consumer sector. Our proposed project, Smart Cosmetic Selector (SCS), brings this touchless technology to the cosmetic industry. In the proposed project, we have built a smart cosmetic selection unit that helps a person choose lipstick shades without applying them to the lips. The image of the person is captured by a high-resolution camera in real time. The user can choose lipstick shades and see the color on their lips in real time. OpenCV and Haar cascade files analyze facial characteristics and can recommend the best-fit lip colors based on the individual's complexion. The proposed method is much better than the existing library detection method in terms of efficiency, memory and speed.
Haar cascade, Lip color, OpenCV, segmentation, facial recognition and Image processing.
Ritika Rattan and Jeba Shiney O, Department of Electronics & Communication Engineering, Chandigarh University, Mohali, Punjab, India
High crop yield is an important factor that affects the field of agribusiness and farmers financially, socially, and in every other respect. At different stages of a crop's growth, it is important to keep a close eye on it so that infections can be found early. Manual examination of crops is inadequate because humans are prone to error, the risk of false predictions is high, and the process is very time-consuming. Also, with the current paradigm shift towards smart agriculture, automation of every aspect of crop growing and monitoring is important. Therefore, in this work, we have analyzed the appropriateness of an automated approach to the classification of plant diseases. Two algorithms, one based on a CNN and the other on the VGG-16 approach, were compared and analyzed in terms of precision and loss. The performance was verified on the PlantVillage dataset: the precision of the first trained CNN (convolutional neural network) model was 96.77%, and that of the VGG-based model trained with batch normalization was 94%.
Convolution neural network (CNN), plant disease detection, village plant disease dataset.
B.Rahul, K.Kuppusamy, A. Senthilrajan, Department of Computational Logistics, Alagappa University, Karaikudi, TamilNadu, India
Nowadays, protecting digital data on the internet is a very challenging task. Intruders are sharpening their strategies and procedures to break into the web and steal information. A strong encryption scheme is therefore required to protect data stored and transferred on the internet. This paper proposes a novel multi-chaotic text data encryption scheme using a biometric image and the SHA-256 hash algorithm. The proposed system scrambles the text data using chaotic values generated from chaotic maps such as the logistic map, the Henon map, and the Lorenz system. SHA-256 hash values of the biometric image and the plain text data are used to initialize the chaotic maps. The proposed system undergoes various security tests to analyze its strength against various attacks.
Chaotic maps, SHA-256 hash algorithm, biometric image, text data encryption.
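The abstract above initializes chaotic maps from SHA-256 hash values and uses the chaotic values to scramble text. The toy sketch below shows the general mechanism with a single logistic map only. It is not the authors' scheme and is not secure: the way the digest is turned into an initial value, the parameter r = 3.99, the XOR-based scrambling, and the function names are all assumptions, and the Henon map, the Lorenz system, and the plain-text hash are left out.

```python
import hashlib


def logistic_keystream(seed_bytes, length, r=3.99):
    """Derive `length` pseudo-random bytes from a logistic map.

    The initial value x0 is taken from the SHA-256 digest of seed_bytes
    (for example, the raw bytes of a biometric image).
    """
    digest = hashlib.sha256(seed_bytes).digest()
    # Map the first 6 digest bytes to an initial value strictly inside (0, 1).
    x = (int.from_bytes(digest[:6], "big") + 1) / (2 ** 48 + 2)
    out = bytearray()
    for _ in range(length):
        x = r * x * (1 - x)          # logistic map iteration
        out.append(int(x * 256))     # x < 1, so the value fits in one byte
    return bytes(out)


def xor_scramble(data, seed_bytes):
    """XOR data with the keystream; applying it twice restores the input."""
    keystream = logistic_keystream(seed_bytes, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Because XOR is its own inverse, calling xor_scramble on the scrambled bytes with the same seed recovers the original text, which is the property a receiver holding the same biometric image would rely on.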