SARAIKI LANGUAGE CORPUS FOR SENTIMENT ANALYSIS
Keywords:
Sentiment Analysis, Linguistic, Corpus, Support Vector, Approaches
Abstract
This study investigates sentiment analysis of the Saraiki language using a large corpus created for this purpose. The corpus incorporates a wide range of Saraiki text gathered from different sources and reflects the linguistic and cultural diversity of the Saraiki-speaking community. Using this dataset, we evaluate several machine learning algorithms for sentiment analysis: Support Vector Classification (SVC), Naive Bayes, Random Forest, Logistic Regression, and a Convolutional Neural Network (CNN). Each algorithm is trained and assessed on the Saraiki corpus, and the results are compared on performance criteria such as accuracy, recall, and precision. The experimental findings indicate that the algorithms are not equally successful: CNN performs better than Naive Bayes, Logistic Regression, Random Forest, and SVC. We also examine each algorithm's advantages and disadvantages for sentiment analysis in the Saraiki language context. Our results highlight the importance of domain-specific corpora and modern machine learning approaches for accurately analyzing sentiment in low-resource languages such as Saraiki. By developing sentiment analysis techniques specific to Saraiki, this research advances our understanding of sentiment dynamics within the Saraiki-speaking population.
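To make the comparison criteria concrete, the following is a minimal sketch of how accuracy, precision, and recall could be computed for a single sentiment label from a classifier's predictions. The function name `evaluate`, the label strings, and the toy data are all hypothetical illustrations, not the study's actual evaluation code or results.

```python
from collections import Counter


def evaluate(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, and recall for one target label.

    `positive` names the label treated as the class of interest;
    every other label is counted as "not positive".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            counts["tp"] += 1  # correctly predicted positive
        elif guess == positive:
            counts["fp"] += 1  # predicted positive, actually not
        elif truth == positive:
            counts["fn"] += 1  # missed a positive example
        else:
            counts["tn"] += 1  # correctly predicted not positive

    total = len(y_true)
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical gold labels and model predictions (not data from the corpus).
gold = ["positive", "negative", "positive", "negative", "positive"]
pred = ["positive", "positive", "negative", "negative", "positive"]
scores = evaluate(gold, pred)
```

Running the same function on each model's predictions over a shared test split would give directly comparable scores, under the assumption that every model is evaluated on identical examples.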