Automated Essay Grading
A CS109a Final Project by Anmol Gupta, Annie Hwang, Paul Lisker, and Kevin Loughlin

One of the main responsibilities of teachers and professors in the humanities is grading students’ essays. Of course, manual essay grading for a classroom of students is a time-consuming process, and can even become tedious at times. Furthermore, essay grading can be plagued by inconsistencies in determining what a “good” essay really is. Indeed, the grading of essays is often a topic of controversy, due to its intrinsic subjectivity. Instructors might be more inclined to reward essays with a particular voice or writing style, or even a specific position on the essay prompt. With these and other issues taken into consideration, the problem of essay grading is clearly a field ripe for a more systematic, unbiased method of rating written work.

There has been much research into creating AI agents, ultimately based on statistical models, that can automatically grade essays and thereby reduce or even eliminate the potential for bias. Such a model would take typical features of strong essays into account, analyzing each essay for the presence of these features. In this project, which stems from an existing Kaggle competition sponsored by the William and Flora Hewlett Foundation, we have attempted to provide an efficient, automated solution to essay grading, thereby eliminating grader bias and expediting a tedious, time-consuming job. While superior auto-graders that have resulted from years of extensive research surely exist, we feel that our final project demonstrates our ability to apply the data science process learned in this course to a complex, real-world problem.

It was unnecessary for us to collect any data, as the essays were provided by the Hewlett Foundation. The data comprises eight separate essay sets, split into a training set of 12,976 essays and a validation set of 4,218 essays. Each essay set had a unique topic and scoring system, which complicated fitting a model to such diverse data. On the bright side, the essay sets were complete; that is, there was no missing data. Furthermore, the data was clearly laid out in both txt and csv formats, which made importing it into a Pandas DataFrame a relatively simple process.

In fact, the only complication to arise from collecting the data was a rather sneaky one, discovered only in the later stages when we attempted to spell check the essays. A very small number of essays contained special characters that could not be processed in Unicode (the most popular text encoding for English). To handle these special characters, we used ISO-8859-1 text encoding, which eliminated the encoding-related errors.

The training and validation sets did have plenty of information that we deemed extraneous to the scope of our project. For example, the score was often broken down by scorer, and at times into subcategories. We decided to take the average of the overall provided scores as our notion of “score” for each essay. Ultimately, then, the three crucial pieces of information were the essay, the essay set to which it belonged, and the overall essay score, as in the sketch below.
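The following is a minimal sketch of this loading and cleaning step: it reads the training set into a Pandas DataFrame with ISO-8859-1 encoding, averages the per-scorer scores into a single score column, and keeps only the three columns we care about. The file name and the rater column names here are illustrative assumptions rather than the exact names used in the Kaggle files.

```python
import pandas as pd

# Read the training essays as tab-separated values; ISO-8859-1 avoids the
# errors raised by the few essays whose special characters break Unicode.
# NOTE: the file name and rater column names are illustrative assumptions.
train = pd.read_csv("training_set.tsv", sep="\t", encoding="ISO-8859-1")

# Collapse the per-scorer scores into a single averaged "score" per essay.
rater_cols = ["rater1_score", "rater2_score"]  # assumed column names
train["score"] = train[rater_cols].mean(axis=1)

# Keep only the three pieces of information we actually use.
essays = train[["essay", "essay_set", "score"]].copy()
print(essays.head())
```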
With the information we needed in place, we tested a few essay features at a basic level to get a better grasp of the data’s format, as well as to investigate the sorts of features that might prove useful in predicting an essay’s score. In particular, we calculated the word count and the vocabulary size (number of unique words) for each essay in the training set, plotting them against the provided essay scores (see the sketch below). We hypothesized that word count would be positively correlated with essay score.
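Continuing from the essays DataFrame above, a minimal sketch of these two features and the corresponding plots might look like the following; matplotlib is assumed for the plotting.

```python
import matplotlib.pyplot as plt

# Two coarse features per essay: total word count and vocabulary size
# (number of unique, case-folded tokens). Whitespace tokenization is enough here.
essays["word_count"] = essays["essay"].apply(lambda text: len(text.split()))
essays["vocab_size"] = essays["essay"].apply(
    lambda text: len(set(text.lower().split())))

# Plot each feature against the averaged essay score.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(essays["word_count"], essays["score"], alpha=0.3)
axes[0].set(xlabel="Word count", ylabel="Average score")
axes[1].scatter(essays["vocab_size"], essays["score"], alpha=0.3)
axes[1].set(xlabel="Vocabulary size", ylabel="Average score")
plt.tight_layout()
plt.show()
```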