Transactions on Machine Intelligence

Finding the Potential Accepted Answer on Stack Overflow: a Text Mining Approach

Document Type : Original Article

Authors
1 Faculty of Informatics, Università della Svizzera Italiana, Lugano, Switzerland
2 Department of Computer Engineering, Salman Farsi University of Kazerun, Taleghani, Kazerun, 73175-457, Fars, Iran
Abstract
Stack Overflow is a widely used, community-driven platform where developers seek help with programming problems. Users can post questions and receive multiple answers, but a significant portion of questions never receive an accepted answer. Without a clearly identified best answer, both the original poster and future visitors face confusion and spend more time reading through numerous responses. To address this challenge, we present a method for automatically identifying the most promising answer among unaccepted ones. Our approach applies text mining techniques to extract 13 informative features from a large dataset comprising 15,464 questions, 37,275 answers, and 72,025 comments. These features capture textual, structural, and user-related aspects of the posts. The extracted data are then used to train machine learning models that predict which answer is most likely to be accepted. The study considers only English-language content on Stack Overflow. The proposed method achieves an overall accuracy of 71% and an F1 score of 70%. These results suggest that automated answer recommendation can improve the user experience by reducing ambiguity and making information retrieval on Q&A platforms more efficient.
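As a rough illustration of the final step described in the abstract, the sketch below picks the highest-scoring answer for each question and computes accuracy and F1 for binary accepted/not-accepted labels. It is not the authors' implementation: the function names, the tuple layout, and the sample scores are hypothetical, and the abstract does not say how the F1 score was averaged, so the plain binary F1 used here is an assumption.

```python
def pick_candidate_answers(scored_answers):
    """Return {question_id: answer_id} holding the top-scoring answer per question.

    scored_answers: iterable of (question_id, answer_id, score) tuples, where
    score is a model's estimated likelihood that the answer would be accepted
    (hypothetical layout, chosen for illustration only).
    """
    best = {}
    for question_id, answer_id, score in scored_answers:
        if question_id not in best or score > best[question_id][1]:
            best[question_id] = (answer_id, score)
    return {qid: answer_id for qid, (answer_id, _) in best.items()}


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and binary F1 for labels where 1 = accepted, 0 = not accepted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Made-up scores for two questions; answer ids are placeholders.
scored = [(1, "a", 0.2), (1, "b", 0.7), (2, "c", 0.4)]
candidates = pick_candidate_answers(scored)
acc, f1 = accuracy_and_f1([1, 0, 1, 0], [1, 0, 0, 0])
```

In practice the scores would come from a classifier trained on the extracted features; the ranking and evaluation logic stays the same regardless of which model produces them.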
Volume 4, Issue 4
Autumn 2021
Pages 238-244

  • Receive Date 17 June 2021
  • Revise Date 28 August 2021
  • Accept Date 23 December 2021