DISCOURSE ANALYSIS FOR THE DEVELOPMENT OF A CYBERGROOMING DETECTION MODEL ON ROBLOX
Abstract
Cybergrooming is a growing threat on online gaming platforms such as Roblox, where anonymity and frequent interaction among child users create conditions conducive to child abuse and sexual harassment. This study aimed to identify linguistic patterns in the discourse of groomers in Spanish-speaking Roblox communities and to incorporate them into a computational model that detects this cybercrime automatically from text. To this end, a mixed-methods approach was developed that integrates Corpus-Assisted Discourse Studies with the CRISP-DM data mining methodology. A specialized corpus of 25 conversations was compiled, processed, and analyzed in detail. The analysis identified and described two main results: a pattern of discursive organization consisting of a sequence of seven conversational modules with specific predictive value, and a set of 21 functional lexicogrammatical patterns with 224 associated collocations. Both were then incorporated into a text classification model that distinguished grooming conversations with 93.33% accuracy. The study thus demonstrated the efficacy of discourse analysis as a basis for developing systems that automatically detect cybercrimes against minors.
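For readers unfamiliar with the metric, the reported accuracy is the share of conversations the classifier labels correctly. The sketch below shows that arithmetic in Python. The function name `accuracy` and the confusion-matrix counts are hypothetical, chosen only because they happen to yield 93.33%; the abstract does not report the evaluation set size or the actual counts.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the fraction of correctly classified items.

    tp/tn: grooming / non-grooming conversations labeled correctly.
    fp/fn: non-grooming / grooming conversations labeled incorrectly.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for illustration only (NOT taken from the study):
# 14 of 15 conversations classified correctly.
example = accuracy(tp=7, tn=7, fp=1, fn=0)
print(f"{example:.2%}")
```

Accuracy alone does not show whether the errors are missed grooming conversations or false alarms, so it is usually read alongside the full confusion matrix.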
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.
Author responsibility:
Authors are responsible for the ideas and data presented in their manuscripts, the accuracy of the information, the veracity of citations, the publication rights to any material included in the text, and the presentation of the manuscript in the format required by the journal (web template). A manuscript submitted to CHAKIÑAN must not have been published previously or submitted in the same form to another publication outlet.
Copyright:
Published articles do not necessarily reflect the views of the journal CHAKIÑAN. The journal complies with the policy of the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). Each author retains the rights to the article published in Chakiñan.