DISCOURSE ANALYSIS FOR THE DEVELOPMENT OF A CYBERGROOMING DETECTION MODEL ON ROBLOX

Ana Paola Castañón Marroquín; Brenda Ailed Rodríguez Colis; Andrea Bazán Durán; Luis Enrique Colmenares-Guillen


Published: 2026-05-28

Keywords:

Discourse analysis, computer crime, child abuse, video game, computational linguistics

Ana Paola Castañón Marroquín

Benemérita Universidad Autónoma de Puebla, Puebla, México

https://orcid.org/0009-0007-3411-0418

Brenda Ailed Rodríguez Colis

Benemérita Universidad Autónoma de Puebla, Puebla, México

https://orcid.org/0009-0002-2694-6542

Andrea Bazán Durán

Benemérita Universidad Autónoma de Puebla, Puebla, México

https://orcid.org/0009-0001-7722-1878

Luis Enrique Colmenares-Guillen

BUAP

https://orcid.org/0000-0002-9921-8813

Abstract

Cybergrooming is a growing threat on online gaming platforms such as Roblox, where anonymity and the frequent interaction of child users create conditions conducive to the sexual harassment and abuse of minors. The aim of the research behind this article was to identify linguistic patterns in the discourse of groomers in Spanish-speaking Roblox communities and to incorporate them into a computational model that automatically detects this computer crime from text. To that end, a mixed-methods methodology was built that integrated Corpus-Assisted Discourse Studies and the CRISP-DM data mining methodology. A specialized corpus of 25 conversations was compiled, processed, and subjected to detailed analysis. As the main result, a pattern of discourse organization was delimited and described, consisting of a sequence of seven conversational modules with a given predictive value and 21 functional lexico-grammatical patterns with 224 associated collocations; these elements were integrated into a text classification model able to distinguish grooming conversations with 93.33% accuracy. The study thus demonstrated the effectiveness of discourse analysis as a basis for developing systems that automatically detect computer crimes against minors.
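As a reading aid for the 93.33% figure, the following minimal sketch shows how accuracy is conventionally computed from a binary confusion matrix. The abstract does not report the confusion matrix or the size of the evaluation set, so the counts below are purely hypothetical and are only one of many combinations that would yield this value; the function name `accuracy` is likewise an illustrative choice, not the authors' code.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of conversations classified correctly: (TP + TN) / all cases."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# HYPOTHETICAL counts, not taken from the article:
# 7 grooming conversations correctly flagged, 7 non-grooming correctly passed,
# 1 non-grooming conversation wrongly flagged, 0 grooming conversations missed.
acc = accuracy(tp=7, tn=7, fp=1, fn=0)
print(f"{acc:.2%}")
```

Accuracy alone does not show whether the errors are missed grooming conversations or false alarms, which is why sensitivity and specificity are usually reported alongside it.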


Issue

Chakiñan PREPRINT PAPERS

Section

RESEARCH ARTICLES

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Responsibility of the authors:
The authors are responsible for the ideas and data collected in the manuscripts. They are also accountable for the accuracy of the information, the correctness of the citations, the right to publish any material included in the text, and the presentation of the manuscript in the format required by the Journal (WORD template). A manuscript submitted to CHAKIÑAN must not have been published before, nor may it have been submitted to another publication.

Copyright:
Published articles do not necessarily reflect the viewpoint of the CHAKIÑAN JOURNAL. The Journal follows the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). Each author retains the rights to the paper published in the Chakiñan journal.


Author Biographies

Ana Paola Castañón Marroquín, Benemérita Universidad Autónoma de Puebla, Puebla, México

Facultad de Filosofía y Letras, Licenciatura en Lingüística y Literatura Hispánica

Brenda Ailed Rodríguez Colis, Benemérita Universidad Autónoma de Puebla, Puebla, México

Facultad de Ciencias de la Computación, Ingeniería en Ciencias de la Computación

Andrea Bazán Durán, Benemérita Universidad Autónoma de Puebla, Puebla, México

Facultad de Ciencias de la Computación, Ingeniería en Ciencias de la Computación

Luis Enrique Colmenares-Guillen, BUAP

Luis Enrique Colmenares Guillén is a lecturer and researcher in the Facultad de Ciencias de la Computación of the Benemérita Universidad Autónoma de Puebla, Mexico, with research and academic experience. In the Faculty, he has taught courses on Operating Systems, Projects Administration, Distributed Systems, Digital Image Processing, Real-Time Systems, and Projects I+D. He currently develops algorithms and classification systems in the areas of artificial intelligence and pattern recognition.

How to Cite

Castañón Marroquín, A. P., Rodríguez Colis, B. A., Bazán Durán, A., & Colmenares-Guillen, L. E. (2026). ANÁLISIS DEL DISCURSO PARA EL DESARROLLO DE UN MODELO DE DETECCIÓN DE CIBERGROOMING EN ROBLOX. CHAKIÑAN, Journal of Social Sciences and Humanities, 13-35. https://chakinan.unach.edu.ec/index.php/chakinan/article/view/1512


