Šandrih Todorović, Branislava

branislava.sandrih [at] gmail.com

PhD in Computer Science (2020)
University of Belgrade, Serbia
Faculty of Mathematics
Department for Informatics and Computer Science

Data Science Team Lead (April 2023 - current)
Natural Language Processing Domain Owner
Data Scientist (August 2022 - April 2023)
NLB DigIT, Belgrade, Serbia

Dr. Branislava Šandrih Todorović is a full-time Data Science Team Lead at NLB DigIT, collaborating closely with the Center of Excellence at NLB Bank in Ljubljana as an NLP expert. She received her PhD from the University of Belgrade, Faculty of Mathematics, in 2020, with the thesis "Impact of Text Classification on Natural Language Processing Applications". Her research fields are machine learning and deep learning applied to the development of tools, resources and models for the Serbian and Slovenian languages. She has published more than 30 papers in journals and proceedings of scientific conferences.

Until 2022, Branislava was a visiting researcher at the Research Group in Computational Linguistics at the University of Wolverhampton, Editorial Administrator for the journal Natural Language Engineering, and Editor-in-Chief of Infotheca - Journal for Digital Humanities. She is a member of the Society for Language Resources and Technologies JeRTeh.

Branislava has developed several NLP tools, established international connections through various projects (such as COST actions CA18231, CA16204 and CA16105), and participated in various research and development projects. During her PhD studies, she attended summer schools on NLP and machine learning (LxMLS 2018, ESSLLI 2018, DLinNLP 2019).

In 2021, Branislava received the Annual Award of the Mathematical Institute of the Serbian Academy of Sciences and Arts in the field of computing for PhD students.

💖💖💖 On the 2nd of August 2021, Branislava gave birth to the loveliest boy in the world! His name is Mihailo, but you can call him Miki! 💖💖💖

Education

Formal Education

(2015 - 2020) Doctor of Computer Science
Faculty of Mathematics, Department of Computer Science and Informatics, University of Belgrade, Serbia
(2014 - 2015) MSc in Mathematics
Faculty of Mathematics, Department of Computer Science and Informatics, University of Belgrade, Serbia
(2010 - 2014) BSc in Mathematics
Faculty of Mathematics, Department of Computer Science and Informatics, University of Belgrade, Serbia
(2006 - 2010) Electrotechnician for Computing
School of Electrical Engineering "Nikola Tesla", Pančevo, Serbia

Summer Schools and Seminars

(2019) DLinNLP 2019
Summer School on Deep Learning in Natural Language Processing
(2018) ESSLLI 2018
30th European Summer School in Logic, Language and Information
(2018) LxMLS 2018
8th Lisbon Machine Learning School
(2007 - 2010) Seminar of Programming
Regional Talents' Center "Mihajlo Pupin", Pančevo, Serbia
(2007) Seminar of Programming
Research Station "Petnica", Valjevo, Serbia

Interests

Publications

Articles Published in Journals

Erdem, Erkut, Menekse Kuyu, Semih Yagcioglu, Anette Frank, Letitia Parcalabescu, Barbara Plank, Andrii Babii, Oleksii Turuta, Aykut Erdem, Iacer Calixto, Elena Lloret, Elena-Simona Apostol, Ciprian-Octavian Truica, Branislava Šandrih, Albert Gatt, Sanda Martinčić-Ipšić, Gabor Berend, and Gražina Korvel. Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning. Journal of Artificial Intelligence Research, volume 73: 1131-1207, 2022. URL
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, and Mihailo Škorić. Annotation of the Serbian ELTeC Collection. Infotheca - Journal for Digital Humanities, 21(2): 43-59, 2021. URL
Tanja Ivanović, Ranka Stanković, Branislava Šandrih Todorović, and Cvetana Krstev. Corpus-Based Bilingual Terminology Extraction in Power Engineering Domain. Terminology: International Journal of Theoretical and Applied Issues in Specialized Communication, 2022. URL
Данило Алексић и Бранислава Шандрих. Аутоматска ексцерпција парова речи за учење изговора у настави српског као страног језика. Српски језик: студије српске и словенске, 26(1): 567-584, 2021. URL
Branislava Šandrih, Cvetana Krstev, and Ranka Stanković. Two Approaches to Compilation of Bilingual Multi-Word Terminology Lists from Lexical Resources. Natural Language Engineering 26, no. 4 (2020): 455–79. URL
Branislava Šandrih and Ranka Stanković. Extraction of Bilingual Terminology using Graphs, Dictionaries and GIZA++. Infotheca - Journal for Digital Humanities, 19(2), 2019. URL
Jelena Andonovski, Branislava Šandrih, and Olivera Kitanović. Bilingual Lexical Extraction Based on Word Alignment for Improving Corpus Search. The Electronic Library, 37 (2), 2019. URL
Branislava Šandrih. SMS Sentiment Classification based on Lexical Features, Emoticons and Informal Abbreviations. Serdica Journal of Computing, 13(1-2), 2019. URL
Branislava Šandrih. Informatics for Library and Information Science students with special focus on Python. Infotheca - Journal for Digital Humanities, 18 (1):63–77, 2018. URL
Branislava Šandrih, Dušan Tošić, and Vladimir Filipović. Towards Efficient and Unified XML/JSON Conversion – a New Conversion Method. Transactions on Internet Research (TIR), 13 (1):58–64, January 2017. URL
Branislava Šandrih, Vladimir Filipović, Saša Malkov, and Aleksandar Kartelj. Distributed Computing Among Independent Web Browsers Applied to Text and Image Processing. Review of the National Center for Digitization, (31):30–39, 2017. URL

Proceedings & Books

Venelin Kovatchev, Irina P. Temnikova, Branislava Šandrih, Ivelina Nikolova: Proceedings of the Student Research Workshop Associated with RANLP 2019. RANLP Student Research Workshop 2019. URL

Thesis

Branislava Šandrih. Impact of Text Classification on Natural Language Processing Applications. PhD thesis. University of Belgrade, Faculty of Mathematics, 2020. URL

Articles Published in Conference Proceedings

Цветана Крстев, Ранка Станковић, Бранислава Шандрих Тодоровић, Милица Иконић Нешић. НОВЕ ТЕХНОЛОГИЈЕ ЗА ОЖИВЉАВАЊЕ СТАРИХ ТЕКСТОВА. У ДИГИТАЛНА ХУМАНИСТИКА И СЛОВЕНСКО КУЛТУРНО НАСЛЕЂЕ II, стр. 79–96. Савез славистичких друштава Србије, Септембар 2023. URL
Branislava Šandrih Todorović, Katarina Josipović, and Jurij Kodre. Three Approaches to Client Email Topic Classification. In RANLP 2023: Recent Advances in Natural Language Processing, pages 1011–1018. INCOMA Ltd., September 2023. URL
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, and Milica Ikonić Nešić. Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3337–3345, Marseille, France. European Language Resources Association. 2022. URL
Rina Zviel-Girshin, Ana R. Luís, Tanara Zingano Kuhn, Špela Arhar Holdt, Branislava Šandrih Todorović, Carole Tiberius, Kristina Koppel, Danka Jokić, and Iztok Kosem. Developing Pedagogically Appropriate Language Corpora through Crowdsourcing and Gamification. In EUROCALL 2021, 2021. URL
Danka Jokić, Ranka Stanković, Cvetana Krstev, and Branislava Šandrih. A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian. In Dagmar Gromann, Gilles Sérasset, Thierry Declerck, John P. McCrae, Jorge Gracia, Julia Bosque-Gil, Fernando Bobillo, and Barbara Heinisch, editors, 3rd Conference on Language, Data and Knowledge (LDK 2021), volume 93 of Open Access Series in Informatics (OASIcs), pages 13:1–13:17, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. URL
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, and Milica Ikonić Nešić. Serbian NER&Beyond: The Archaic and the Modern Intertwinned. In RANLP 2021: Deep Learning for Natural Language Processing Methods and Applications, pages 1252–1260. INCOMA Ltd., September 2021. URL
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, and Mihailo Škorić. Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. In Proceedings of The 12th Language Resources and Evaluation Conference, pages 3947–3955, Marseille, France, May 2020. European Language Resources Association. URL
Cvetana Krstev, Jelena Jaćimović, Branislava Šandrih, and Ranka Stanković. Analysis of the first Serbian literature corpus of the late 19th and early 20th century with the txm platform. In DH_BUDAPEST_2019, pages 36–37. Centre for Digital Humanities - Eötvös Loránd University, 2019. URL
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, and Aleksandra Marković. SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian. In I. Kosem, T. Zingano Kuhn, M. Correia, J. P. Ferreria, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek, and C. Tiberius, editors, Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, pages 248–269. 1-3 October 2019, Sintra, Portugal. Brno: Lexical Computing CZ, s.r.o., 2019. URL
Branislava Šandrih, Cvetana Krstev, and Ranka Stanković. Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names. In RANLP 2019: Recent Advances in Natural Language Processing, pages 1060–1068, 2019. URL
Бранислава Шандрих, Ранка Станковић, and Мирјана Гочанин. Чији је пример? Анализа лексичких обележја на примерима Речника САНУ. In Научни састанак слависта у Вукове дане – Српски језик и његови ресурси: теорија, опис и примене. Међународни славистички центар, Београд, Vol. 48(3): 299–316, 2019. URL
Branislava Šandrih. Fingerprints in SMS messages: Automatic Recognition of a Short Message Sender Using Gradient Boosting. In 3rd International Conference Computational Linguistics in Bulgaria (CLIB 2018), pages 203–210. Department of Computational Linguistics at the Institute for Bulgarian Language with the Bulgarian Academy of Sciences, May 2018. URL
Бранислава Шандрих и Душко Витас. Квантитативни преглед језика кратких порука. In Научни састанак слависта у Вукове дане – Српски језик и његови ресурси: теорија, опис и примене. Међународни славистички центар, Београд, 47(3): 155–165, 2018. URL
Cvetana Krstev, Branislava Šandrih, Ranka Stanković, and Miljana Mladenović. Using English Baits to Catch Serbian Multi-Word Terminology. In Nicoletta Calzolari (Conference chair), Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, editors, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, France, May 2018. European Language Resources Association (ELRA). URL
Cvetana Krstev, Duško Vitas, Miloš Utvić, and Branislava Šandrih. The New Clothes for an Old Cookbook. In 8th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC 2017), Poznań, Poland, November 2017. URL
Ranka Stanković, Branislava Šandrih, Olivera Kitanović, Ivan Obradović, and Miloš Manić. An E-Learning Approach to Social Sciences. In The 8th International Conference on eLearning (eLearning-2017), pages 26–29, Belgrade, Serbia, September 2017. Belgrade Metropolitan University. URL
Branislava Šandrih. Mogući doprinos LaTeX-a u obrazovnom procesu. In Slobodan softver u obrazovanju, pages 63–66, Sremski Karlovci, Serbia, January 2016. Udruženje profesora informatike Srbije. URL

Abstracts Published in Books of Abstracts

Tanara Zingano Kuhn, Branislava Šandrih Todorović, Špela Arhar Holdt, Rina Zviel-Girshin, Kristina Koppel, Ana R. Luís, and Iztok Kosem. Crowdsourcing Pedagogical Corpora for Lexicographical Purposes. In EURALEX XIX. Book of abstracts. URL
Tanara Zingano Kuhn, Rina Zviel-Girshin, Špela Arhar Holdt, Branislava Šandrih Todorović, Carole Tiberius, Ana Luis, Kristina Koppel, Danka Jokić, and Iztok Kosem. Gamifying the Path to Corpus-Based Pedagogical Dictionaries. In I. Kosem and M. Cukr, editors, Electronic lexicography in the 21st century (eLex 2021): Post-editing lexicography. Book of abstracts, pages 29–31. Virtual, 5–7 July 2021. Brno: Lexical Computing CZ s.r.o., 2021. URL
Peter Dekker, Tanara Zingano Kuhn, Branislava Šandrih, and Rina Zviel-Girshin. Corpus Cleaning via Crowdsourcing for Developing a Learner’s Dictionary. In I. Kosem and T. Zingano Kuhn, editors, Electronic lexicography in the 21st century (eLex 2019): Smart Lexicography. Book of abstracts, pages 84–85, 2019. URL
Tanara Zingano Kuhn, Peter Dekker, Branislava Šandrih, Rina Zviel-Girshin, Špela Arhar Holdt, and Tanneke Schoonheim. Crowdsourcing Corpus Cleaning for Language Learning Resource Development. In EuroCALL 2019: European Association of Computer Assisted Language Learning, page 159, 2019. URL
Branislava Šandrih. SMS Sentiment Classification based on Emoticons, Informal Abbreviations and other Text Features. In International Quantitative Linguistics Conference (QUALICO 2018), page 73. Institute of Library and Information Science / Faculty of Mathematics and Computer Science (University of Wrocław), July 2018. URL
Branislava Šandrih, Vladimir Filipović, Saša Malkov, and Aleksandar Kartelj. Globalna izračunavanja u mreži internet pregledača – primena u obradi slika. In Digitalizacija kulturne baštine, starih zapisa iz prirodnih i društvenih nauka i digitalna humanistika, Beograd, Serbia, September 2017. Faculty of Mathematics, University of Belgrade. URL

Lab

BiLTE: Bilingual Domain Terminology Extraction
NER&Beyond: Named Entity Recognition Toolkit
spaCy NER Models for Serbian
Good Dictionary Examples
Stylometric Feature Extractor
KaMP (Danilo Aleksić)

Research Projects

CA18231 - Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation
CA16105 - European Network for Combining Language Learning with Crowdsourcing Techniques
CA16204 - Distant Reading for European Literary History
Serbian Language and Its Resources: Theory, Description and Applications (ended in 2019)

Supervision

Master students

PhD students

Skills

Technical Skills

Programming/Script Languages, Technologies, NLP Tools and Frameworks
  • Python
  • C, NodeJS
  • Java
  • PHP5, HTML5, CSS3, JavaScript
  • MySQL, MongoDB
  • Matlab
  • KNIME, Weka
  • IBM DB2 Intelligent Miner

Language Proficiency

Work

Data Science

Faculty of Philology
(2016 – 2022)
Multidisciplinary Studies at University of Belgrade, MSc Programme Computing in Social Sciences
(2017 – 2022)
Multidisciplinary Studies at University of Belgrade, PhD Programme Intelligent Systems
(2020 – 2022)
Faculty of Mathematics
(2015 – 2016)
  • Informatics for Librarians (BSc)
  • Practicum of Informatics (BSc)
  • Digital Text (BSc)
  • Structure of Information (BSc)
  • Language Technologies (BSc)
  • Multimedia Documents (BSc)
  • Information Retrieval (BSc)
  • Advanced Methods in Information Retrieval (MSc)
  • Advanced Language Technologies (MSc)
  • Structuring and Management of Web Content (MSc)
  • Programming for Linguists (MSc)
  • Introduction to Cognitive Linguistics (MSc)
  • Natural Language Processing (PhD)
  • Machine Learning (PhD)
  • Programming (BSc)
  • Introduction to Computer Architecture (BSc)
  • Object-Oriented Programming (BSc)

Software Development

Other

Internships and Awards

Invited Seminars

Reviewing / Program Committee Membership

Scientific Journals

International Conferences

Memberships