DataBubble

Model Detail

Turkish-Gemma-9b-T1

Provider: ytu-ce-cosmos · Category: llm · Pipeline: text-generation · Parameters: 9B

DB Score

0.3

Downloads

Likes

163

Day

+0.0%

Week

+0.0%

Month

+0.0%

Research Paper

arXiv: 2403.08295

Model Info

Citations: 976 (122 influential)

Recent news

Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues

arXiv:2604.13620v1 Announce Type: cross Abstract: Managing natural dialogue timing is a significant challenge for voice-based chatbots. Most current systems usually rely on simple silence detection, which often fails because human speech patterns involve irregular pauses. This causes bots to interru

arxiv · 13d ago

HeceTokenizer: A Syllable-Based Tokenization Approach for Turkish Retrieval

arXiv:2604.10665v1 Announce Type: new Abstract: HeceTokenizer is a syllable-based tokenizer for Turkish that exploits the deterministic six-pattern phonological structure of the language to construct a closed, out-of-vocabulary (OOV)-free vocabulary of approximately 8,000 unique syllable types. A BE

arxiv · neutral · 16d ago

TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization

arXiv:2604.07553v1 Announce Type: new Abstract: This study presents a framework for generating the gold-standard summary fully automatically and reproducibly based on multiple human summaries of Turkish educational videos. Within the scope of the study, a new dataset called TR-EduVSum was created, e

arxiv · 20d ago

HUKUKBERT: Domain-Specific Language Model for Turkish Law

arXiv:2604.04790v1 Announce Type: cross Abstract: Recent advances in natural language processing (NLP) have increasingly enabled LegalTech applications, yet existing studies specific to Turkish law have still been limited due to the scarcity of domain-specific data and models. Although extensive mod

arxiv · 26d ago

Tokens with Meaning: A Hybrid Tokenization Approach for Turkish

arXiv:2508.14292v3 Announce Type: replace Abstract: Tokenization shapes how language models perceive morphology and meaning in NLP, yet widely used frequency-driven subword tokenizers (e.g., Byte Pair Encoding and WordPiece) can fragment morphologically rich and agglutinative languages in ways that

arxiv · 28d ago

Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models

arXiv:2501.04828v2 Announce Type: replace Abstract: This paper introduces foundational resources and models for natural language processing (NLP) of historical Turkish, a domain that has remained underexplored in computational linguistics. We present the first named entity recognition (NER) dataset,

Related Models

modernbert-tr-base-1k

ytu-ce-cosmos · 2K downloads

bert-base-uncased

google-bert · 58.9M downloads

paraphrase-multilingual-MiniLM-L12-v2

SBERT · 32.9M downloads