Sandra Antunes

Lisboa, Lisboa, Portugal
885 followers · 500+ connections

Experience and education

  • Mozilla.ai

Publications

  • A Lexical Database for the Analysis of Portuguese MWEs

    EUROPHRAS 2017 - Computational and Corpus-based Phraseology: Recent Advances and Interdisciplinary Approaches

  • The annotation coreference task at IberEval’17: the experience of CLUL/UE

    IberEval 2017: Evaluation of Human Language Technologies for Iberian Languages, Murcia, Spain

    In this paper, we describe the process of coreference annotation of Portuguese texts for an IberEval 2017 task and discuss the main problems observed. The work was done by a team of researchers from the Centre for Linguistics of the University of Lisbon (CLUL) and from the Computer Science Department of the University of Évora (UE). Due to time constraints and the complexity of the task, only the CLUL researchers were able to successfully finish the annotation process. The main problems are presented and discussed, and some possible solutions are proposed. Nevertheless, the results obtained are similar to the overall results of the task.

  • Towards error annotation in a learner corpus of Portuguese

    NLP4LA Conference, Sweden

    In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by foreign learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research on less commonly taught languages, our aim is to enrich the learner data available for Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools that support students in their learning process. The corpus is available online through TEITOK, a web-based framework for corpus processing that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, token normalization, error annotation) to automatically process and annotate texts in XML format. A CQP-based search interface allows the corpus to be searched by different fields, such as words, lemmas, POS tags or error tags. We describe the work in progress on the compilation and linguistic annotation of this corpus, focusing particularly on error annotation.

  • Collocations in Portuguese: A corpus-based approach to lexical patterns

    Collocations Cross-Linguistically. Corpora, Dictionaries and Language Teaching, Publisher: Société Néophilologique, Editors: Begoña Vilas Sanromán, pp.141-166

  • An evaluation of the role of statistical measures and frequency for MWE identification

    LREC 2014, Reykjavik, Iceland

    We report on an experiment to evaluate the role of statistical association measures and frequency in the identification of MWEs. We base our evaluation on a lexicon of 14,000 MWEs comprising different types of word combinations: collocations, nominal compounds, light verb + predicate constructions, idioms, etc. These MWEs were manually validated from a list of n-grams extracted from a 50-million-word corpus of Portuguese (a subcorpus of the Reference Corpus of Contemporary Portuguese), using several criteria: syntactic fixedness, idiomaticity, frequency and the Mutual Information (MI) measure, although no threshold was established for either group frequency or MI. We report on MWEs that were selected on the basis of their syntactic and semantic properties even though their MI, or both their MI and their frequency, are low; such cases make it difficult to establish a cut-off point. We analyze the MI values of the MWEs in our gold dataset and, for some specific cases, compare these values with two other statistical measures.

  • MWE in Portuguese: Proposal for a Typology for Annotation in Running Text

    9th Workshop on Multiword Expressions, North American Chapter of the Association for Computational Linguistics, Atlanta, Georgia, USA

    Based on a lexicon of Portuguese MWEs, this presentation focuses on ongoing work that aims at creating a typology that describes these expressions in terms of their semantic, syntactic and pragmatic properties. We also plan to annotate each MWE entry in the lexicon with the information obtained from that typology. Our objective is to create a valuable resource that will allow for the automatic identification of MWEs in running text and for a deeper understanding of these expressions in context.

  • CQPWeb: uma nova plataforma de pesquisa para o CRPC (CQPWeb: a new search platform for the CRPC)

    XXVII Encontro Nacional da Associação Portuguesa de Linguística. Textos Seleccionados (27th National Meeting of the Portuguese Linguistics Association: Selected Papers)

    We present a newly available online resource for Portuguese: a new version of the Reference Corpus of Contemporary Portuguese, now searchable via a user-friendly web interface. We report on the work carried out on the corpus prior to its online publication, namely how the corpus was built, our choice of metadata, and the processes and tools used to clean, prepare and annotate it so that it is suitable for linguistic inquiry. We also describe the web platform and summarize the extensive search options available for linguistic or NLP studies.

  • A Lexical Database of Portuguese Multiword Expressions

    Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language. Sandra Antunes, University of Lisbon

    This presentation focuses on an ongoing project that aims at the creation of a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50-million-word corpus, statistically interpreted with lexical association measures and validated by hand. This database covers different types of MW units, such as named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool supporting the development of MW unit typologies, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.

  • Typologies of MultiWord Expressions Revisited

    Spoken Language Corpus and Linguistic Informatics, pp.227-244

  • The Portuguese corpus

    C-ORAL-ROM, pp.163-207
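
The LREC 2014 abstract above uses Mutual Information (MI) alongside frequency to rank candidate MWEs. As a rough illustration only, and not the method or code used in that paper, the sketch below computes pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), for adjacent word pairs. It assumes the pointwise variant of MI, maximum-likelihood probability estimates and a plain list of tokens; the function name `bigram_pmi` and the toy Portuguese sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Pointwise mutual information (base 2) for every adjacent bigram.

    Hypothetical helper for illustration. Probabilities are maximum-likelihood
    estimates: unigram counts are divided by the number of tokens and bigram
    counts by the number of bigrams (len(tokens) - 1).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores: dict[tuple[str, str], float] = {}
    for (x, y), c_xy in bigrams.items():
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        # Positive PMI: the pair co-occurs more often than independence predicts.
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Toy input; a real study would use counts from a large corpus.
    tokens = "ele tomou conta da casa e ela tomou conta do filho".split()
    scores = bigram_pmi(tokens)
    # Print the three highest-scoring pairs.
    for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]:
        print(pair, round(score, 2))
```

Because PMI divides by the product of the unigram probabilities, pairs of rare words can receive very high scores from a single co-occurrence. This is one reason frequency is usually considered together with MI, and it also shows why lexicalized expressions with low MI or low frequency make a single cut-off point hard to set.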
