“Sandra is such a nice person to have around. We worked together on the same Product team at DefinedCrowd, and she was a great professional. From day one, Sandra helped me a lot to get started and supported me in understanding the products and some internal processes. She always contributed to a good team spirit. She has an incredible work ethic and impressive knowledge of linguistics, and she is extremely organized, reliable, and detail-oriented. Thanks a lot, Sandra! It was good to work with you.”
Lisboa, Lisboa, Portugal
885 followers
500+ connections
Activity
-
Julie V. Belião, Senior Director of Product Innovation at Mozilla.ai, at the TAUS 2024 Conference in Rome. Thank you Rares Vasilescu for posting!
Liked by Sandra Antunes
-
Next week, Julie V. Belião, Senior Director of Product Innovation at Mozilla.ai, will be speaking at the TAUS Massively Multilingual AI Conference…
Liked by Sandra Antunes
-
Perhaps we shouldn't be asking, "What if AI technology falls into the wrong hands?" but instead ask ourselves, "What if AI technology falls into the…
Liked by Sandra Antunes
Publications
-
A Lexical Database for the Analysis of Portuguese MWEs
EUROPHRAS 2017 - Computational and Corpus-based Phraseology: Recent Advances and Interdisciplinary Approaches
-
The annotation coreference task at IberEval’17: the experience of CLUL/UE
IberEval 2017: Evaluation of Human Language Technologies for Iberian Languages, Murcia, Spain
In this paper, the process of coreference annotation in Portuguese texts in the context of an IberEval 2017 task is described and the main observed problems are discussed. The work was done by a team of researchers from the Centre for Linguistics of the University of Lisbon (CLUL) and from the Computer Science Department of the University of Évora (UE). Due to time constraints and the complexity of the task, only researchers from CLUL were able to successfully finish the annotation process. The main problems are presented and discussed, and some possible solutions are proposed. Nevertheless, the results obtained are similar to the overall results of the task.
-
Towards error annotation in a learner corpus of Portuguese
NLP4LA Conference, Sweden
In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by foreign learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research applied to less commonly taught languages, it is our aim to enhance the learning data of Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools to support students in their learning process. The corpus is available online through the TEITOK environment, a web-based framework for corpus treatment that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, normalization of the tokens, error annotation) to automatically process and annotate texts in XML format. A CQP-based search interface allows searching the corpus by different fields, such as words, lemmas, POS tags or error tags. We will describe the work in progress regarding the constitution and linguistic annotation of this corpus, particularly focusing on error annotation.
-
Collocations in Portuguese: A corpus-based approach to lexical patterns
Collocations Cross-Linguistically. Corpora, Dictionaries and Language Teaching, Publisher: Société Néophilologique, Editors: Begoña Vilas Sanromán, pp.141-166
-
An evaluation of the role of statistical measures and frequency for MWE identification
LREC 2014, Reykjavik, Iceland
We report on an experiment to evaluate the role of statistical association measures and frequency for the identification of MWE. We base our evaluation on a lexicon of 14,000 MWE comprising different types of word combinations: collocations, nominal compounds, light verbs + predicate, idioms, etc. These MWE were manually validated from a list of n-grams extracted from a 50-million-word corpus of Portuguese (a subcorpus of the Reference Corpus of Contemporary Portuguese), using several criteria: syntactic fixedness, idiomaticity, frequency and the Mutual Information (MI) measure, although no threshold was established, either in terms of group frequency or MI. We report on MWE that were selected on the basis of their syntactic and semantic properties while the MI, or both the MI and the frequency, show low values, which makes them difficult cases for establishing a cut-off point. We analyze the MI values of the MWE selected in our gold dataset and, for some specific cases, compare these values with two other statistical measures.
-
MWE in Portuguese: Proposal for a Typology for Annotation in Running Text
9th Workshop on Multiword Expressions, North American Chapter of the Association for Computational Linguistics, Atlanta, Georgia, USA
Based on a lexicon of Portuguese MWE, this presentation focuses on ongoing work that aims at the creation of a typology that describes these expressions, taking into account their semantic, syntactic and pragmatic properties. We also plan to annotate each MWE entry in the mentioned lexicon according to the information obtained from that typology. Our objective is to create a valuable resource, which will allow for the automatic identification of MWE in running text and for a deeper understanding of these expressions in their context.
-
CQPWeb: uma nova plataforma de pesquisa para o CRPC
XXVII Encontro Nacional da Associação Portuguesa de Linguística. Textos Seleccionados
We present a newly available online resource for Portuguese, a new version of the Reference Corpus of Contemporary Portuguese, now searchable via a user-friendly web interface. We report on work carried out on the corpus prior to its publication online, namely how the corpus was built, our choice of metadata, and the processes and tools involved in cleaning, preparing and annotating the corpus to make it suitable for linguistic inquiries. We also describe the web platform and summarize the extensive search options available for linguistic or NLP studies.
-
A Lexical Database of Portuguese Multiword Expressions
Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language. Sandra Antunes, University of Lisbon
This presentation focuses on an ongoing project which aims at the creation of a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50-million-word corpus, statistically interpreted with lexical association measures and validated by hand. This database covers different types of MW units, like named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool which supports the development of typologies of MW units, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
-
Typologies of MultiWord Expressions Revisited
Spoken Language Corpus and Linguistic Informatics, pp.227-244
-
The Portuguese corpus
C-ORAL-ROM, pp.163-207
Recommendations received
2 people have recommended Sandra
More activity by Sandra
-
Over the past few months at Mozilla.ai, we engaged with a number of organizations to learn how they are using language models in practice. We spoke…
Liked by Sandra Antunes
-
Picking a Summarization Model: Abstractive or Extractive? Finding a good model for summarization is a daunting task, as the typical intuition…
Liked by Sandra Antunes
-
Mozilla's Linda Griffin speaking with POLITICO's Steven Overly on a variety of issues surrounding #AI, including a brief teaser of what we're working…
Liked by Sandra Antunes