This web page contains links to training and testing sets for various research results produced by the BioText project.
Please acknowledge your access to this data by citing this paper if you use the data in research or for other purposes:
A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text, Ariel Schwartz and Marti Hearst, in the Proceedings of the Pacific Symposium on Biocomputing (PSB 2003).
To develop this collection, 1000 MEDLINE abstracts were randomly
selected from the results of a query on the term "yeast". These were
then hand-tagged, producing a list of 954 correct abbreviation-definition pairs.
The dataset was first annotated by a researcher in computational and
biosciences. The data was further verified by comparing any questionable pairs
against other occurrences of the same abbreviation in other abstracts, using the
web site provided by Chang,
Schuetze, and Altman 2002. A pair extracted by the Schwartz and Hearst
algorithm is considered correct only if it exactly matches a pair labeled in the
dataset.
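The exact-match criterion above can be sketched in a few lines of Python. This is only an illustration, not the evaluation script used for the paper: the function name `score_exact_match`, the representation of each pair as a `(short form, long form)` tuple, and the example pairs are all assumptions made for this sketch.

```python
def score_exact_match(extracted, gold):
    """Score extracted (short form, long form) pairs against labeled pairs.

    A pair counts as correct only if it is identical to a labeled pair;
    no normalization of case or whitespace is applied.
    """
    extracted_set = set(extracted)
    gold_set = set(gold)
    correct = extracted_set & gold_set
    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    recall = len(correct) / len(gold_set) if gold_set else 0.0
    return precision, recall


# Made-up example pairs, not taken from the dataset.
gold = [("ORF", "open reading frame"), ("PCR", "polymerase chain reaction")]
extracted = [("ORF", "open reading frame"), ("PCR", "chain reaction")]

precision, recall = score_exact_match(extracted, gold)
print(precision, recall)  # prints: 0.5 0.5
```

In this example the second extracted pair overlaps a labeled long form but is not identical to it, so it is counted as wrong under exact matching.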
Please acknowledge your access to this data by citing this paper if you use
the data in research or for other purposes:
Multi-way Relation Classification: Application to Protein-Protein
Interaction, Barbara Rosario and Marti Hearst, in HLT-NAACL'05,
Vancouver, 2005.
The dataset was annotated by a researcher in
computational and biosciences. The paper above describes how the data was
extracted. Each entry has the following format:
interaction_type====PaperPubMedID_Prot1_ID_Prot2_ID==>sentence with proteins labeled|| .....
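A minimal parsing sketch for entries in this format follows. It rests on several assumptions not stated on this page: each entry sits on one line, `||` separates successive sentences for the same paper and protein pair, and neither the PubMed ID nor the first protein ID contains an underscore. The names `InteractionRecord` and `parse_record` and the sample line are hypothetical, invented for illustration only.

```python
from typing import List, NamedTuple


class InteractionRecord(NamedTuple):
    interaction_type: str
    pubmed_id: str
    protein1_id: str
    protein2_id: str
    sentences: List[str]


def parse_record(line: str) -> InteractionRecord:
    """Split one entry into its labeled fields.

    Assumes "====" follows the interaction type, "==>" follows the
    ID triple, and "||" separates sentences (an assumption).
    """
    interaction_type, rest = line.strip().split("====", 1)
    key, text = rest.split("==>", 1)
    # Assumes the first two IDs contain no underscores.
    pubmed_id, protein1_id, protein2_id = key.split("_", 2)
    sentences = [s.strip() for s in text.split("||") if s.strip()]
    return InteractionRecord(
        interaction_type, pubmed_id, protein1_id, protein2_id, sentences
    )


# Made-up sample line, not taken from the dataset.
sample = "binds====12345678_111_222==>PROT1 binds PROT2 in vitro.|| PROT1 was found with PROT2.||"
record = parse_record(sample)
print(record.interaction_type, record.pubmed_id, len(record.sentences))
```

If the actual files use a different separator convention or ID scheme, the split steps would need to be adjusted after inspecting a few real entries.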
Please acknowledge your access to this data by citing this paper if you use
the data in research or for other purposes:
Classifying Semantic Relations in Bioscience Text, Barbara
Rosario and Marti A. Hearst, in the proceedings of the 42nd Annual
Meeting of the Association for Computational Linguistics (ACL 2004),
Barcelona, July 2004.
Information about, and links to, the files
Please acknowledge your access to this data by citing this paper if
you use the data in research or for other purposes:
Classifying the Semantic Relations in Noun Compounds via a
Domain-Specific Lexical Hierarchy. Barbara Rosario and Marti
Hearst, Proceedings of 2001 Conference on Empirical Methods in
Natural Language Processing, Pittsburgh, PA (EMNLP 2001).
The following files contain all the labeled noun compounds (NCs) used in the
experiments described in the paper Classifying the Semantic Relations in Noun
Compounds via a Domain-Specific Lexical Hierarchy.
Protein-Protein Interaction Data
Relations between DISEASE/TREATMENT Entities
Noun Compound Semantics