Data Schemas
Details of the data frames returned by orphanet-parser. All data is parsed from XML files provided by Orphanet, and column names are preserved (converted to snake case) wherever possible. The data is not currently sanitized, so some inconsistencies present in the Orphanet XML files persist (e.g. yes/no vs y/n for booleans).
For more detailed information on the datasets, see the Orphadata free access product description.
Prevalence
One row represents one prevalence estimate. A disorder may have multiple prevalence estimates.
Column | Description | Values |
---|---|---|
orphacode | Unique identifier of disorder | int |
expert_link | Link to Orphanet page for disease | str |
disorder_name | Most generally accepted name of disorder | str |
disorder_group | Hierarchical level of the clinical entity. | "Group of disorders" "Disorder" "Subtype of disorder" |
prevalence_source | Source of information for prevalence estimate | str |
prevalence_type | Type of prevalence estimate | "Point prevalence" "birth prevalence" "lifelong prevalence" "incidence" "cases/families" |
prevalence_qualification | | "Value and Class" "Only class" "Case" "Family" |
prevalence_class | Estimated prevalence | ">1 / 1,000" "1-5 / 10,000" "6-9 / 10,000" "1-9 / 100,000" "1-9 / 1,000,000" "<1 /1,000,000" "Not yet documented" "Unknown" |
prevalence_geographic | Geographic area of prevalence type | str |
prevalence_validation_status | Validation status | "Validated" "Not yet validated" |
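
The `prevalence_class` labels are plain strings, so numeric comparisons need a parsing step. The following is a minimal sketch, assuming the labels appear exactly as listed in the table above. `prevalence_class_bounds` is a hypothetical helper, not part of orphanet-parser, and reading each class as a (lower, upper) pair of proportions is an interpretation rather than something Orphanet defines.

```python
import re
from typing import Optional, Tuple

# "1-5 / 10,000" style labels: a closed range over a denominator.
_RANGE = re.compile(r"^(\d+)-(\d+)\s*/\s*([\d,]+)$")
# ">1 / 1,000" or "<1 /1,000,000" style labels: a one-sided bound.
_BOUND = re.compile(r"^([<>])(\d+)\s*/\s*([\d,]+)$")

Bounds = Tuple[Optional[float], Optional[float]]


def prevalence_class_bounds(label: str) -> Optional[Bounds]:
    """Return (lower, upper) proportions for a class label, or None if not numeric."""
    text = label.strip()

    match = _RANGE.match(text)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        denom = int(match.group(3).replace(",", ""))
        return (low / denom, high / denom)

    match = _BOUND.match(text)
    if match:
        rate = int(match.group(2)) / int(match.group(3).replace(",", ""))
        # ">" gives an open upper end, "<" an open lower end.
        return (rate, None) if match.group(1) == ">" else (None, rate)

    # "Not yet documented", "Unknown", or any unexpected label.
    return None
```

Assuming `df` holds the prevalence data frame, the helper could be applied with `df["prevalence_class"].map(prevalence_class_bounds)`.
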
Natural history
One row represents one disorder, and contains information on the age of onset and type of inheritance.
Column | Description | Values |
---|---|---|
orphacode | Unique identifier of disorder | int |
expert_link | Link to Orphanet page for disease | str |
disorder_name | Most generally accepted name of disorder | str |
disorder_group | Hierarchical level of the clinical entity | "Group of disorders" "Disorder" "Subtype of disorder" |
average_age_of_onset | Groups corresponding to estimated average age of onset. If more than one age of onset is provided, they are alphabetically sorted and semicolon-separated. | "Antenatal" "Neonatal" "Infancy" "Childhood" "Adolescence" "Adult" "Elderly" "All ages" "No data available" |
type_of_inheritance | Type of inheritance. If more than one type of inheritance is provided, they are alphabetically sorted and semicolon-separated. | "Autosomal dominant" "Autosomal recessive" "Multigenic/multifactorial" "Mitochondrial inheritance" "X-linked dominant" "X-linked recessive" "Not applicable" "No data available" "Unknown" |
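
Because `average_age_of_onset` and `type_of_inheritance` can hold several semicolon-separated values in one cell, it is often convenient to split them into lists before filtering. A minimal sketch follows. `split_multi` is a hypothetical helper, and stripping whitespace around each item is a precaution, since the exact separator formatting is not specified here.

```python
from typing import List


def split_multi(value: object, sep: str = ";") -> List[str]:
    """Split a semicolon-separated cell into trimmed, non-empty items.

    Non-string input (None, or NaN from a missing cell) yields an empty list.
    """
    if not isinstance(value, str):
        return []
    return [item.strip() for item in value.split(sep) if item.strip()]
```

Assuming `df` holds the natural history data frame, `df["type_of_inheritance"].map(split_multi)` would give one list per disorder, which can then be expanded to one row per value with `DataFrame.explode`.
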
Gene associations
One row represents one association between a gene and disorder.
Column | Description | Values |
---|---|---|
orphacode | Unique identifier of disorder | int |
expert_link | Link to Orphanet page for disease | str |
disorder_name | Most generally accepted name of disorder | str |
disorder_group | Hierarchical level of the clinical entity | "Group of disorders" "Disorder" "Subtype of disorder" |
association_type | Gene-disease relationship | "Biomarker tested in" "Candidate gene tested in" "Disease-causing germline mutation(s) (gain of function) in" "Disease-causing germline mutation(s) (loss of function) in" "Disease-causing germline mutation(s) in" "Disease-causing somatic mutation(s) in" "Major susceptibility factor in" "Modifying germline mutation in" "Part of a fusion gene in" "Role in the phenotype of" |
association_status | Gene-disease association status | "Validated" "Not validated" |
gene_symbol | HGNC-approved gene symbol | str |
gene_name | Full gene name | str |
gene_type | Gene type | "gene with protein product" "Non-coding RNA" "Disorder-associated locus" |
external_references | List of references to a given gene in HGNC, OMIM, GenAtlas, UniProtKB, Ensembl, Reactome and IUPHAR | str |
source_of_validation | Listed reference for a given source associated with a gene | str |
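
Since each row is a single gene-disorder association, collecting all genes for a disorder means grouping rows by `orphacode`. The sketch below works on plain row dictionaries and keeps only associations whose `association_status` is "Validated". `genes_by_disorder` is a hypothetical helper, and the sample rows use placeholder values, not real Orphanet data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def genes_by_disorder(
    rows: Iterable[Mapping[str, object]], validated_only: bool = True
) -> Dict[object, List[str]]:
    """Group gene symbols by orphacode, optionally skipping unvalidated rows."""
    grouped = defaultdict(set)
    for row in rows:
        if validated_only and row["association_status"] != "Validated":
            continue
        grouped[row["orphacode"]].add(row["gene_symbol"])
    # Sort symbols so the output is deterministic.
    return {code: sorted(symbols) for code, symbols in grouped.items()}


# Placeholder rows for illustration only.
rows = [
    {"orphacode": 1, "gene_symbol": "GENE_B", "association_status": "Validated"},
    {"orphacode": 1, "gene_symbol": "GENE_A", "association_status": "Validated"},
    {"orphacode": 1, "gene_symbol": "GENE_C", "association_status": "Not validated"},
    {"orphacode": 2, "gene_symbol": "GENE_A", "association_status": "Validated"},
]
result = genes_by_disorder(rows)
```

Assuming `df` holds the gene associations data frame, rows in this shape can be obtained with `df.to_dict("records")`.
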
Associated phenotypes
One row represents one disorder/phenotype pair. A disorder may have multiple associated phenotypes.
Column | Description | Values |
---|---|---|
orphacode | Unique identifier of disorder | int |
expert_link | Link to Orphanet page for disease | str |
disorder_name | Most generally accepted name of disorder | str |
disorder_group | Hierarchical level of the clinical entity. | "Group of disorders" "Disorder" "Subtype of disorder" |
hpo_id | Unique identifying number assigned by HPO to a given phenotype | str |
hpo_term | Preferred name of HPO phenotype | str |
hpo_frequency | Estimated frequency of phenotype within disorder | "Obligate (100%)" "Very frequent (99-80%)" "Frequent (79-30%)" "Occasional (29-5%)" "Very rare (<4-1%)" "Excluded (0%)" |
diagnostic_criteria | Indicator of phenotype being a pathognomonic sign or a diagnostic criterion in disorder | "Diagnostic criterion" "Pathognomonic sign" |
source | Reference | str |
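
The `hpo_frequency` labels have a natural order that alphabetical sorting does not respect. A minimal sketch of ranking phenotypes from most to least frequent, assuming the labels match the table above exactly. `HPO_FREQUENCY_ORDER`, `frequency_rank` and `sort_by_frequency` are hypothetical names, and the sample rows are placeholders.

```python
from typing import Iterable, List, Mapping

# Labels as listed in the schema, from most to least frequent.
HPO_FREQUENCY_ORDER = [
    "Obligate (100%)",
    "Very frequent (99-80%)",
    "Frequent (79-30%)",
    "Occasional (29-5%)",
    "Very rare (<4-1%)",
    "Excluded (0%)",
]
_RANK = {label: index for index, label in enumerate(HPO_FREQUENCY_ORDER)}


def frequency_rank(label: object) -> int:
    """Return the position of a label; unknown labels sort last."""
    return _RANK.get(label, len(HPO_FREQUENCY_ORDER))


def sort_by_frequency(rows: Iterable[Mapping[str, object]]) -> List[Mapping[str, object]]:
    """Sort disorder/phenotype rows from most to least frequent (stable)."""
    return sorted(rows, key=lambda row: frequency_rank(row["hpo_frequency"]))


# Placeholder rows for illustration only.
rows = [
    {"hpo_term": "term_a", "hpo_frequency": "Occasional (29-5%)"},
    {"hpo_term": "term_b", "hpo_frequency": "Obligate (100%)"},
    {"hpo_term": "term_c", "hpo_frequency": "Frequent (79-30%)"},
]
ordered_terms = [row["hpo_term"] for row in sort_by_frequency(rows)]
```

With pandas, the same list could serve as the `categories` of an ordered `pandas.Categorical`.
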
Functional consequences
One row represents one disorder/functional consequence pair. A disorder may have multiple functional consequences.
Column | Description | Values |
---|---|---|
orphacode | Unique identifier of disorder | int |
expert_link | Link to Orphanet page for disease | str |
disorder_name | Most generally accepted name of disorder | str |
disorder_group | Hierarchical level of the clinical entity. | "Group of disorders" "Disorder" "Subtype of disorder" |
disability | Name of disability | str |
disability_category | Category of disability | "Activity limitation/participation restriction" "No functional disability" "Not applicable" |
reason_for_not_applicable | If category is not applicable, the identified reason | "Hypervariable functioning" "Early death-causing disease" "Not applicable for another reason" |
frequence_disability | Frequency of the functional consequence in the given population | "very frequent" "frequent" "occasional" |
temporality_disability | Temporality of the functional consequence in the given population | "permanent limitation/restriction" "transient limitation/restriction" "delayed acquisition" |
severity_disability | Severity of the functional consequence in the given population | "low" "moderate" "severe" "complete" "Unspecified" |
loss_of_ability | Defined as a progressive and definitive loss of a skill or ability over the course of the disease | "yes" "no" |
type | Disability (functional consequence) or environmental factor | "Disability" "Environmental factor" |
defined | Indicator for severity, temporality, and frequency being defined | "y" "n" |
source_of_validation | Source of validation of the given clinical entity’s annotation | str |
specific_management | Whether a specific management protocol is known for the given disease. If "y", all annotations were made with this protocol taken into account. | "y" "n" |
annotation_date | Date of annotation | str |
status_disability | Status of the validation of the given clinical entity’s annotation | "Validated" "Not validated" |
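
The boolean-like columns in this table are not encoded uniformly: `loss_of_ability` uses "yes"/"no", while `defined` and `specific_management` use "y"/"n". Since the data is returned unsanitized, a small normalization step can help downstream. The following is a minimal sketch. `normalize_flag` is a hypothetical helper, not part of orphanet-parser, and treating unrecognized values as missing is an assumption.

```python
from typing import Optional

_TRUE = {"y", "yes"}
_FALSE = {"n", "no"}


def normalize_flag(value: object) -> Optional[bool]:
    """Map "y"/"yes" to True and "n"/"no" to False, ignoring case and whitespace.

    Anything else (including None or NaN) is returned as None.
    """
    if not isinstance(value, str):
        return None
    text = value.strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    return None
```

Assuming `df` holds the functional consequences data frame, each column could be converted with, for example, `df["loss_of_ability"].map(normalize_flag)`.
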