Data Schemas

Details of data frames returned by orphanet-parser. All data is parsed from XML files provided by Orphanet, and column names are preserved (as snake case) to the extent possible. We currently do not sanitize the data, so some inconsistencies present in the Orphanet XML files persist (e.g. yes/no vs y/n for booleans).
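
Because boolean-like columns mix `yes`/`no` and `y`/`n`, it can help to normalize them before filtering. The following is a minimal sketch, not part of orphanet-parser: `to_bool` is a hypothetical helper, and the row is made up for illustration using boolean-like columns from the functional consequences table.

```python
# Hypothetical helper (not provided by orphanet-parser).
_TRUE = {"y", "yes"}
_FALSE = {"n", "no"}


def to_bool(value):
    """Map 'y'/'yes' to True and 'n'/'no' to False; anything else becomes None."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    return None


# Made-up row for illustration only.
row = {"loss_of_ability": "yes", "defined": "y", "specific_management": "n"}
flags = {column: to_bool(value) for column, value in row.items()}
print(flags)
```

Returning `None` for unrecognized values keeps unexpected spellings visible instead of silently treating them as false. The same function can be applied element-wise to a data frame column.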

For more detailed information on the datasets, see the Orphadata free access product description.

Prevalence

One row represents one prevalence estimate. A disorder may have multiple prevalence estimates.

| Column | Description | Values |
| --- | --- | --- |
| orphacode | Unique identifier of disorder | int |
| expert_link | Link to Orphanet page for disease | str |
| disorder_name | Most generally accepted name of disorder | str |
| disorder_group | Hierarchical level of the clinical entity | "Group of disorders", "Disorder", "Subtype of disorder" |
| prevalence_source | Source of information for prevalence estimate | str |
| prevalence_type | Type of prevalence estimate | "Point prevalence", "birth prevalence", "lifelong prevalence", "incidence", "cases/families" |
| prevalence_qualification | | "Value and Class", "Only class", "Case", "Family" |
| prevalence_class | Estimated prevalence | ">1 / 1,000", "1-5 / 10,000", "6-9 / 10,000", "1-9 / 100,000", "1-9 / 1,000,000", "<1 /1,000,000", "Not yet documented", "Unknown" |
| prevalence_geographic | Geographic area of prevalence type | str |
| prevalence_validation_status | Validation status | "Validated", "Not yet validated" |

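The `prevalence_class` values are strings, so numeric comparisons need a conversion step first. The sketch below is one possible way to turn a class into lower and upper bounds expressed as fractions. `class_to_bounds` is a hypothetical helper, and treating `>` as "up to 1.0" and `<` as "down to 0.0" is an assumption of this example, not something Orphanet specifies.

```python
import re


def class_to_bounds(prevalence_class):
    """Return (low, high) as fractions, or None for non-numeric classes."""
    # Drop thousands separators and whitespace: "1-9 / 100,000" -> "1-9/100000".
    text = prevalence_class.replace(",", "").replace(" ", "")

    # Range classes such as "1-9/100000".
    match = re.fullmatch(r"(\d+)-(\d+)/(\d+)", text)
    if match:
        low, high, denominator = (int(group) for group in match.groups())
        return (low / denominator, high / denominator)

    # Open-ended classes such as ">1/1000" or "<1/1000000".
    match = re.fullmatch(r"([<>])(\d+)/(\d+)", text)
    if match:
        bound = int(match.group(2)) / int(match.group(3))
        # Assumption: open ends are closed at 0.0 and 1.0.
        return (0.0, bound) if match.group(1) == "<" else (bound, 1.0)

    # "Not yet documented", "Unknown", or anything unexpected.
    return None


print(class_to_bounds("1-9 / 100,000"))
print(class_to_bounds("Unknown"))  # None
```
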
Natural history

One row represents one disorder, and contains information on the age of onset and type of inheritance.

| Column | Description | Values |
| --- | --- | --- |
| orphacode | Unique identifier of disorder | int |
| expert_link | Link to Orphanet page for disease | str |
| disorder_name | Most generally accepted name of disorder | str |
| disorder_group | Hierarchical level of the clinical entity | "Group of disorders", "Disorder", "Subtype of disorder" |
| average_age_of_onset | Groups corresponding to estimated average age of onset. If more than one age of onset is provided, they are alphabetically sorted and semicolon-separated. | "Antenatal", "Neonatal", "Infancy", "Childhood", "Adolescence", "Adult", "Elderly", "All ages", "No data available" |
| type_of_inheritance | Type of inheritance. If more than one type of inheritance is provided, they are alphabetically sorted and semicolon-separated. | "Autosomal dominant", "Autosomal recessive", "Multigenic/multifactorial", "Mitochondrial inheritance", "X-linked dominant", "X-linked recessive", "Not applicable", "No data available", "Unknown" |

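Since `average_age_of_onset` and `type_of_inheritance` can hold several semicolon-separated values in one cell, a common first step is to split them into one row per value. This is a minimal sketch with hypothetical helpers (`split_multi`, `explode`) and made-up rows. It trims whitespace around each value because it is an assumption here that the separator is a bare semicolon.

```python
def split_multi(value, sep=";"):
    """Split a semicolon-separated cell into a list of trimmed, non-empty values."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def explode(rows, column):
    """Yield one copy of each row per value found in the multi-valued column."""
    for row in rows:
        for item in split_multi(row[column]):
            yield {**row, column: item}


# Made-up rows for illustration only; the orphacodes are placeholders.
rows = [
    {"orphacode": 1, "average_age_of_onset": "Adult;Childhood"},
    {"orphacode": 2, "average_age_of_onset": "Neonatal"},
]
long_rows = list(explode(rows, "average_age_of_onset"))
print(len(long_rows))  # 3
```
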
Gene associations

One row represents one association between a gene and disorder.

| Column | Description | Values |
| --- | --- | --- |
| orphacode | Unique identifier of disorder | int |
| expert_link | Link to Orphanet page for disease | str |
| disorder_name | Most generally accepted name of disorder | str |
| disorder_group | Hierarchical level of the clinical entity | "Group of disorders", "Disorder", "Subtype of disorder" |
| association_type | Gene-disease relationship | "Biomarker tested in", "Candidate gene tested in", "Disease-causing germline mutation(s) (gain of function) in", "Disease-causing germline mutation(s) (loss of function) in", "Disease-causing germline mutation(s) in", "Disease-causing somatic mutation(s) in", "Major susceptibility factor in", "Modifying germline mutation in", "Part of a fusion gene in", "Role in the phenotype of" |
| association_status | Gene-disease association status | "Validated", "Not validated" |
| gene_symbol | HGNC-approved gene symbol | str |
| gene_name | Full gene name | str |
| gene_type | Gene type | "gene with protein product", "Non-coding RNA", "Disorder-associated locus" |
| external_references | List of references in HGNC, OMIM, GenAtlas, UniProtKB, Ensembl, Reactome, and IUPHAR associated with a given gene | str |
| source_of_validation | Listed reference for a given source associated with a gene | str |
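
Because each row is a single gene-disorder association, collecting the genes for a disorder means filtering on `association_type` and grouping by `orphacode`. The sketch below keeps only the three "Disease-causing germline mutation(s)" variants listed above. `causal_genes` is a hypothetical helper, and the rows and gene symbols are placeholders rather than real data.

```python
from collections import defaultdict

# Common prefix of the three germline association types listed in the table.
CAUSAL_PREFIX = "Disease-causing germline mutation(s)"


def causal_genes(rows):
    """Map each orphacode to the sorted gene symbols with a germline disease-causing association."""
    genes = defaultdict(set)
    for row in rows:
        if row["association_type"].startswith(CAUSAL_PREFIX):
            genes[row["orphacode"]].add(row["gene_symbol"])
    return {code: sorted(symbols) for code, symbols in genes.items()}


# Made-up rows for illustration only.
rows = [
    {"orphacode": 1, "gene_symbol": "GENE_A",
     "association_type": "Disease-causing germline mutation(s) in"},
    {"orphacode": 1, "gene_symbol": "GENE_B",
     "association_type": "Disease-causing germline mutation(s) (loss of function) in"},
    {"orphacode": 1, "gene_symbol": "GENE_C",
     "association_type": "Candidate gene tested in"},
    {"orphacode": 2, "gene_symbol": "GENE_A",
     "association_type": "Disease-causing somatic mutation(s) in"},
]
result = causal_genes(rows)
print(result)  # {1: ['GENE_A', 'GENE_B']}
```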

Associated phenotypes

One row represents one disorder/phenotype pair. A disorder may have multiple associated phenotypes.

| Column | Description | Values |
| --- | --- | --- |
| orphacode | Unique identifier of disorder | int |
| expert_link | Link to Orphanet page for disease | str |
| disorder_name | Most generally accepted name of disorder | str |
| disorder_group | Hierarchical level of the clinical entity | "Group of disorders", "Disorder", "Subtype of disorder" |
| hpo_id | Unique identifying number assigned by HPO to a given phenotype | str |
| hpo_term | Preferred name of HPO phenotype | str |
| hpo_frequency | Estimated frequency of phenotype within disorder | "Obligate (100%)", "Very frequent (99-80%)", "Frequent (79-30%)", "Occasional (29-5%)", "Very rare (<4-1%)", "Excluded (0%)" |
| diagnostic_criteria | Whether the phenotype is a pathognomonic sign or a diagnostic criterion in the disorder | "Diagnostic criterion", "Pathognomonic sign" |
| source | Reference | str |
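
The `hpo_frequency` labels are ordered categories, but as plain strings they sort alphabetically rather than by frequency. One way to handle this is an explicit rank lookup, sketched below. `sort_by_frequency` is a hypothetical helper, the rows are made up, and placing unrecognized labels last is a choice of this example.

```python
# Labels in descending order of frequency, as listed in the table above.
FREQUENCY_ORDER = [
    "Obligate (100%)",
    "Very frequent (99-80%)",
    "Frequent (79-30%)",
    "Occasional (29-5%)",
    "Very rare (<4-1%)",
    "Excluded (0%)",
]
RANK = {label: position for position, label in enumerate(FREQUENCY_ORDER)}


def sort_by_frequency(rows):
    """Sort phenotype rows from most to least frequent; unknown labels go last."""
    return sorted(rows, key=lambda row: RANK.get(row["hpo_frequency"], len(RANK)))


# Made-up rows for illustration only.
rows = [
    {"hpo_term": "Phenotype A", "hpo_frequency": "Occasional (29-5%)"},
    {"hpo_term": "Phenotype B", "hpo_frequency": "Obligate (100%)"},
    {"hpo_term": "Phenotype C", "hpo_frequency": "Frequent (79-30%)"},
]
ordered = [row["hpo_term"] for row in sort_by_frequency(rows)]
print(ordered)  # ['Phenotype B', 'Phenotype C', 'Phenotype A']
```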

Functional consequences

One row represents one disorder/functional consequence pair. A disorder may have multiple functional consequences.

| Column | Description | Values |
| --- | --- | --- |
| orphacode | Unique identifier of disorder | int |
| expert_link | Link to Orphanet page for disease | str |
| disorder_name | Most generally accepted name of disorder | str |
| disorder_group | Hierarchical level of the clinical entity | "Group of disorders", "Disorder", "Subtype of disorder" |
| disability | Name of disability | str |
| disability_category | Category of disability | "Activity limitation/participation restriction", "No functional disability", "Not applicable" |
| reason_for_not_applicable | Reason given when the category is "Not applicable" | "Hypervariable functioning", "Early death-causing disease", "Not applicable for another reason" |
| frequence_disability | Frequency of the functional consequence in the given population | "very frequent", "frequent", "occasional" |
| temporality_disability | Temporality of the functional consequence in the given population | "permanent limitation/restriction", "transient limitation/restriction", "delayed acquisition" |
| severity_disability | Severity of the functional consequence in the given population | "low", "moderate", "severe", "complete", "Unspecified" |
| loss_of_ability | Whether there is a progressive and definitive loss of a skill or ability over the course of the disease | "yes", "no" |
| type | Disability (functional consequence) or environmental factor | "Disability", "Environmental factor" |
| defined | Whether severity, temporality, and frequency are defined | "y", "n" |
| source_of_validation | Source of validation of the given clinical entity's annotation | str |
| specific_management | Whether a specific management protocol is known for the disease. If "y", all annotations were made with that protocol taken into account. | "y", "n" |
| annotation_date | Date of annotation | str |
| status_disability | Validation status of the given clinical entity's annotation | "Validated", "Not validated" |