VarSome API

Introduction

The VarSome API allows developers to retrieve information from 100+ genomic databases in a single call. The data is returned as JSON, which is accessible from any language or platform. For example, it can be converted to native Python objects if required for processing.

This is the same data as is visualised in the VarSome genomic search engine, and leverages Saphetor's proprietary high-performance genomic database.

Batch requests are also available, allowing data for up to 10,000 variants per batch query (we recommend 1,000 for optimal results) to be efficiently retrieved in a single call.

Please ensure you have registered as a VarSome user, then contact us with your user login in order to receive an authentication token to use the API.

Your feedback will be much appreciated.

Example Queries

Here are a few simple example queries:

Substitution: https://api.varsome.com/lookup/15-73027478-T-C?add-ACMG-annotation=1
Deletion: https://api.varsome.com/lookup/chr22-29091857-G-?add-ACMG-annotation=1
Insertion: https://api.varsome.com/lookup/7-151945072--T?add-ACMG-annotation=1

All the annotation data is returned as a single dictionary, keyed by each source institution & database name. Example keys are "icgc_somatic", "iarc_tp53_germline" and "ncbi_clinvar".
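Because the response is plain JSON, a minimal Python sketch of decoding it and checking which source keys are present might look like the following. The helper name `annotation_sources` and the sample body are illustrative assumptions, not part of the API; real responses carry far more fields.

```python
import json


def annotation_sources(response_text, wanted):
    """Decode a lookup response and report which of the wanted keys it contains."""
    data = json.loads(response_text)  # JSON object -> Python dict
    return [key for key in wanted if key in data]


# Hand-written stand-in for a response body; the values are placeholders.
sample_body = '{"ncbi_clinvar": [], "icgc_somatic": []}'
present = annotation_sources(
    sample_body, ["icgc_somatic", "iarc_tp53_germline", "ncbi_clinvar"]
)
print(present)
```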

Variant endpoints

Retrieve variant related data.

Schema Lookup request

[GET] [https://api.varsome.com/lookup/schema]

Retrieves the schema of a variant response object, containing relevant information for each field included in the variant lookup response.

Example

https://api.varsome.com/lookup/schema

Germline annotation fields

For full documentation see Documentation

acmg_annotation - The JSON object containing the germline annotation.

version_name - String. VarSome's software version.
gene_symbol - String. The gene symbol of the variant.
transcript - String. The transcript used for the acmg_annotation.
transcript_reason - String. The reason for selecting the specific transcript. If override_transcript isn't used, the transcript selected by default is the one with the most severe coding impact, or the longest canonical transcript, or the longest one.
transcript_candidates - Array. The transcript candidates.
- canonical - Boolean. The canonical flag.
- gene_id - Integer. The gene identification number.
- gene_symbol - String. The gene symbol.
- is_splicing - Boolean. The is splicing flag.
- name - String. The transcript candidate's name.
- coding_impact - String. The coding impact.
- total_coding_length - Integer. The total coding length.
- total_exon_length - Integer. The total exon length.
- user_specifier - Boolean. The user specifier.
- gene_transcript - String. The gene transcript.
coding_impact - String. The variant's coding impact based on the aforementioned transcript.
verdict - JSON object containing the overall results of the germline classification.
- ACMG_rules - JSON object containing a summary.
  - approx_score - Integer. A numeric score that is used for sorting. Not used for deriving the verdict.
  - benign_score - Integer. The sum of benign rules individual scores.
  - benign_subscore - String. The verdict derived using only benign rules.
  - clinical_score - Float. The score of Germline classification.
  - pathogenic_score - Integer. The sum of pathogenic rules individual scores.
  - pathogenic_subscore - String. The verdict derived using only pathogenic rules.
  - total_score - Integer. The final Germline score, calculated by subtracting the benign_score from the pathogenic_score.
  - verdict - String. The Germline classification based on the annotation data, obtained by combining the pathogenic & benign sub-scores per the ACMG guidelines.
- classifications - Array. The Germline rules that succeeded.
classifications - Array. The rules that succeeded, as well as the ones that failed but have a clear user explanation.
- name - String. The Germline rule's name.
- met_criteria - Boolean. True for success, false for failure.
- user_explain - Array. The user explanations for a rule that succeeded.
- user_explain_failed - Array. The user explanations for a rule that failed.
gene_id - Integer. Internal gene identification number.
sample_findings - JSON object containing the overall results of the findings for the specified sample.
- inheritance - String. The inheritance.
- phenotypes - String. The phenotypes.
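As a rough illustration of how the fields above nest, the sketch below pulls the headline values out of an `acmg_annotation` dictionary. The function name `summarize_germline` is hypothetical, the nesting is read off the indentation of the field list, and the sample values are placeholders rather than real API output.

```python
def summarize_germline(acmg_annotation):
    """Pull the headline germline fields out of an acmg_annotation dict."""
    rules = acmg_annotation.get("verdict", {}).get("ACMG_rules", {})
    # Keep only the rules whose met_criteria flag is true.
    met = [
        rule["name"]
        for rule in acmg_annotation.get("classifications", [])
        if rule.get("met_criteria")
    ]
    return {
        "gene_symbol": acmg_annotation.get("gene_symbol"),
        "transcript": acmg_annotation.get("transcript"),
        "verdict": rules.get("verdict"),
        "met_rules": met,
    }


# Placeholder data shaped like the field list; not real API output.
sample = {
    "gene_symbol": "GENE1",
    "transcript": "TRANSCRIPT1",
    "verdict": {"ACMG_rules": {"verdict": "EXAMPLE_VERDICT"}},
    "classifications": [
        {"name": "RULE_A", "met_criteria": True},
        {"name": "RULE_B", "met_criteria": False},
    ],
}
summary = summarize_germline(sample)
print(summary)
```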

Saphetor Known Pathogenicity fields

saphetor_known_pathogenicity - The array containing the Saphetor DBs.

version - String. VarSome's software version.
items - Array object containing all the existing details about Saphetor Known Pathogenicity.
- annotations - JSON object containing all the Saphetor DB details.
  - NCBI ClinVar2 - Array. The NCBI ClinVar2 details.
    - review_status - String. The review status.
    - submission_count - Integer. The number of submissions.
    - review_stars - Integer. The number of review stars.
    - accession_count - Integer. The number of accessions.
    - publication_count - Integer. The number of publications.
    - clinical_significance - Array. The clinical significance provided by ClinVar.
    - pub_med_references - Array. PubMed IDs included in the entry.
    - possible_functional_studies - Array. The possible functional studies.
    - functions - Array. The functions.
    - coding_impact - String. The coding impact.
    - acmg_confirmed - Boolean. Whether the clinical significance provided by ClinVar matches the verdict of our Germline classifier.
    - acmg_class - String. The clinical significance provided by ClinVar converted into a Germline verdict.
    - acmg_reannotated - String. The verdict of our Germline classifier for this variant.
    - codon - Integer. The codon.
    - gene_symbol - String. The gene symbol.
    - hgvs - String. The HGVS.
    - transcript - String. The transcript.
    - disease_name - Array. The disease names.
  - UNIPROT UniProt Variants - Array. The UNIPROT UniProt Variants details.
    - possible_functional_studies - Array. The possible functional studies.
    - disease_name - Array. The disease names.
    - disease_symbol - Array. The disease symbols.
    - annotation_id - String. The annotation id.
    - variant_type - String. The variant type.
    - disease - String. The disease.
    - pub_med_references - Array. PubMed IDs included in the entry.
    - functions - Array. The functions.
    - coding_impact - String. The coding impact.
    - acmg_confirmed - Boolean. Whether the clinical significance provided by ClinVar matches the verdict of our Germline classifier.
    - acmg_class - String. The clinical significance provided by ClinVar converted into a Germline verdict.
    - acmg_reannotated - String. The verdict of our Germline classifier for this variant.
    - codon - Integer. The codon.
    - gene_symbol - String. The gene symbol.
    - hgvs - String. The HGVS.
    - transcript - String. The transcript.
  - Saphetor PubMedUserEntry - Array. The Saphetor PubMedUserEntry details.
    - pathogenicity - Array. The pathogenicities.
    - id - Integer. The id.
    - confirmedByFunctionalStudy - Boolean. Whether the user entry is confirmed by a functional study.
    - is_lifted_over - Boolean. Whether the entry is an automatic lift over from another genome.
    - lifted_from - String. Lifted-from information.
    - pub_med_references - Array. PubMed IDs included in the entry.
    - functions - Array. The functions.
    - coding_impact - String. The coding impact.
    - acmg_confirmed - Boolean. Whether the clinical significance provided by ClinVar matches the verdict of our Germline classifier.
    - acmg_class - String. The clinical significance provided by ClinVar converted into a Germline verdict.
    - acmg_reannotated - String. The verdict of our Germline classifier for this variant.
    - codon - Integer. The codon.
    - gene_symbol - String. The gene symbol.
    - hgvs - String. The HGVS.
    - transcript - String. The transcript.
  - Saphetor VarSome Comment - Array. The Saphetor VarSome Comment details.
    - comment - String. The comment.
    - flagged_at_timestamp - String. The flagged-at timestamp.
    - id - Integer. The id.
    - saphetorClass - String. The Saphetor class.
    - user_id - Integer. The user id.
    - variant_id - Integer. The variant id.
    - functions - Array. The functions.
    - coding_impact - String. The coding impact.
    - acmg_confirmed - Boolean. Whether the clinical significance provided by ClinVar matches the verdict of our Germline classifier.
    - acmg_class - String. The clinical significance provided by ClinVar converted into a Germline verdict.
    - acmg_reannotated - String. The verdict of our Germline classifier for this variant.
    - is_lifted_over - Boolean. Whether the entry is an automatic lift over from another genome.
    - lifted_from - String. Lifted-from information.
    - codon - Integer. The codon.
    - gene_symbol - String. The gene symbol.
    - hgvs - String. The HGVS.
    - transcript - String. The transcript.
  - CHOP Mitomap - Array. The CHOP Mitomap details.
    - diseases - Array. The diseases.
    - possible_functional_studies - Array. The possible functional studies.
    - pub_med_references - Array. PubMed IDs included in the entry.
    - functions - Array. The functions.
    - coding_impact - String. The coding impact.
    - acmg_confirmed - Boolean. Whether the clinical significance provided by ClinVar matches the verdict of our Germline classifier.
    - acmg_class - String. The clinical significance provided by ClinVar converted into a Germline verdict.
    - acmg_reannotated - String. The verdict of our Germline classifier for this variant.
    - codon - Integer. The codon.
    - gene_symbol - String. The gene symbol.
    - hgvs - String. The HGVS.
    - transcript - String. The transcript.
  - Saphetor VarSome AI Variant - Array. The Saphetor VarSome AI Variant details.
    - original_variant - String. The original variant.
    - pub_med_references - Array. PubMed IDs included in the entry.
    - functions - Array. The functions.
    - coding_impact - String. The coding impact.
    - acmg_confirmed - Boolean. Whether the clinical significance provided by ClinVar matches the verdict of our Germline classifier.
    - acmg_class - String. The clinical significance provided by ClinVar converted into a Germline verdict.
    - acmg_reannotated - String. The verdict of our Germline classifier for this variant.
    - codon - Integer. The codon.
    - gene_symbol - String. The gene symbol.
    - hgvs - String. The HGVS.
    - transcript - String. The transcript.
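Several of the sources listed above share a `pub_med_references` array, so one plausible use is to gather every PubMed ID attached to a variant. The sketch below assumes `saphetor_known_pathogenicity` is a list of objects, each with an `items` list whose `annotations` object maps a source name to a list of entries; that reading of the field list, the function name `collect_pubmed_ids`, and the sample values are all assumptions.

```python
def collect_pubmed_ids(known_pathogenicity):
    """Gather the distinct PubMed IDs across all sources, sorted ascending."""
    ids = set()
    for block in known_pathogenicity:
        for item in block.get("items", []):
            # annotations: {source name: [entry, entry, ...]}
            for entries in item.get("annotations", {}).values():
                for entry in entries:
                    ids.update(entry.get("pub_med_references", []))
    return sorted(ids)


# Placeholder structure with made-up IDs; not real API output.
sample = [
    {
        "version": "EXAMPLE",
        "items": [
            {
                "annotations": {
                    "NCBI ClinVar2": [{"pub_med_references": [3, 1]}],
                    "CHOP Mitomap": [{"pub_med_references": [1, 2]}],
                }
            }
        ],
    }
]
pubmed_ids = collect_pubmed_ids(sample)
print(pubmed_ids)
```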

Somatic annotation fields

For full documentation see Documentation

amp_annotation - The JSON object containing the somatic annotation.

version_name - String. VarSome's software version.
verdict - JSON object containing the overall results of the AMP classification.
- tier - String. The Somatic classification (Tier I - Tier IV) based on the annotation data, obtained by combining the pathogenic & benign sub-scores per the guidelines.
- approx_score - Double. A numeric score that is used for sorting. Not used for deriving the verdict.
classifications - Array. The rules that succeeded, as well as the ones that failed but have a clear user explanation.
- name - String. The Somatic rule's name.
- tier - The tier assigned to the rule (Tier I - Tier IV).
- user_explain - JSON object containing the user explanations for a rule that succeeded.
  - Tier I - Array.
  - Tier II - Array.
  - Tier III - Array.
  - Tier IV - Array.
- user_explain_failed - Array. The user explanations for a rule that failed.
- total_samples - Integer. The total samples found for the specific variant. Only in the Somatic Rule.
- approx_score - Float. The approximate score.
sample_findings - JSON object containing the overall results of the findings for the specified sample.
- sex - String. The sample findings for the specified sex.
- age - String. The sample findings for the specified age.
- age_match - String. The sample findings for the specified age.
- tissue_type_match - Array. The sample findings for the specified tissue type.
- cancer_type_match - Array. The sample findings for the specified cancer type.
- ethnic_frequency - String. The sample findings for the specified ethnic frequency.
- inheritance - String. The inheritance.
approved_therapies - Array of objects containing the approved therapies.
- approval_status - String. The approval status.
- evidence_type - String. The evidence type.
- efficacy_evidence - String. The efficacy evidence.
- response_type - String. The response type.
- amp_tier - String. The AMP tier.
- cap_level - String. The CAP level.
- therapy - String. The therapy.
- therapy_id - Integer. The therapy id.
- normalized_drug_name - String. The normalized drug name.
- indication - String. The indication.
- normalized_cancer - String. The normalized cancer.
- pub_med_references - Array. The PubMed references.
- molecular_profile - String. The molecular profile.
- approved_authorities - Array. The approved authorities.
- drugs - JSON objects containing the Drugs.
  - drug_name - String. The drug name.
  - id - Integer. The Drug ID.
  - normalized_drug - String. The normalized drug.
- therapy_descriptions - JSON objects containing the therapy descriptions.
  - description - String. The description.
  - pub_med_references - Array. The PubMed references.
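To show how the somatic fields fit together, here is a small sketch that reads the tier and lists the approved therapies from an `amp_annotation` dictionary. The function name `summarize_somatic` is hypothetical and the sample values are placeholders chosen only to match the field names above.

```python
def summarize_somatic(amp_annotation):
    """Return the somatic tier plus a compact list of approved therapies."""
    tier = amp_annotation.get("verdict", {}).get("tier")
    therapies = [
        {
            "therapy": entry.get("therapy"),
            "approval_status": entry.get("approval_status"),
            "indication": entry.get("indication"),
        }
        for entry in amp_annotation.get("approved_therapies", [])
    ]
    return {"tier": tier, "therapies": therapies}


# Placeholder data; not real API output.
sample = {
    "verdict": {"tier": "Tier II", "approx_score": 0.0},
    "approved_therapies": [
        {
            "therapy": "EXAMPLE_THERAPY",
            "approval_status": "EXAMPLE_STATUS",
            "indication": "EXAMPLE_INDICATION",
        }
    ],
}
somatic = summarize_somatic(sample)
print(somatic["tier"], len(somatic["therapies"]))
```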

Variant lookup

[GET] [https://api.varsome.com/lookup/{query}/{ref_genome}{?add-ACMG-annotation=1&add-varsome-user-entries=1&expand-pubmed-articles=1&add-region-databases=1&add-source-databases}]

The query parameter can be any of the following: HGVS Protein-level variant, HGVS DNA-level variant, rs_id, 4-part genomic variant specification and variant_id.

The response to all these types of queries has the same format and can be either an Array of variant response objects, or a single variant response object.
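Since the decoded response may be either a list or a single object, client code may find it convenient to normalise it before processing. A minimal sketch, with `as_variant_list` being an illustrative helper name rather than anything provided by the API:

```python
def as_variant_list(response):
    """Wrap a single variant response object in a list; pass lists through."""
    if isinstance(response, list):
        return response
    return [response]


# Both shapes can then be handled by the same loop.
single = as_variant_list({"variant_id": "PLACEHOLDER"})
several = as_variant_list([{"variant_id": "A"}, {"variant_id": "B"}])
print(len(single), len(several))
```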

Parameters

query - A String representation of the variant to query. Can be any of the following:
- HGVS Protein-level variant - Gene/transcript name followed by HGVS Protein level notation. Examples: BRAF:V600E, NM_001252678:I182T
- HGVS DNA-level variant - Gene/transcript name followed by HGVS DNA level notation. Example: FTO:c.46-43098T>C
- rs_id - the dbSNP accession number. String “rs” followed by one or more digits. Example: rs113488022
- 4-part genomic variant specification - chromosome:position:reference_allele:alternate_allele or chromosome:position:reference_length:alternate_allele. The separator may be ‘:’ or ‘-’, the chromosome number is optionally preceded by the string “chr”, and position is the 1-based chromosomal position.
- variant_id - our 20-digit integer value (example ‘10190150730273780002’). If you call “region_variants” below, it is faster to then obtain variant data using the variant_ids returned.
ref_genome (optional) - `hg19` or `hg38` Default: `hg19`
add-all-data (optional) - Can be 0 or 1. Include all data in the annotation (same as enabling all the parameters below)
add-ACMG-annotation (optional) - Can be 0 or 1. Include Germline classification (only available in specific API plans).
minimum-clinvar-stars (optional) - Can be 0 to 4. Define the minimum ClinVar rating to take into account when calculating the Germline Verdict.
expand-pubmed-articles (optional) - Can be 0 or 1. Include publication information (e.g. authors, abstract) for every PUBMED ID in the annotation result
add-region-databases (optional) - Can be 0 or 1. Include region databases data in the response
add-source-databases (optional) - 'all' or 'none' or an array of database names to be included in the result
override_transcript (optional) - Overrides the transcript used for germline annotation with the one specified by override_transcript. The transcript used by default is the one with the most severe coding impact, or the longest canonical transcript, or the longest one. Requires that the add-ACMG-annotation parameter is set to 1. If the transcript defined for that variant isn't valid, the API returns an error.
add-AMP-annotation (optional) - Can be 0 or 1. Enables the somatic annotation.
sex (optional) - String. Male or Female. The sample's sex to be used for matching. Requires that the somatic annotation is enabled.
age (optional) - Integer. The sample's age to be used for matching. Requires that the somatic annotation is enabled.
ethnicity (optional) - String. The sample's ethnicity to be used for matching. Requires that the somatic annotation is enabled.
cancer-type (optional) - String. The sample's cancer type to be used for matching. Requires that the somatic annotation is enabled.
tissue-type (optional) - String. The sample's tissue type to be used for matching. Requires that the somatic annotation is enabled.
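The 4-part genomic specification described in the list above can be checked client-side before a request is sent. The sketch below is one possible parser, not the API's own: `parse_genomic_variant` is a hypothetical name, and treating an all-digit third field as a reference length is an assumption based on the two forms given.

```python
def parse_genomic_variant(spec):
    """Split a 4-part variant string such as '15-73027478-T-C' into fields."""
    separator = ":" if ":" in spec else "-"
    parts = spec.split(separator)
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {spec!r}")
    chromosome, position, reference, alternate = parts
    if chromosome.lower().startswith("chr"):
        chromosome = chromosome[3:]  # the 'chr' prefix is optional
    # Assumption: digits in the third field mean a reference length.
    if reference.isdigit():
        reference = int(reference)
    return {
        "chromosome": chromosome,
        "position": int(position),  # 1-based chromosomal position
        "reference": reference,
        "alternate": alternate,
    }


substitution = parse_genomic_variant("15-73027478-T-C")
insertion = parse_genomic_variant("7-151945072--T")
by_length = parse_genomic_variant("chr19:20082943:1:G")
print(substitution, insertion, by_length)
```

An empty reference or alternate field, as in the insertion and deletion examples, comes back as an empty string.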

Headers

Authorization (optional) - To take advantage of your account's benefits you may optionally include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`
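Putting the path, query parameters and header together, a lookup request could be assembled with the Python standard library roughly as follows. `build_lookup_request` is an illustrative helper, the `Accept` header is an assumption, and the request is only constructed here; sending it (for example with `urllib.request.urlopen`) is left to the caller.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.varsome.com"


def build_lookup_request(query, ref_genome="hg19", params=None, token=None):
    """Build (but do not send) a GET request for the variant lookup endpoint."""
    # Keep ':' and '-' readable; other special characters such as '>' are escaped.
    path = urllib.parse.quote(query, safe=":-")
    url = f"{BASE_URL}/lookup/{path}/{ref_genome}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    headers = {"Accept": "application/json"}  # assumption, not documented
    if token:
        headers["Authorization"] = f"Token {token}"
    return urllib.request.Request(url, headers=headers)


request = build_lookup_request(
    "BRAF:V600E",
    params={"add-AMP-annotation": 1, "sex": "male", "age": 47},
    token="your_token",
)
print(request.full_url)
```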

Examples

Annotate a variant using Germline annotation: https://api.varsome.com/lookup/15-73027478-T-C?add-ACMG-annotation=1
Annotate a variant using Germline annotation and override transcript: https://api.varsome.com/lookup/15-73027478-T-C?add-ACMG-annotation=1&override_transcript=ENST00000542334
Annotate a variant using Somatic annotation: https://api.varsome.com/lookup/15-73027478-T-C?add-AMP-annotation=1
Annotate a variant using Somatic annotation, specifying the sample's sex and age: https://api.varsome.com/lookup/BRAF:V600E?add-AMP-annotation=1&add-ACMG-annotation=1&sex=male&age=47
Annotate a variant with neither Germline nor Somatic annotation: https://api.varsome.com/lookup/chr19:20082943:1:G
Annotate including the VarSome user-submitted publications: https://api.varsome.com/lookup/rs113488022/hg38?add-varsome-user-entries=1
Annotate including the selected databases (ClinVar and cancer hotspots in this example): https://api.varsome.com/lookup/BRAF:V600E?add-source-databases=ncbi-clinvar2,cancer-hotspots
Annotate a variant with data from all possible databases (this is potentially onerous as it hugely increases the amount of data returned): https://api.varsome.com/lookup/TP53:R175L?add-all-data=1
Other cancer examples:

Batch Lookup for many variants

[POST] [https://api.varsome.com/lookup/batch/{ref_genome}{?add-ACMG-annotation=1&add-varsome-user-entries=1&expand-pubmed-articles=1&add-region-databases=1&add-source-databases=all}]

Retrieve variant data for multiple variants, passed in the POST request payload, based on a reference genome id. This is currently limited to 1000 variants per request.

Parameters

ref_genome (optional) - `hg19` or `hg38` Default: `hg19`
add-all-data (optional) - Can be 0 or 1. Include all data in the annotation (same as enabling all the parameters below)
add-varsome-user-entries (optional) - Can be 0 or 1. Include VarSome's user submitted publications in the response
expand-pubmed-articles (optional) - Can be 0 or 1. Include publication information (e.g. authors, abstract) for every PUBMED ID in the annotation result
add-region-databases (optional) - Can be 0 or 1. Include region databases data in the response
add-source-databases (optional) - 'all' or 'none' or an array of database names to be included in the result

Headers

Authorization (required) - To perform a batch query you need to include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Request body

variants (array) - an Array of strings containing any of the supported variant lookup notations as shown above. Example: `{"variants": ["rs113", "chr22:39777823::CAA"]}`
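For larger variant lists, one approach is to split them into chunks that respect the per-request limit and build one POST request per chunk. The sketch below only constructs the requests; `chunked` and `build_batch_request` are illustrative names, and the `Content-Type` header is an assumption about how the JSON body should be declared.

```python
import json
import urllib.request

BASE_URL = "https://api.varsome.com"
MAX_BATCH = 1000  # per-request limit stated above


def chunked(items, size=MAX_BATCH):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_request(variants, token, ref_genome="hg19"):
    """Build (but do not send) a POST request for one batch of variants."""
    body = json.dumps({"variants": list(variants)}).encode("utf-8")
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",  # assumption, not documented
    }
    url = f"{BASE_URL}/lookup/batch/{ref_genome}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


variants = ["rs113", "chr22:39777823::CAA"]
requests_to_send = [build_batch_request(c, "your_token") for c in chunked(variants)]
print(len(requests_to_send), requests_to_send[0].full_url)
```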

Get variants in a genomic region

[GET] [https://api.varsome.com/region_variants/{ref_genome}/{chromosome_id}/{position}/{length}{?add-ACMG-annotation=1&add-source-databases=all}]

Retrieve all known variants inside the genomic region described, using the ref_genome, chromosome_id, position and length.

Parameters

ref_genome (optional) - `hg19` or `hg38` Default: `hg19`
chromosome_id - A number representing the chromosome: 1-22, 23 for X and 24 for Y (example: `1`).
position - the 1-based chromosomal position of the start of the region.
length - the length of the region in base pairs
add-all-data (optional) - Can be 0 or 1. Include all databases in the response. Same as add-source-databases=all
add-source-databases (optional) - 'all' or 'none' or an array of database names to be included in the result

Headers

Authorization (optional) - To take advantage of your account's benefits you may optionally include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Example

https://api.varsome.com/region_variants/hg19/17/7577020/100?add-all-data=1
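Because this endpoint expects a numeric chromosome_id, callers working with labels such as "X" or "chr17" may want a small conversion step. A sketch under that assumption follows; `chromosome_id` and `build_region_url` are hypothetical helper names, and accepting a "chr" prefix is a convenience of this sketch rather than documented behaviour.

```python
import urllib.parse

BASE_URL = "https://api.varsome.com"


def chromosome_id(label):
    """Map a chromosome label to the numeric id: 1-22, X -> 23, Y -> 24."""
    text = str(label).upper()
    if text.startswith("CHR"):
        text = text[3:]
    if text == "X":
        return 23
    if text == "Y":
        return 24
    number = int(text)  # raises ValueError for anything non-numeric
    if not 1 <= number <= 24:
        raise ValueError(f"unsupported chromosome: {label!r}")
    return number


def build_region_url(chromosome, position, length, ref_genome="hg19", params=None):
    """Build the region_variants URL for a region given by start and length."""
    url = (
        f"{BASE_URL}/region_variants/{ref_genome}/"
        f"{chromosome_id(chromosome)}/{int(position)}/{int(length)}"
    )
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


region_url = build_region_url("chr17", 7577020, 100, params={"add-all-data": 1})
print(region_url)
```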

Liftover Variant

[GET] [https://api.varsome.com/lookup/lifted-over-variant/{query}/{ref_genome}]

The query parameter can be any of the following: HGVS Protein-level variant, HGVS DNA-level variant, rs_id, 4-part genomic variant specification and variant_id.

The response to all these types of queries has the same format: an Array of variant genome coordinates.

Parameters

query - A String representation of the variant to query. Can be any of the following:
- HGVS Protein-level variant - Gene/transcript name followed by HGVS Protein level notation. Examples: BRAF:V600E, NM_001252678:I182T
- HGVS DNA-level variant - Gene/transcript name followed by HGVS DNA level notation. Example: FTO:c.46-43098T>C
- rs_id - the dbSNP accession number. String “rs” followed by one or more digits. Example: rs113488022
- 4-part genomic variant specification - chromosome:position:reference_allele:alternate_allele or chromosome:position:reference_length:alternate_allele. The separator may be ‘:’ or ‘-’, the chromosome number is optionally preceded by the string “chr”, and position is the 1-based chromosomal position.
- variant_id - our 20-digit integer value (example ‘10190150730273780002’). If you call “region_variants” above, it is faster to then obtain variant data using the variant_ids returned.
ref_genome (optional) - `hg19` or `hg38` Default: `hg19`

Headers

Authorization (optional) - To take advantage of your account's benefits you may optionally include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Examples

Get a liftover variant https://api.varsome.com/lookup/lifted-over-variant/15-73027478-T-C

Gene endpoints

Retrieve gene related data.

Gene response schema

[GET] [https://api.varsome.com/lookup/schema/genes]

Retrieves the schema of a gene response object, containing relevant information for each field included in the gene lookup response.

Example

https://api.varsome.com/lookup/schema/genes

Gene lookup based on gene symbol

[GET] [https://api.varsome.com/lookup/gene/{gene_symbol}/{ref_genome}{?add-source-databases=all}]

Retrieve gene data for the given gene_symbol, based on a reference genome id.

Parameters

gene_symbol - The gene's symbol
ref_genome (optional) - `hg19` or `hg38` Default: `hg19`
add-all-data (optional) - Can be 0 or 1. Include all data in the response. Same as enabling all the following parameters
expand-pubmed-articles (optional) - Can be 0 or 1. Include publication information (e.g. authors, abstract) for every PUBMED ID in the annotation result
add-source-databases (optional) - 'all' or 'none' or an array of database names to be included in the result

Headers

Authorization (optional) - To take advantage of your account's benefits you may optionally include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Examples

Batch Lookup for many genes

[POST] [https://api.varsome.com/lookup/genes/batch{?expand-pubmed-articles=1&add-source-databases=all}]

Retrieve gene data for multiple genes, passed in the POST request payload, based on a reference genome id. This is currently limited to 1000 genes per request.

Parameters

add-all-data (optional) - Can be 0 or 1. Include all data in the response. Same as enabling all the following parameters
expand-pubmed-articles (optional) - Can be 0 or 1. Include publication information (e.g. authors, abstract) for every PUBMED ID in the annotation result
add-source-databases (optional) - 'all' or 'none' or an array of database names to be included in the result

Headers

Authorization (required) - To perform a batch query you need to include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Request body

genes (array) - an Array of strings containing gene symbols. Example: `{"genes": ["BRAF", "TP53"]}`
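When gene symbols come from user input or a spreadsheet, it can help to tidy the list before building the request body. The sketch below strips whitespace and drops blanks and repeats while preserving order; `gene_batch_body` is a hypothetical helper, and the JSON object layout is assumed to mirror the `variants` body of the variant batch endpoint.

```python
import json


def gene_batch_body(symbols):
    """Serialise gene symbols into a JSON request body, without duplicates."""
    unique = []
    for symbol in symbols:
        cleaned = symbol.strip()
        if cleaned and cleaned not in unique:
            unique.append(cleaned)
    return json.dumps({"genes": unique})


body = gene_batch_body(["BRAF", " TP53", "BRAF", ""])
print(body)
```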

Transcript endpoints

Retrieve transcript related data.

Transcript lookup based on transcript name

[GET] [https://api.varsome.com/lookup/transcript/{transcript_name}/{ref_genome}]

Retrieve transcript data for the given transcript name, based on a reference genome id.

Parameters

transcript_name - The transcript name
ref_genome (optional) - `hg19` or `hg38` Default: `hg19`

Headers

Authorization (optional) - To take advantage of your account's benefits you may optionally include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Examples

CNV endpoints

Retrieve CNV related data.

Germline(CNV) annotation fields

For full documentation see Documentation

sv_acmg_annotation - The JSON object containing the germline annotation for CNVs.

verdict - JSON object containing the overall results of the Germline classification for CNVs.
- saphetor_class - String. The Germline class as assigned by Saphetor following all optimizations.
- saphetor_score - Float. The numeric score that Saphetor assigned to this CNV.
classifications - Array. The rules that succeeded, as well as the ones that failed but have a clear user explanation.
- name - String. The Germline (CNV) rule's name.
- saphetor_class - String. The Germline class as assigned by Saphetor following all optimizations.
- saphetor_score - Float. The numeric score that Saphetor assigned to this CNV.
- saphetor_user_explain - Array. The user explanations for a rule that succeeded.

CNV lookup

[GET] [https://api.varsome.com/lookup/cnv/{query}/{ref_genome}]

The query parameter can specify either a deletion or a duplication CNV.

The response to these types of queries has the same format.

Parameters

query - A String representation of the CNV to query. It has the following format: {chrN:startPosition:endPositionOrLength:cnvType}
- chrN - The chromosome of the CNV. Example: chr1
- startPosition - The start position of the CNV. Example: 1000
- endPositionOrLength - By default maps to the end position of the CNV. In case the user wants to override this, they can explicitly select the mode. Example: both E1000 and 1000 set the CNV's end position to be 1000, whereas L1000 sets the CNV's length to be equal to 1000.
- cnvType - Can either be DUP for duplication or DEL for deletion. Example: DEL
ref_genome (optional) - `hg19` or `hg38` Default: `hg19`
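The CNV query string has enough moving parts (end position versus length, DEL versus DUP) that a small builder can prevent malformed requests. This is a sketch only: `cnv_query` and `build_cnv_url` are illustrative names, and always emitting an explicit `E` or `L` prefix is a design choice of the sketch, relying on the statement above that `E1000` and `1000` are equivalent.

```python
BASE_URL = "https://api.varsome.com"


def cnv_query(chromosome, start, cnv_type, end=None, length=None):
    """Format chrN:startPosition:endPositionOrLength:cnvType."""
    if cnv_type not in ("DEL", "DUP"):
        raise ValueError("cnv_type must be 'DEL' or 'DUP'")
    if (end is None) == (length is None):
        raise ValueError("give exactly one of end or length")
    # E<n> selects an end position, L<n> selects a length.
    span = f"E{int(end)}" if end is not None else f"L{int(length)}"
    return f"{chromosome}:{int(start)}:{span}:{cnv_type}"


def build_cnv_url(query, ref_genome="hg19"):
    """Build the CNV lookup URL for an already formatted query."""
    return f"{BASE_URL}/lookup/cnv/{query}/{ref_genome}"


deletion = cnv_query("chr1", 122, "DEL", end=5235)
duplication = cnv_query("chr1", 100, "DUP", length=1254)
print(build_cnv_url(deletion))
print(build_cnv_url(duplication, "hg38"))
```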

Headers

Authorization (optional) - To take advantage of your account's benefits you may optionally include your VariantAPI token as a request authorization header. Example: `Authorization: Token <your_token>`

Examples

Annotate a deletion CNV using Germline annotation for hg19. The CNV starts at position 122 and ends at position 5235. https://api.varsome.com/lookup/cnv/chr1:122:5235:DEL/1019
Annotate a duplication CNV using Germline annotation for hg38. The CNV starts at position 100 and has a length of 1254. https://api.varsome.com/lookup/cnv/chr1:100:L1254:DUP/1038