Run ID | MM01 |
---|---|
Sample ID | MM01 |
DNA panel name | whole_genome_500_plex_mm_hotspot-mm81-82 |
DNA panel size | 840 amplicons |
Reference genome | hg19 |
Secondary analysis pipeline version | 3.4.1 |
Tertiary pipeline version | 1.1a10 |
Date analyzed | September 20, 2024 06:38:09 |
Report interpretation
Summary tab
Heatmap and barcharts
Provide visual representations of the number of cells, somatic variants, and average protein expression seen in each clone. You can hover over the figure to get more detailed information. Clones are groups of cells that share common variants, protein signatures, or copy number alterations.
The heatmap of the imputed genotype signature of each clone and sample shows the most likely (imputed) genotype of each variant within each clone and sample. Note that this clone-level genotype may differ from the genotype of an individual cell within the clone.
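The difference between a clone-level genotype and a single cell's genotype can be illustrated with a small sketch. It assumes, purely for illustration, that the clone-level call is the most common non-missing genotype among the clone's cells; the report does not describe the actual imputation method, and the function name and cell values are hypothetical.

```python
from collections import Counter

def clone_consensus_genotype(cell_genotypes):
    """Return the most common non-missing genotype among a clone's cells.

    Majority vote is an illustrative stand-in, not the pipeline's imputation.
    """
    observed = [g for g in cell_genotypes if g != "Missing"]
    if not observed:
        return "Missing"
    return Counter(observed).most_common(1)[0][0]

# Hypothetical genotypes of five cells in one clone for one variant.
clone_cells = ["HET", "HET", "WT", "Missing", "HET"]
consensus = clone_consensus_genotype(clone_cells)

# Cells with a called genotype that disagrees with the clone-level call.
discordant = sum(1 for g in clone_cells if g not in ("Missing", consensus))
print(consensus, discordant)
```

Here one cell is called WT even though the clone as a whole is called HET, which is the situation the note above warns about.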
Multi-sample report
Reports containing two or more samples from the same patient provide additional visualizations. A legend at the top displays the names of the samples in the report, each with an associated number that is referenced in the report tables. Below the legend is a barchart displaying the number of cells per clone combined from all samples, and under it are additional barcharts for each individual sample.
Fishplot
Reports containing two or more samples from the same patient provide a fishplot visualization. The fishplot shows the evolutionary history of the clones and the changes in their composition over time. Each horizontal branch on the fishplot represents a clone, and the thickness of the branch indicates the relative size of the clone. Fishplots make it possible to track clonal evolution within a patient's cells and to gain insight into the progression of the cancer, such as the emergence of resistance mutations.
Clones table
The table displays detailed information for each clone.
- Samples: The samples the clone is present in. Reference the legend at the top of the report to see which sample corresponds to each number. This column is only shown in multi-sample reports.
- Clone name: The name of the clone. `Wildtype` refers to the wildtype clone with no observed somatic mutations (i.e. normal cells).
- Number of cells: Number of cells observed in the clone. Also shown is the number of cells as a percentage of total cells in the sample.
- Variants: The variants observed in the clone.
- Protein differential expression: The names for each protein whose expression is statistically significantly different in the clone versus the wildtype clone, or versus all other clones combined if no wildtype clone is present. The statistical test used is the two-sided t-test.
Variants table
The table contains detailed information for each variant.
- Samples: The samples the mutation is present in. Reference the legend at the top of the report to see which sample corresponds to each number. This column is only shown in multi-sample reports.
- Gene: Gene where the variant is present.
- Protein: The variant's impact on the protein.
- Coding impact: The variant's impact on the coding sequence.
- cDNA: The variant's impact on the cDNA.
- RefSeq transcript ID: The RefSeq transcript ID that the cDNA change is reported for.
- Chromosome: The chromosome the variant is located on.
- Position: The genomic position of the variant.
- Reference allele: The reference allele of the variant.
- Alternate allele: The alternate allele of the variant.
- var_id: ID used to uniquely identify the variant.
- Known Mutation: 'Yes' if the variant was previously detected in another sample (e.g. bulk sequencing), as determined from variants labeled “SOMATIC” in the VCF file.
- Cells Mutated (%): Percentage of cells the variant is seen in.
- Average VAF: The average variant allele frequency in the cells containing the variant.
- WT cells (imputed): Number of cells without the variant, after correcting for ADO.
- WT cells (raw): Number of cells without the variant, before correcting for ADO. Wildtype cells are those with a variant allele frequency ≤ 5%, depth of coverage ≥ 10, and genotype quality ≥ 30.
- HET cells (imputed): Number of cells that are heterozygous for the variant, after correcting for ADO.
- HET cells (raw): Number of cells that are heterozygous for the variant, before correcting for ADO. Heterozygous cells are those with a variant allele frequency ≥ 35%, depth of coverage ≥ 10, and genotype quality ≥ 30.
- HOM cells (imputed): Number of cells that are homozygous for the variant, after correcting for ADO.
- HOM cells (raw): Number of cells that are homozygous for the variant, before correcting for ADO. Homozygous cells are those with a variant allele frequency ≥ 95%, depth of coverage ≥ 10, and genotype quality ≥ 30.
- Missing (%): Percentage of cells where there is not sufficient coverage or the genotype quality is too low.
- Average coverage: In the cells containing the variant, this is the average read coverage at the variant's position.
- Variant score: Based on a reference set of control samples, this is the p-value for the variant being an error (false positive).
- Average background error rate (%): Based on a reference set of controls, the average percentage of cells with the mutation.
- Co-occurring variants: Contains a list of other variants that statistically significantly co-occur with this one. However, if the variant could be detected on its own, then nothing will be displayed. The statistical test is based on the one-sided binomial test, which tests if the percentage of cells with both mutations is higher than expected by chance if we assume they occur independently.
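The co-occurrence test in the last bullet can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming the expected co-occurrence rate under independence is the product of the two variants' individual cell fractions; the function names and example counts are hypothetical, and the pipeline's exact implementation and significance cutoff are not stated in the report.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def cooccurrence_p_value(n_cells, n_with_a, n_with_b, n_with_both):
    """One-sided test: are cells carrying both variants more common than chance?"""
    # Under independence, P(both) = P(A) * P(B).
    p_expected = (n_with_a / n_cells) * (n_with_b / n_cells)
    return binom_upper_tail(n_with_both, n_cells, p_expected)

# Hypothetical counts: 1000 cells, each variant seen in 100 cells.
print(cooccurrence_p_value(1000, 100, 100, 90))  # strong co-occurrence
print(cooccurrence_p_value(1000, 100, 100, 10))  # about what chance predicts
```

With 10% of cells carrying each variant, independence predicts about 10 double-mutant cells out of 1000, so observing 90 gives a very small p-value while observing 10 does not.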
Details tab
Phylogenetic tree
The phylogenetic tree shows the evolutionary history of the cancer: the order in which the mutations were acquired and how they co-occur. It is constructed by a computational algorithm that uses DNA information (i.e. variants and CNVs) to determine which cells are most similar to each other, while accounting for errors such as ADO, false positive variants, and variability in sequencing coverage. The algorithm also uses scoring that predisposes it toward a phylogenetic tree and clonal architecture that are biologically plausible. The end result is that cells are clustered into clones that share the same zygosity for the variants.
- Node: Each node marked with a black circle represents a cell clone containing cells from one or more samples. Annotations inside the node take the form {gene name}{mutation}{chromosome}. Multiple mutations listed in a node co-occur in that clone, but they may or may not co-occur in a particular sample or in a particular cell.
- Edges (or branches) connect nodes, representing evolutionary relationships.
- Colored dashed lines and circles indicate the number and percentage of cells in each clone.
Protein UMAP
UMAP is a dimensionality reduction technique commonly used in data analysis and visualization, especially for single-cell data. Each point represents a single cell, and the proximity of points on the UMAP plot reflects the similarity between cells based on their immunophenotype profiles. Cells that are close to each other on the UMAP plot have similar immunophenotypes, while cells that are farther apart have distinct profiles. The UMAP plot can reveal clusters or groups of cells with similar immunophenotypes. These clusters might correspond to different cell types or subpopulations within your sample. By examining the distribution and arrangement of cells on the UMAP plot, you can gain insights into the heterogeneity and structure of your single-cell data.
Protein expression correlation
The protein expression correlation plot shows how the expression of two proteins is correlated. Each point is a different cell, and its position along the x and y axes indicates the magnitude of expression of the respective proteins.
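One common way to summarize such a scatter plot with a single number is the Pearson correlation coefficient. The sketch below assumes per-cell expression values are available as plain lists; the protein names and values are hypothetical, and the report does not state which coefficient, if any, it computes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one protein is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical normalized expression of two proteins in five cells.
protein_x = [1.2, 2.5, 3.1, 4.8, 5.0]
protein_y = [0.9, 2.7, 2.9, 5.1, 4.6]
print(pearson_r(protein_x, protein_y))
```

A value near 1 means cells that express one protein strongly tend to express the other strongly as well; a value near -1 means the opposite.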
Protein expression change over time
Time course analysis reports containing two or more samples from the same patient provide a visualization to show how protein expression changes over time for each clone. The dropdown menu on the left is used to select which clone to visualize. The dropdown menu on the right is used to select which protein AOC to visualize. The x axis displays the samples ordered by timepoint and the y axis displays the protein expression.
Copy number analysis
Two figures are provided to visualize copy number profiles for each clone:
- Top figure: Shows the average ploidy for each amplicon in each clone. Amplicons can be grouped by chromosome or gene. The ploidy value is calculated by normalizing the amplicon coverage values by the median coverage of the corresponding amplicons in the wildtype clone. If no wildtype clone is present, then the normalization is done with respect to all cells in the whole sample.
- Bottom figure: Shows the average variant allele frequency (VAF) in each clone for germline variants located in the regions shown in the top figure. The purpose of the visualization is to provide a sanity check when reviewing CNAs. For example, if a heterozygous germline variant is on a chromosome copy that is deleted, then the VAF of the germline variant will decrease to 0 (or rise to 1 if the copy without the variant is the one deleted). If the chromosome instead gains a copy, then the VAF will increase to about 0.67 when the duplicated copy carries the variant (i.e. two chromosomal copies with the germline variant and one without), or decrease to about 0.33 when it does not. CNAs that overlap heterozygous germline variants are expected to be accompanied by changes in the VAFs of those germline variants.
Caveat: CNA plots are much noisier for clones with very few cells.
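The normalization behind the top figure and the VAF sanity check behind the bottom figure can be sketched as below. This is an illustration under stated assumptions: coverage values are taken to be already normalized for each cell's total read depth, the wildtype clone is taken to be diploid (hence the factor of 2, which the report does not state explicitly), and all function names and numbers are hypothetical.

```python
from statistics import mean, median

def amplicon_ploidy(clone_coverage, wildtype_coverage, baseline_ploidy=2.0):
    """Average ploidy of one amplicon in one clone.

    clone_coverage / wildtype_coverage: per-cell coverage of the same amplicon
    in the clone of interest and in the wildtype clone.
    """
    # Normalize by the wildtype median, then scale to the assumed baseline.
    return baseline_ploidy * mean(clone_coverage) / median(wildtype_coverage)

def expected_germline_vaf(variant_copies, total_copies):
    """Expected VAF of a germline variant given the allele copy numbers."""
    return variant_copies / total_copies

# Hypothetical amplicon with 1.5x the wildtype coverage: one gained copy.
print(amplicon_ploidy([30, 28, 32], [20, 19, 21, 20]))

# Heterozygous germline variant under different copy-number states.
print(expected_germline_vaf(1, 2))  # diploid, no CNA
print(expected_germline_vaf(2, 3))  # gain of the variant-carrying copy
print(expected_germline_vaf(1, 3))  # gain of the other copy
print(expected_germline_vaf(0, 1))  # loss of the variant-carrying copy
```

A ploidy near 3 in the top figure should therefore be accompanied by germline VAFs near 0.67 or 0.33 in the bottom figure; a gain with VAFs still at 0.5 is more likely to be coverage noise.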
QC tab
Germline variant / multiplexing diagnostic plot
Provides a visual representation of germline variant information to help confirm and diagnose sample identity issues. The figure compares the germline genotype values seen in the Tapestri data with the expected germline genotypes, which were provided via a VCF file. This diagnostic plot is mainly useful for multiplexed Tapestri runs.
Heatmap of somatic variants (raw)
Provides a visual representation of the genotypes (WT, HET, HOM, or Missing) for each somatic variant in each cell. The genotypes are from before correcting for ADO but after filtering false positives (variant allele frequency > 5%, depth of coverage ≥ 10, and genotype quality ≥ 30).
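The raw genotype categories can be sketched from the thresholds listed in the Variants table (WT: VAF ≤ 5%, HET: VAF ≥ 35%, HOM: VAF ≥ 95%, all with depth ≥ 10 and genotype quality ≥ 30). The function name is hypothetical. The order of the checks (HOM before HET) and the `Unassigned` label for VAFs between 5% and 35% are assumptions, since the report does not say how that range is handled.

```python
def call_raw_genotype(vaf_pct, depth, genotype_quality):
    """Classify one cell at one variant using the report's stated thresholds."""
    # Insufficient coverage or low genotype quality: no call.
    if depth < 10 or genotype_quality < 30:
        return "Missing"
    # Check HOM first because its VAF range also satisfies the HET cutoff.
    if vaf_pct >= 95:
        return "HOM"
    if vaf_pct >= 35:
        return "HET"
    if vaf_pct <= 5:
        return "WT"
    # 5% < VAF < 35% is not covered by the listed definitions (assumption).
    return "Unassigned"

# Hypothetical cells: (VAF %, depth, genotype quality).
for cell in [(0, 40, 99), (48, 25, 60), (100, 12, 35), (50, 6, 80), (20, 30, 50)]:
    print(cell, call_raw_genotype(*cell))
```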
Heatmap of protein expression
Provides a visual representation of the expression values for each protein in each cell.
Gene | Protein change | Coding impact | cDNA | RefSeq transcript ID | Chromosome | Position | Reference allele | Alternate allele | Variant ID | Known Mutation | Cells Mutated (%) | Average VAF (%) | Wildtype cells (imputed) | Heterozygous cells (imputed) | Homozygous cells (imputed) | Wildtype cells (raw) | Heterozygous cells (raw) | Homozygous cells (raw) | Missing (%) | Average coverage | Allele Freq (gnomAD) | dbSNP | Variant score | Average background error rate (%) | Co-occurring variants |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NRAS | p.Q61H | missense | c.183A>T | NM_002524.5 | chr1 |  |  |  | chr1:115256528:T/A | No | 94.79 | 99 | 137 | 0 | 2494 | 68 | 10 | 1931 | 23.64 | 36 |  | rs121913255 | 1.00e-08 | 0.037 |  |
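
The percentages in the NRAS row can be cross-checked against its cell counts. The sketch below reads 137 / 0 / 2494 as the imputed WT / HET / HOM counts and 68 / 10 / 1931 as the raw counts, and assumes that the total number of cells equals the sum of the imputed counts; the report does not state these formulas, but the reported values are consistent with them.

```python
# Cell counts taken from the NRAS p.Q61H row.
wt_imputed, het_imputed, hom_imputed = 137, 0, 2494
wt_raw, het_raw, hom_raw = 68, 10, 1931

# Assumption: every cell receives an imputed genotype, so this is the total.
total_cells = wt_imputed + het_imputed + hom_imputed

# Cells Mutated (%): cells carrying the variant (HET or HOM) after imputation.
cells_mutated_pct = 100 * (het_imputed + hom_imputed) / total_cells

# Missing (%): cells with no raw genotype call at this position.
missing_pct = 100 * (total_cells - (wt_raw + het_raw + hom_raw)) / total_cells

print(total_cells, round(cells_mutated_pct, 2), round(missing_pct, 2))
# prints: 2631 94.79 23.64
```

Both results match the Cells Mutated (%) and Missing (%) values reported in the row.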