The Tapestri Pipeline uses Bluebee’s High Performance Genomics Platform and allows customers to process single-cell DNA sequencing data generated on the Tapestri™ Platform. This document describes how to set up an account and get started using the platform.
Tapestri Pipeline: From FASTQ to variant calls
This pipeline processes single-cell DNA-seq data generated with the Mission Bio Tapestri™ Genomics Platform.
Key Features Of Tapestri Pipeline
- Streamlined data analysis from FASTQ files to annotated VCF files.
- Automatically annotate VCF files with variant ID numbers for common and pathogenic variants.
- Create multiple data output files that seamlessly plug into Tapestri Insights or may be analyzed with 3rd party software packages.
Main Input/Output Files
- FASTQ files in compressed format (fastq.gz) generated from Illumina MiSeq, HiSeq 2500, HiSeq 4000, or NovaSeq 6000 runs.
- Custom Panel file in compressed format that provides a user-defined set of targeted positions (e.g., AML panel).
- LOOM file – used in Tapestri Insights
- VCF in compressed format (vcf.gz) – used with 3rd party software platforms
- Report file in PDF format – summarizes sequencing metrics and outputs total cell number
Overview Of The Workflow
Creating Your Account
Mission Bio will create a new account on Bluebee free of charge. Once the account has been created you will receive an email to set up your account.
If you have not received a confirmation email, please contact Mission Bio Support at firstname.lastname@example.org.
Complete the following steps to create an account:
1. Verify your email address by clicking the ‘VERIFY’ button in the email from Bluebee (email@example.com)
2. Set a password to activate your account.
Once the account has been verified you will receive a confirmation email that your account has been activated.
The Tapestri Pipeline requires the following data folder/input nomenclature:
- The input directories cannot contain any spaces.
- The user-specified run prefix cannot contain ‘_R1’.
- The input file names (in the directories) must also not contain spaces, and they must be in the following format:
- <prefix>_R1_<suffix>.<extensions> for the forward FASTQ
- <prefix>_R2_<suffix>.<extensions> for the reverse FASTQ
The underscores around R1 and R2 are mandatory.
- <prefix> and <suffix> can be anything, but they must match between the two files that compose a pair:
e.g., a correct pair is: aaa_R1_001.fastq.gz, aaa_R2_001.fastq.gz
e.g., a wrong pair is: aaa_R1_001.fastq.gz, aaa_R2_002.fastq.gz
- <extensions> are usually “.fastq.gz”
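The naming rules above can be checked locally before uploading. The following is a minimal sketch in Python, not part of the Tapestri Pipeline or the Bluebee platform; the function name check_fastq_names and the exact regular expression are assumptions made for illustration, and the platform may apply additional checks.

```python
import re
from collections import defaultdict

# Pattern for <prefix>_R1_<suffix>.<extensions> and <prefix>_R2_<suffix>.<extensions>.
# Assumption for this sketch: the suffix contains no dot, so everything after
# the first dot that follows the suffix is treated as <extensions>.
READ_PATTERN = re.compile(
    r"^(?P<prefix>.+?)_(?P<read>R[12])_(?P<suffix>[^.]+)\.(?P<ext>.+)$"
)


def check_fastq_names(names):
    """Return (pairs, problems) for a list of FASTQ file names.

    pairs    -- list of (forward, reverse) file names that form a valid pair
    problems -- list of human-readable messages for names that break a rule
    """
    problems = []
    groups = defaultdict(dict)
    for name in names:
        if " " in name:
            problems.append(f"{name}: contains a space")
            continue
        match = READ_PATTERN.match(name)
        if match is None:
            problems.append(f"{name}: not in <prefix>_R1_/_R2_<suffix>.<extensions> form")
            continue
        if "_R1" in match["prefix"]:
            problems.append(f"{name}: prefix contains '_R1'")
            continue
        # Files pair up only when prefix, suffix and extensions all match.
        key = (match["prefix"], match["suffix"], match["ext"])
        groups[key][match["read"]] = name

    pairs = []
    for _, reads in sorted(groups.items()):
        if "R1" in reads and "R2" in reads:
            pairs.append((reads["R1"], reads["R2"]))
        else:
            lonely = next(iter(reads.values()))
            problems.append(f"{lonely}: no mate with the same prefix and suffix")
    return pairs, problems
```

Calling check_fastq_names(["aaa_R1_001.fastq.gz", "aaa_R2_002.fastq.gz"]) returns no pairs and reports both files as lacking a mate, which corresponds to the wrong pair shown above.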
As long as the connector is running and connected, data transfers will continue. You don’t need to stay logged in to the platform for this.
The availability of uploaded data, pipeline input data, and pipeline output data is limited. You can check the status of your files on the Data page within the project. The following statuses can be displayed:
|Available:||This file can be used for pipeline runs and can be downloaded.|
|Deleted:||This file is deleted and cannot be used anymore. Deleted data isn’t shown in the list by default. You can display these files in the list by ticking the box ‘Show deleted data’ on top.|
|Archived:||This file has been archived and needs to be restored before it can be used for runs or downloading. You can do this with the button on top.|
|Partial:||This file is not completed yet. A part of it still needs to be uploaded.|
|Corrupted:||The file cannot be used. It was declared corrupt after being inconsistent, or the upload was aborted by the user. Try to upload this file again with another name, e.g. add a letter to the file name.|
|Inconsistent:||An issue occurred during the upload. The file cannot be used. Try to upload this file again with another name, e.g. add a letter to the file name.|
The storage terms are agreed upon in a contract. If no contract is present or you don’t have access to the contract, you can contact support to ask about the storage terms that apply in your case. There are 3 types of storage terms:
|Hot storage:||Contains data that is available.|
|Cold storage:||Contains data that is archived.|
|Grace period strategy:||What happens to uploaded data after the grace period has passed if it hasn’t been used in a pipeline run. This can be deletion, archiving to cold storage, or keeping the file on hot storage. Archiving and keeping can be charged. The grace period and its strategy can be checked in the storage bundle on the Activation Codes page.|
Please contact firstname.lastname@example.org for additional information. Also use this address to report any bugs or provide feedback.
First Time Using Your Account
|1. Go to https://tapestri.bluebee.com/missionbio-tapestri/ and log in to your account.|
|Once logged in, you will be directed to the ABOUT page that summarizes the key steps when working with the platform (Getting Started Guide).|
Set Up Connector
The Bluebee Connector is a small piece of software that allows you to securely upload data to and download data from the platform.
| 1. Go to SETTINGS in the top menu and click on SETTINGS in the left navigation bar.
|The top section shows a summary of the personal settings. Optionally one may change the page that is shown after logging in to the account via the Initial view after login dropdown menu (default = About help view).|
|2. Click on NEW to create a new Connector.|
|3. Fill out the fields and click SAVE on the top of the page.|
|The UPLOAD FOLDER FOR PANEL ZIP FILES and DOWNLOAD FOLDER are optional fields.|
1. After clicking SAVE the Bluebee Service Connector will be downloaded and the Initialisation Key will be displayed. Copy the key.
|2. Once downloaded, start the installation, enter the Initialisation Key when prompted, and follow the instructions to complete the installation.|
On the SETTINGS page the connector is now listed and connected.
On Windows, the installer registers the Bluebee Service Connector as a Windows service and starts the service immediately. Wait about 1 minute and then refresh the SETTINGS page with the REFRESH button in the top right corner.
You can only install one connector at a time on Windows. If you need to install a new connector, first uninstall the old one and delete all associated files (such as log files). A connector can be upgraded with the UPDATE TO LATEST VERSION button on the SETTINGS page.
On macOS, the installer registers the Bluebee Service Connector in your login items so that it starts automatically when you log in. To start it manually, you can double-click it in your Launchpad, double-click the Bluebee Service Connector icon in the directory, or log out and log back in. Wait about 1 minute and then refresh the SETTINGS page with the REFRESH button in the top right corner.
Java needs to be installed (at least Java 8). The connector will use the default Java version on the operating system. On Linux, the installation process puts a management script in /etc/init.d. You can start the connector by executing ‘service bluebeeserviceconnector start’. If no script was generated, start the connector from the bsc folder with ‘./bluebeeserviceconnector start’.
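Because the connector uses the default Java version on the operating system, it can help to confirm that version before installing. The sketch below is one possible check in Python and is not part of the Bluebee tooling; the function names are hypothetical. It relies on the fact that Java 8 and earlier report themselves as 1.x (for example "1.8.0_281"), while later releases report the major version first (for example "11.0.2").

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier use the 1.x scheme, so the major version is the
    # second number; newer releases put the major version first.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def default_java_major():
    """Run the default `java` found on PATH and return its major version.

    Returns None when no `java` executable is found or the output
    cannot be parsed.
    """
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # `java -version` normally writes to stderr; both streams are checked.
    return parse_java_major(result.stderr + result.stdout)
```

A result of None or a number below 8 from default_java_major() suggests that Java should be installed or updated before starting the connector.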
Before processing sequencing data, individual FASTQ.GZ files that belong to one biological sample need to be linked to samples.
|1. Go to FILES in the top menu.|
|2. Select all FASTQ.GZ files that belong to one biological sample.|
|3. Click CREATE SAMPLE to associate all selected files to one sample.|
|4. Define a sample name. The prefix of the FASTQ.GZ files will be used automatically to name the sample. This label will be prefixed to each output file.|
|5. Select a Panel (e.g., AML).|
|6. Determine number of sequencing runs.
Use TWO RUNS only for the following two situations (file names across both ‘runs’ need to be identical):
|7. Optionally provide a description.|
|8. Assign FASTQ.GZ files to the FORWARD and REVERSE sections. Matching read pairs need to be aligned in the same row. In most cases FASTQ.GZ files are automatically assigned in the correct format.|
|9. Click LINK AS SAMPLE in the top right corner.|
|1. Go to SAMPLES in the top menu.|
|2. Select the sample you want to analyze and click ANALYZE to start the pipeline.|
|Once the analysis is started the run status will show IN PROGRESS. Once the analysis is successfully completed the run status will show SUCCEEDED and you will be notified via email.|
Review And Download Data
|1. Go to SAMPLES in the top menu and highlight the sample you want to review data for.|
2. The ANALYSIS DETAILS page includes a list of key performance metrics on the left side and generated data that can be downloaded on the right side.
Small files may be viewed and/or downloaded directly inside the web browser. Large files, such as LOOM files, can be downloaded via the connector by clicking the download icon on the far right.
|cells.vcf.gz||Compressed annotated VCF file. Conforms to standard GATK format.|
|cells.loom||Customized annotated VCF file in LOOM file format. To be used with Tapestri Insights.|
|cells.R||Customized annotated VCF file in R file format.|
|flt3.itd.report.txt||For each cell a report detailing the FLT3-ITD variants found by a customized algorithm.|
|report.pdf||A report file summarizing key performance metrics (e.g., number of cells). Summary data is also provided on a per-tube basis (i.e., eight tubes).|
|allele.drop.out.report.txt||A report file that summarizes allele dropout metrics using allele dropout (ADO) amplicon information.|
|barcode.cell.distribution.tsv||A spreadsheet file reporting the number of forward reads assigned to each amplicon for each cell found.|
|tapestri_log.txt||A detailed report on the individual steps performed by the Tapestri Pipeline.|
|cells.bam||Individual BAM files with reads of all cells found per tube.|
|mapped.bam||Individual BAM files with all reads mapped to the genome per tube [only reads with valid barcodes are reported].|
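For use with third-party tools, the compressed VCF output can be inspected with a few lines of Python. The sketch below uses only the standard library and the general VCF layout (meta lines starting with "##", a "#CHROM" header line, nine fixed columns through FORMAT, then one column per sample). The function name summarize_vcf is hypothetical, and whether each sample column in cells.vcf.gz corresponds to one cell is an assumption that should be verified against your own output.

```python
import gzip


def summarize_vcf(path):
    """Count variant records and sample columns in a gzip-compressed VCF."""
    n_records = 0
    samples = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # The first 9 columns (CHROM through FORMAT) are fixed;
                # any remaining columns are sample names.
                samples = columns[9:]
                continue
            if line.strip():
                n_records += 1  # one data line per variant record
    return {"records": n_records, "samples": len(samples)}
```

For example, summarize_vcf("cells.vcf.gz") returns a dictionary with the number of variant records and the number of sample columns, which can serve as a quick sanity check after download.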