A bit of background...
The rise in reported vancomycin-resistant Enterococcus faecium (VRE) infections worldwide highlights the importance of detecting outbreaks in a timely fashion. Whole-genome sequencing data can help to understand how vancomycin resistance genes disseminate. For this reason, we built ClonalTracker to compare any two VRE genomes. ClonalTracker can be run from the command line, but we have also designed a web server so that any researcher, with or without bioinformatics experience, can use it.
How to run ClonalTracker
Input files
ClonalTracker takes as input two assembly files in FASTA format, one for each of the two VRE isolates to compare. The quality of the assemblies directly influences the tool's performance, because the comparison depends on how well the van-containing transposon is reconstructed. Also, be aware that using genomes assembled with different tools might affect the end result.
Important note: the accuracy of the results depends directly on the quality of the input assemblies. Please assess the contiguity, completeness, and correctness of your assembly files. You can read more on how to assess the quality of genome assemblies here. For guidance, the 124 assemblies used to test ClonalTracker had a mean N50 value of 43.86 and an N75 of 22.88.
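To check the contiguity of your own assemblies before submitting them, N50 and N75 can be computed directly from the contig lengths. The following is a minimal Python sketch, not part of ClonalTracker itself; the function names `read_contig_lengths` and `nx` are hypothetical, and the result is in base pairs.

```python
def read_contig_lengths(fasta_text):
    """Return the length of every sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction):
    """Length of the contig at which the running total, taken from the
    longest contig to the shortest, first reaches `fraction` of the
    total assembly size (fraction=0.5 gives N50, 0.75 gives N75)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0  # empty assembly


# Toy example with five contigs (30 bp in total)
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths, 0.50)
n75 = nx(toy_lengths, 0.75)
```

For a real assembly, read the file content first (for example with `open(path).read()`) and pass it to `read_contig_lengths`.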
Workflow
1. van-typing using Blastn
BLAST (Basic Local Alignment Search Tool) is an algorithm that aligns a given sequence (the query) against a database of sequences, in this case nucleotide sequences. The database ClonalTracker uses is composed of 6 different van operons, including the vanA and vanB types. As queries, ClonalTracker uses all the contigs/scaffolds available in each isolate's assembly. If the isolate is vancomycin-resistant, the van type is assigned based on the highest-similarity hit. The next step is performed only when the two isolates are of the same van type. You can read more about BLAST here.
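The van-typing decision described above can be sketched as follows. This is an illustration only, not ClonalTracker's actual code: it assumes Blastn was run with the standard 12-column tabular output (`-outfmt 6`), that the subject names in the database are the van types themselves (the names `vanA` and `vanB` below are illustrative), and that "highest similarity" means the highest percent identity with bit score as tie-breaker. Any identity or coverage threshold the tool applies is not covered here.

```python
def best_van_hit(blast_tabular):
    """Return the subject name of the most similar hit in a BLAST
    outfmt 6 table, or None when there are no hits at all."""
    best_key = None
    best_subject = None
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject = fields[1]           # sseqid
        pident = float(fields[2])     # percent identity
        bitscore = float(fields[11])  # bit score
        key = (pident, bitscore)      # assumption: identity first, then bit score
        if best_key is None or key > best_key:
            best_key = key
            best_subject = subject
    return best_subject


def same_van_type(table_a, table_b):
    """True only when both isolates have a hit and the best hits agree."""
    type_a = best_van_hit(table_a)
    type_b = best_van_hit(table_b)
    return type_a is not None and type_a == type_b


# Made-up tables for two isolates
isolate_1 = (
    "contig1\tvanA\t99.5\t1000\t5\t0\t1\t1000\t1\t1000\t0.0\t1800\n"
    "contig2\tvanB\t92.0\t800\t64\t0\t1\t800\t1\t800\t0.0\t1100\n"
)
isolate_2 = "contig7\tvanA\t100.0\t1000\t0\t0\t1\t1000\t1\t1000\t0.0\t1848\n"
```

With these tables both isolates are typed as `vanA`, so the comparison would proceed to transposon typing.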
2. Transposon-typing using Blastn, RagTag, ISEScan, Bakta, Clinker and Blastn
- Blastn is used to gather the contigs that belong to the transposon, using as reference the transposon sequence M97297 for vanA or AY655721.2 for vanB.
- RagTag is a collection of software tools for scaffolding and improving modern genome assemblies. ClonalTracker uses its scaffold function to bridge all the contigs reported by Blastn as belonging to the van transposon, using the reference transposon mentioned above as template. If the van transposon is already contained in a single scaffold, this step is skipped. You can read more about RagTag here.
- ISEScan is used to detect insertion sequences (IS), which are commonly found in transposons. ISEScan is designed to automate the identification of IS elements in prokaryotic genomes based on hidden Markov models. The predicted ISs are taken into account when assessing transposon identity. ISEScan also predicts the proteome from the given nucleotide sequence, which is used in the next steps for transposon comparison. You can read more about ISEScan here.
- Bakta is a tool for the rapid and standardized annotation of bacterial genomes and plasmids from both isolates and MAGs. In this case, it is used to annotate the transposon sequence and generate the GenBank files required as input for the next step. You can read more about Bakta here.
- Clinker is used mainly for visualization. Given GenBank files for at least two gene clusters, this tool automatically extracts the protein translations, performs global alignments between the sequences in each transposon, and creates an interactive visualization. In addition, ClonalTracker considers the number of predicted proteins and the alignment scores to assess transposon similarity. You can read more about Clinker here.
- Blastn is used again to assess the synteny between the genes of both transposons. For that, one of the transposons is used as query and the other one as reference. Two transposons are considered identical only if they have a similarity score of 100% and the whole sequence is included in the alignment.
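The final identity rule (100% similarity over the whole sequence) can be expressed as a small check on a single Blastn hit. The sketch below is a simplification under stated assumptions, not the tool's implementation: it assumes one hit line in standard `-outfmt 6` format, that the lengths of both transposons are known, and it additionally requires zero mismatches and gap openings to guard against a rounded percent identity. The function name `transposons_identical` is hypothetical.

```python
def transposons_identical(hit_line, query_len, subject_len):
    """Check one BLAST outfmt 6 hit: 100% identity and an alignment
    that spans both transposons from the first to the last base."""
    f = hit_line.rstrip("\n").split("\t")
    pident = float(f[2])
    mismatches = int(f[4])
    gap_opens = int(f[5])
    qstart, qend, sstart, send = (int(x) for x in f[6:10])
    # Coordinates are 1-based and inclusive; subject coordinates are
    # reversed for minus-strand hits, hence abs().
    query_span = abs(qend - qstart) + 1
    subject_span = abs(send - sstart) + 1
    perfect = pident == 100.0 and mismatches == 0 and gap_opens == 0
    full_length = query_span == query_len and subject_span == subject_len
    return perfect and full_length


# Made-up hits between two 5000 bp transposons
full_hit = "tn_1\ttn_2\t100.000\t5000\t0\t0\t1\t5000\t1\t5000\t0.0\t9234"
partial_hit = "tn_1\ttn_2\t100.000\t4000\t0\t0\t1\t4000\t1\t4000\t0.0\t7388"
```

Here `full_hit` satisfies the rule for two 5000 bp transposons, while `partial_hit` does not because 1000 bp are left outside the alignment.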
3. Whole genome comparison with PopPUNK and Mash
- PopPUNK is software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. To determine the core and accessory distances between isolates it uses a k-mer approach (k-mers are substrings of length k contained within a biological sequence). You can read more about PopPUNK here.
- Mash is a tool that estimates distances between genomes and metagenomes, also using k-mers. You can read more about Mash here.
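To give an intuition for the k-mer approach, the sketch below computes the Jaccard index between the k-mer sets of two sequences and converts it to a Mash-style distance, D = -(1/k) ln(2j / (1 + j)). It is a toy illustration and not how either tool is implemented: Mash estimates the Jaccard index from MinHash sketches of canonical k-mers rather than from full k-mer sets, and PopPUNK's core and accessory distances are not reproduced here. The function names are hypothetical.

```python
import math


def kmers(seq, k):
    """Set of all substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq_a, seq_b, k):
    """Shared k-mers divided by all distinct k-mers of both sequences."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Mash-style distance from a Jaccard index j and k-mer size k."""
    if j == 0:
        return 1.0  # no shared k-mers: maximum distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy example with very short sequences and k = 2
j = jaccard("ACGT", "ACGA", 2)  # k-mers {AC, CG, GT} vs {AC, CG, GA}
d = mash_distance(j, 2)
```

Identical sequences share all their k-mers (Jaccard index 1) and get a distance of 0; the fewer k-mers two genomes share, the larger the distance.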