invertiaDB Help Page

This is the homepage of invertiaDB. It is comprised of high level metadata about the reasearch. The graphs of homepage are: A bar chart of the inverted repeats density per domain , Three line chart that showcase the distribution of lengths for the arms, spacers and whole sequences present on the dataset and an interactive doughnut chart that graphs the top 10 most dense organisms in inverted repeats per biological domain:

From here you can navigate to other pages of invertiaDB using the navigation bar.

One of the core features of invertiaDB is the ability to search ( Explore -> Invertia Dataset ) via various NCBI assembly genome fields as well as inverted repeats metadata and filter out assemblies that you deem relevant for further analysis.

An example workflow would be to:
1. Quick search for the term "Influenza". This yields rows that come from the Bacterial and Viral domain. So we must move on to advanced filtering
2. Perform advanced filtering to keep only the Viral rows through "Domain contains Virus" filter for organisms that contain the word "Influenza" in their organism_name
3. Then we must keep only rows with IRs through the filter: "inverted_repeat_count greater than 0"
4. Finally we can select all rows and download the files for custom analysis or analyze the genomes through invertia by clicking the blue button with their assembly_id.

We can then click on the accession id button of a specific row and navigate to the analysis page of this specific accession. Here we can see calculations on inverted repeats and accession metadata with cross references to other public sources

We can also see graphs about the distribution on inverted repeat arm , spacer and sequence length and nucleotide composition charts for the same sequences as well:

We can then inspect in tabular format the inverted repeats file as it is present on the dataset:

Finally if the genome was annotated through an NCBI gff file. We can use two gene related features
1. To search Inverted repeats that overlap with a gene based on its locus-tag ( Unique identifier ) 2. To inspect the top 10 most dense genes in Inverted Repeats

Through explore -> Domain we can perform an aggregated analysis per domain:

The aggregated analysis looks like these and there is the option to download the dataset in csv or xlsx format. Also the graphs regarding Domain Inverted Repeat Graphs and Data Distributions of Length per IR Subcompartment are presented:

A more granular way to see the same statistics, meaning an aggregated analysis of multiple assemblies, if the an organism has more than 1, can be performed by the navbar option Explore -> Organisms:

When clicking an organism that has multiple assemblies we are presented with an aggregated view of them in terms of inverted repeats. We also have the ability to download and inspect specific assemblies the same as the Invertia Dataset page. Similarly, Data Distributions of Length per IR Subcompartment are presented.

InvertiaDB also offers the ability to search for specific DNA motfis in the inverted repeat dataset through the navigation bar option Motif Search. A trivial example use case that showcases this feature is to search for a sequence of interest, e.g. "caatatggaa" on the left arm of all inverted repeats:

The primary search leads to a secondary search to find the files and the unique organisms in which these motifs were found to perform further analysis:

Apart from these use cases the motif search can be ran on various more advanced ways such as searching both arms , searching the reverse strand , both strands as well as apply text search functionalities such as sequence contains X , starts with X and length comparisons such as "length equals X" or "length greater than X" in complex ways while also having the ability to limit results. An example would look like:

There is also an about/contact page that explains the basics of Inverted Repeats along with the lab's researchers emails to contact if you have any feedback or questions.

By navigating to Downloads through the navbar you have the option to download the whole dataset as a whole or divided into the four domains by clicking the respective button. There are four options in terms of data formats namely csv, json, parquet and bed.