BBCC2017

International Conference on Bioinformatics and Computational Biology

12th edition

Abstracts collection

INVITED SPEAKERS

Discovery of potent and selective bromodomain inhibitors by high-throughput fragment docking

Amedeo Caflisch

Computational Structural Biology, University of Zurich, Switzerland

Our docking program makes use of the CHARMM force field and a generalized Born approximation. In the past three years, we have identified inhibitors of six human bromodomains, protein modules that bind acetylated histone tails. In the case of the CREBBP bromodomain, optimization of the initial hits by chemical synthesis of derivatives has resulted in several low-nanomolar binders with favorable ligand efficiency and selectivity against other bromodomains. Overall, screening fragment libraries by docking is very efficient (24,000 molecules per day on a commodity desktop), and the hit rate, i.e., the fraction of actives among the purchased compounds, is 20% to 50%. Importantly, we have validated the predicted binding modes by solving the crystal structures of nearly 100 bromodomain/ligand complexes.

Solving Bioinformatics Problems by Integer Linear Programming: An Ongoing Successful Story

Giuseppe Lancia

Algorithms, Combinatorics & Optimization, University of Udine, Italy

Integer Linear Programming (ILP) is a very powerful technique for solving hard optimization problems. In the past 20 years, ILP has been successfully applied to a wide range of computational biology problems, showing once more the effectiveness of the approach. In this talk we will survey some of these applications, which touch many aspects of modern computational biology, such as protein sequence and structure alignment, genome rearrangements and evolutionary distances, and SNP haplotyping and genotyping.

Multi-omics in the context of global biodiversity monitoring: delivering insight in a multi-stakeholder datascape

Pier Luigi Buttigieg

HGF-MPG Group for Deep Sea Ecology and Technology, Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Germany

Multi-omic approaches now pervade the life sciences and persist as a focal point of scientific innovation and rapid technological progress. The constant turnover of methods in the field, the wet lab, and in the in silico domain is a source of great excitement; however, it also presents a formidable obstacle in securing a well-defined role for omics in global biodiversity monitoring. As a result, long-term, omically enabled ecological observatories lack strategic coherence and stable approaches to feed their insights into widely used indicator frameworks for environmental monitoring. Rather than surveying the methods themselves, this talk will explore how omics data can join the suite of biodiversity indicators and information streams which contribute to assessments of ecosystem state. Activities within the Alfred Wegener Institute’s Frontiers in Arctic Marine Monitoring (FRAM) programme and the AtlantOS project will serve to demonstrate the role of coordination, bottom-up standards development, and the formation of practice-oriented consortia in this process. Routes to engage with the developers of Essential Variables and indicators for biodiversity, ocean health, and other priority targets for monitoring will also be put forth for discussion as well as interfaces to the UN’s Sustainable Development Goals.

Multi-omic analysis of signalling factors in inflammatory comorbidities

Pietro Liò

Department of Computer Science, University of Cambridge, UK

Inflammation is a core element of many different systemic and chronic diseases that often have an important autoimmune component. These systemic features can be investigated through a comparative multi-omic analysis of many inflammatory diseases. We first performed an exploratory analysis of the signalling molecules of several inflammatory diseases using gene expression data. The comparison of patients and healthy controls highlighted the importance of members of gene families coding for signalling factors, which could explain the systemic aspects of inflammation. We then extended the analysis to methylation data and developed a novel methodology to compare omics data within gene families. Applying the method to four related gene families in eight diseases with strong inflammatory components, we identified the disease-associated genes whose expression and methylation levels in patients differ significantly from the evolutionary models estimated on the control samples. We validated the results by comparing model predictions with biological rationale derived from the medical literature. The results provide an estimate of the mutual relationships among omic layers (gene expression, promoter methylation and gene-body methylation) and of their contribution to the inflammatory diseases.

SELECTED ORAL PRESENTATIONS

SESSION: MOLECULAR SIMULATIONS

A contact and energy-based approach for the classification of biological and crystallographic interfaces

Anna Vangone, Katarina Elez, Alexandre MJJ Bonvin

Computational Structural Biology group, Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands

Presenting Author: Anna Vangone. Email: a.vangone@gmail.com

Keywords: Crystal contacts, protein interface, biological interface, protein-protein complexes

The study of macromolecular assemblies is fundamental to understanding their functions in cells. X-ray crystallography remains the most common technique to solve their 3D structures. In a crystal, however, both “biologically relevant” interfaces and “non-specific” ones, corresponding to crystal lattice contacts, are observed. Given the complexity of the assemblies currently tackled, interface classification (i.e. distinguishing the biological interface from crystal lattice contacts) is not trivial and often error-prone. Current state-of-the-art approaches rely on energetic and entropic criteria (PISA, Krissinel et al., 2007) or on geometric/evolutionary criteria (EPPIC, Duarte et al., 2012). Recently, we have demonstrated a simple and reliable contact-based approach for binding affinity prediction in protein-protein and protein-ligand complexes (Vangone and Bonvin, 2015; Kurkcuoglu et al., 2017). Here, we introduce a new interface classification approach based on a combination of structural and energetic properties. Using machine learning on the DC and Many interface datasets (Baskaran et al., 2014), we show that a combination of interfacial contacts (classified by their polar character) with electrostatic, van der Waals and desolvation energies calculated with HADDOCK (Dominguez et al., 2003) leads to a classification accuracy for biological vs crystal contacts of 0.82, compared with 0.79 and 0.81 for PISA and EPPIC, respectively.
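As a rough illustration of the contact features mentioned above, the Python sketch below counts inter-chain residue contacts and bins them by the polar character of the two residues involved. The 5.5 Å atom-atom cutoff, the residue classes and all function names are assumptions made for this example, not the authors' exact definitions.

```python
import math
from collections import Counter

# Illustrative residue classes (an assumption for this sketch, not the authors' exact scheme)
CHARGED = {"ASP", "GLU", "LYS", "ARG"}
POLAR = {"SER", "THR", "ASN", "GLN", "HIS", "TYR", "CYS"}


def residue_class(resname):
    """Map a three-letter residue name to 'charged', 'polar' or 'apolar'."""
    if resname in CHARGED:
        return "charged"
    if resname in POLAR:
        return "polar"
    return "apolar"


def interface_contacts(chain_a, chain_b, cutoff=5.5):
    """Count inter-chain residue contacts, binned by the classes of the two residues.

    Each chain is a list of (resname, [(x, y, z), ...]) tuples, one per residue.
    Two residues are in contact if any pair of their atoms lies within `cutoff` angstroms.
    """
    counts = Counter()
    for name_a, atoms_a in chain_a:
        for name_b, atoms_b in chain_b:
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                pair = sorted((residue_class(name_a), residue_class(name_b)))
                counts["-".join(pair)] += 1
    return counts
```

In an approach like the one described, counts such as "charged-charged" or "apolar-apolar" would form part of a feature vector together with the HADDOCK energy terms before being passed to a classifier.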

The interplay between structural stability and plasticity determines mutation profiles and chaperone dependence in protein kinases.

Antonella Paladino (1), Filippo Marchetti (1), Luca Ponzoni (2), Giorgio Colombo (1).

1) Istituto di Chimica del Riconoscimento Molecolare, CNR, Milano; 2) Molecular and Statistical Biophysics, International School for Advanced Studies SISSA, Trieste

Presenting Author: Antonella Paladino. Email: paladino.anto@gmail.com

Keywords: Molecular simulation, protein kinases, energy decomposition, structural alignment

A novel comparative analysis of representative kinases is shown to unveil the main dynamic and energetic determinants of functional regulation among different families. The relationships between stability and plasticity are also used to rationalize kinase tendencies to interact with the molecular chaperone Hsp90. These questions are tackled through newly developed MD-based methods of analysis of internal energy and dynamics applied to a total of 37 different systems, which represent wild type and mutated proteins, including active and inactive states. In this context, energetic decomposition analysis is coupled to multiple structural alignments and dynamic decomposition methods and identifies, across different families, common elements that underlie fold stabilization and conformational regulation. This analysis also exposes which substructures play a key role in determining chaperone dependence. Overall, the results highlight common interaction networks that underpin kinase stabilization, are perturbed by mutations, even if located at a distance, and underlie their tendencies to act as clients or non-clients of Hsp90. A specific focus is dedicated to the Tyrosine Kinase (TK) family, the widest and most targeted group of molecules in the human kinome.

A different look at molecular biology: the perceptive scale

Monica Zoppè

Scientific Visualization Unit, IFC – CNR

Presenting Author: Monica Zoppè. Email: mzoppe@ifc.cnr.it

Keywords: Protein structure, cell biology, size perception.

We all know that cells are small, and that proteins and other biological molecules are even smaller. However, the size relationships among objects at the cellular level often escape the attention of most biologists. With the perceptive scale of Ten Million Times, we introduce a method to easily relate minute objects to our sensory experience.

The presentation will provide some examples of how some aspects of biological processes can be reassessed if seen in the light of proper dimensions, and how unconscious assumptions are leading to a distorted interpretation of biological phenomena.
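The perceptive scale amounts to a single multiplication by ten million. The short Python sketch below applies it; the object sizes are rough, illustrative orders of magnitude chosen for this example, not values taken from the presentation.

```python
SCALE = 10_000_000  # the "Ten Million Times" magnification factor


def to_perceptive(size_m):
    """Return the size (in metres) that an object of real size `size_m` has on the perceptive scale."""
    return size_m * SCALE


# Rough, illustrative sizes in metres (approximate orders of magnitude, not data from the talk)
examples = {
    "water molecule": 0.3e-9,
    "globular protein": 5e-9,
    "ribosome": 25e-9,
    "bacterium": 2e-6,
    "human cell": 20e-6,
}

for name, size in examples.items():
    print(f"{name}: {size:.1e} m -> {to_perceptive(size):.3g} m")
```

At this scale a protein about 5 nm across becomes an object about 5 cm wide, one that fits in a hand, while a 20 µm cell becomes a structure about 200 m wide.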

Relevance of ribose-nucleobase stacking interactions in functional RNAs

Mohit Chawla (1), Luigi Cavallo (1), Romina Oliva (2)

1) King Abdullah University of Science and Technology (KAUST), Saudi Arabia; 2) University of Naples Parthenope, Italy.

Presenting author: Romina Oliva. Email: romina.oliva@uniparthenope.it

Keywords: RNA, tertiary structure, stacking, structural bioinformatics, energy calculations

The view of RNA structure has evolved over a few decades from a simplistic ‘two-dimensional’ concept of base-paired helices interspersed with single-stranded unpaired regions to a variety of complex 3D arrangements associated with many complex functions. In fact, while an increasing number of functional RNA molecules are being identified by biochemical and genetic screens, it is clearly emerging that the space of RNA architectures is vast and largely uncharacterized to date, and that the specific folding pattern and function of RNA molecules rely on various weak interactions, in addition to strong base-base pairing and stacking (1).

Following an approach we have been using for over a decade (2-6), which integrates structural bioinformatics and advanced quantum mechanics calculations, we characterized one of these relatively weak interactions: the stacking of the O4′ atom of a ribose on top of the heterocyclic ring of a nucleobase (7).

We identified 2015 ribose–base stacking interactions in a high-resolution set of non-redundant RNA crystal structures and calculated in vacuo energies for a set of representatives by quantum mechanics methods. Such interactions are widespread in RNA molecules and are located in structural motifs other than regular stems. Over 50% of them involve an adenine, as we found ribose–adenine contacts to be recurring elements in A-minor motifs. Fewer than 50% of the interactions involve a ribose and a base of neighboring residues, while approximately 30% involve a ribose and a nucleobase at least four residues apart. Some of them establish inter-domain or inter-molecular contacts and often involve functionally relevant nucleotides.

Finally, we found that lone pair–π stacking interactions also occur between ribose and aromatic amino acids in RNA–protein complexes.
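To illustrate how such contacts can be detected geometrically, the Python sketch below tests whether an O4′ atom sits above a base ring by checking its distance to the ring centroid and its deviation from the ring normal. The 3.5 Å and 30° thresholds, and the function names, are hypothetical choices for this example, not the criteria used in the study.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def is_o4_stacked(o4, ring, max_dist=3.5, max_angle_deg=30.0):
    """Hypothetical geometric test for O4'-over-base stacking.

    o4: (x, y, z) of the ribose O4' atom; ring: coordinates of the base ring atoms.
    Returns True if O4' is within `max_dist` angstroms of the ring centroid and the
    centroid->O4' vector deviates from the ring normal by at most `max_angle_deg`.
    """
    centroid = tuple(sum(p[i] for p in ring) / len(ring) for i in range(3))
    normal = cross(sub(ring[1], ring[0]), sub(ring[2], ring[0]))
    d = sub(o4, centroid)
    dist = math.sqrt(dot(d, d))
    if dist == 0 or dist > max_dist:
        return False
    cos_t = abs(dot(d, normal)) / (dist * math.sqrt(dot(normal, normal)))
    return math.degrees(math.acos(min(1.0, cos_t))) <= max_angle_deg
```

Taking the absolute value of the cosine makes the test independent of which face of the ring the O4′ atom lies on.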

References:

1. Grosjean H, Westhof E. (2016) Nucleic Acids Res., 44:8020.
2. Oliva R, Cavallo L, Tramontano A. (2006) Nucleic Acids Res., 34:865.
3. Oliva R, Tramontano A, Cavallo L. (2007) RNA, 13:1427.
4. Oliva R, Cavallo L. (2009) J Phys Chem B, 113:15670.
5. Chawla M, Abdel-Azeim S, Oliva R, Cavallo L. (2014) Nucleic Acids Res., 42:714.
6. Chawla M, Oliva R, Bujnicki JM, Cavallo L. (2015) Nucleic Acids Res., 43:9573.
7. Chawla M, Chermak E, Zhang Q, Bujnicki JM, Oliva R, Cavallo L. (2017) Nucleic Acids Res., 45, in press. doi: 10.1093/nar/gkx757.

Tuning the molecular mechanism of Hsp70 via a new allosteric network

Silvia Rinaldi (1), Victoria Assimon (2), Zapporah Young (3), Giulia Morra (1), Jason Gestwicki (2), Giorgio Colombo (1)

1) Istituto di Chimica del Riconoscimento Molecolare, CNR Via Mario Bianco, 9 20131 Milano, Italy; 2) Department of Pharmaceutical Chemistry, University of California at San Francisco, CA 94158; 3) Department of Medicinal Chemistry, Biology, University of Michigan, Ann Arbor, MI 48109, USA.

Presenting Author: Silvia Rinaldi. Email: silvia.rinaldi24@gmail.com

Keywords: Molecular Dynamics, Allosteric regulation, Molecular chaperones, Modulation of functional motions

Understanding the impact of ligand-dependent conformational perturbations on functional protein motions provides opportunities to regulate protein biological activity. We addressed this question by advancing our understanding of the mechanism of heat shock protein 70 (Hsp70) regulation, identifying the structural determinants that control the interconversion among its functional states. Hsp70 is a model for difficult drug-target cases: it is a multi-component protein showing allosteric regulation, it intersects different biochemical activities and it is implicated in several diseases (Cellular and Molecular Life Sciences 62.6, 2005). Modulating Hsp70 activity with small molecules is therefore challenging, but highly promising. In this context, the anticancer compound MKT-077 was shown to derive its activity from differential interactions with the allosteric states of the protein (Journal of Molecular Biology 411.3, 2011). Building on this observation, we combined the analysis of Hsp70 internal dynamics, as differentially modulated by MKT-077, with mutagenesis experiments to pinpoint the substructures relevant to tuning the protein's functional motions. In this way, we identified mutations that mimic the impact of MKT-077, showing that both the ligand and the mutations trap Hsp70 in a conformation in which it cannot perform its physiological activities. By combining computational and biochemical studies, we provided new insights into possible ways to tune Hsp70 functions, suggesting that this approach can be extended to other difficult cases.

Function enhancements of sweet proteins through molecular design

Serena Leone (1), Piero Andrea Temussi (1,2), Delia Picone (1)

1) Department of Chemical Sciences, University of Naples Federico II, I-80126, Napoli, Italy; 2) Department of Basic and Clinical Neurosciences, King’s College London, London SE5 9RX, UK

Presenting Author: Serena Leone. Email: serena.leone@unina.it

Keywords: sweet protein, pH-dependent stability, protein structure

Sweet proteins are a family of proteins with no structural homology among them, able to elicit a sweet sensation by interacting with the dimeric T1R2-T1R3 taste receptor. These molecules have great potential for the food industry, where they could constitute a new class of sweeteners, provided that their intrinsic lability is overcome. We have focused on MNEI, a single-chain derivative of the natural protein monellin, one of the sweetest molecules known to date. MNEI has a strongly pH-dependent behavior, with a propensity to aggregate and precipitate at neutral to alkaline pH. We have used Molecular Dynamics to rationalize and counteract this tendency, designing a new pH-insensitive mutant construct. In parallel, based on the analysis of surface potential maps, we have designed new mutants with increased ability to bind the receptor, in accordance with the so-called “wedge model”. By merging these results, we have produced a new construct, the sweetest protein known so far, which combines a sweetness threshold of about 25 nM with high thermal stability and pH-independent behavior. This molecule is a valuable candidate for industrial applications and has been used in docking studies that hinted at a previously unrecognized role of plasticity in the interaction between proteins and the sweet receptor.

Efficient hashing of spaced-seeds with block indexing

Samuele Girotto, Matteo Comin, Cinzia Pizzi

Department of Information Engineering, University of Padova, Italy

Presenting Author: Cinzia Pizzi. Email: cinzia.pizzi@dei.unipd.it

Keywords: spaced seeds, efficient hashing, k-mers, block indexing

Spaced seeds, i.e. patterns in which some fixed positions are allowed to be wildcards, play a crucial role in several bioinformatics applications involving substring counting and indexing. A popular example is sequence alignment, where the use of spaced seeds, as in PatternHunter [1], provides better sensitivity than k-mer-based approaches. K-mer-based approaches are usually fast, as they can rely on efficient hashing and indexing. Spaced-seed hashing is not as straightforward, and it can slow down the whole computation [2]. Recently, an approach to speed up the hashing of DNA sequences with respect to spaced seeds was proposed in [3]; it exploits the structure of the given spaced seed and the hash of the previous position, leading to a speed-up of about 1.6 w.r.t. standard hashing. In this work we propose a novel algorithm based on the indexing of small blocks that can be combined to obtain the hashing of spaced seeds of any length. Preliminary experiments show that this approach can reach a speed-up of 1.9 to 2.1 w.r.t. standard hashing, depending on the spaced seed.
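For readers unfamiliar with spaced-seed hashing, the Python sketch below contrasts standard hashing (reading every care position of the seed at each offset, 2 bits per base) with a simple block-based variant that splits the seed into runs of consecutive '1's, precomputes rolling codes of contiguous blocks once, and concatenates them. It illustrates the general idea of reusing block hashes; it is not the authors' algorithm, and the names and the A/C/G/T encoding are assumptions.

```python
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def standard_hash(seq, seed, i):
    """Hash the spaced seed at offset i by reading every '1' position (2 bits each)."""
    h = 0
    for j, c in enumerate(seed):
        if c == "1":
            h = (h << 2) | ENC[seq[i + j]]
    return h


def seed_blocks(seed):
    """Split the seed into runs of consecutive '1's, as (offset, length) pairs."""
    blocks, j = [], 0
    while j < len(seed):
        if seed[j] == "1":
            start = j
            while j < len(seed) and seed[j] == "1":
                j += 1
            blocks.append((start, j - start))
        else:
            j += 1
    return blocks


def contiguous_codes(seq, length):
    """Codes of every contiguous `length`-mer via a rolling update; codes[k] encodes seq[k:k+length]."""
    mask = (1 << (2 * length)) - 1
    codes, h = [], 0
    for p, c in enumerate(seq):
        h = ((h << 2) | ENC[c]) & mask
        if p >= length - 1:
            codes.append(h)
    return codes


def block_hashes(seq, seed):
    """Hash the spaced seed at every offset by concatenating precomputed block codes."""
    blocks = seed_blocks(seed)
    tables = {length: contiguous_codes(seq, length) for _, length in blocks}
    out = []
    for i in range(len(seq) - len(seed) + 1):
        h = 0
        for off, length in blocks:
            h = (h << (2 * length)) | tables[length][i + off]
        out.append(h)
    return out
```

Both functions produce identical hash values; the block variant reads each block code from a table instead of re-encoding every care position. In pure Python the sketch only demonstrates correctness, not speed.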

References:

1 – B. Ma, J. Tromp, and M. Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440, 2002.
2 – S. Girotto, M. Comin, C. Pizzi. Binning metagenomic reads with probabilistic sequence signatures based on spaced seeds. TCS, in press, 2017.
3 – S. Girotto, M. Comin, C. Pizzi. Fast Spaced Seed Hashing. WABI 2017, LIPIcs 88, 7:1–7:14, 2017.

New approach to Molecular Dynamics using Monte Carlo Methods and Quaternions

Claudia Caudai (1), Monica Zoppè (2), Emanuele Salerno (1), Maria Antonietta Pascali (1), Anna Tonazzini (1)

1) CNR-ISTI, Pisa, Italy; 2) CNR-IFC, Pisa, Italy

Presenting Author: Claudia Caudai / Monica Zoppè. Email: mzoppe@ifc.cnr.it

Keywords: Molecular Dynamics, Monte Carlo Methods, Quaternions

We present a new approach to protein molecular dynamics (MD) based on Monte Carlo Methods applied to atomic 3D models managed by quaternions.

Classical MD simulations provide detailed information on the conformational changes of proteins and nucleic acids. Positions and movements are obtained as computationally expensive solutions to complicated differential equations, and finding a faster alternative is challenging. Atomic movements in proteins are constrained by force fields related to thermal motion and to intra- and inter-molecular interactions. Our approach to MD combines quaternions, to manage the movements of atoms along their own trajectories, and Monte Carlo methods, to perform incremental rotations and control energy values. We control the angular trajectories of atoms using unit quaternions (Hanson et al., 2012, J. Mol. Graph. Model). Modeling molecules with quaternions allows a very convenient application of Monte Carlo methods, as rotations become very easy to perform (Karney, 2007, J. Mol. Graph. Model). The random incremental rotations can be made specific to each individual amino acid, following its specific propensity to motion; in our case, we use a different range of random incremental rotations for every dihedral angle in the protein chain. We carried out preliminary experiments on two small proteins, Calmodulin (PDB 1cfc) and BPTI (Shaw et al., 2010, Science), starting from conformations retrieved directly from the PDB (Protein Data Bank).
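The combination of quaternion rotations and Monte Carlo acceptance can be sketched as below in Python. This is a minimal illustration assuming a Metropolis acceptance criterion, a user-supplied energy function and a per-dihedral maximum rotation angle; the function names, the toy energy functions and the parameter values are hypothetical and not taken from the authors' implementation.

```python
import math
import random


def axis_angle_quat(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    n = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2) / n
    return (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)


def qmul(a, b):
    """Hamilton product of two quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def rotate(point, q, origin=(0.0, 0.0, 0.0)):
    """Rotate `point` about `origin` with unit quaternion q, computing q p q*."""
    p = (0.0,) + tuple(point[i] - origin[i] for i in range(3))
    conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = qmul(qmul(q, p), conj)
    return (x + origin[0], y + origin[1], z + origin[2])


def mc_dihedral_step(atoms, axis_start, axis_end, moving, energy, max_angle, kT, rng=random):
    """One Metropolis move about the bond axis_start -> axis_end.

    The atoms listed in `moving` are rotated by a random angle in [-max_angle, max_angle];
    the move is accepted with probability min(1, exp(-dE / kT)).
    """
    axis = tuple(axis_end[i] - axis_start[i] for i in range(3))
    q = axis_angle_quat(axis, rng.uniform(-max_angle, max_angle))
    trial = list(atoms)
    for k in moving:
        trial[k] = rotate(atoms[k], q, origin=axis_start)
    dE = energy(trial) - energy(atoms)
    if dE <= 0 or rng.random() < math.exp(-dE / kT):
        return trial, True
    return atoms, False
```

Giving each dihedral its own `max_angle` corresponds to the per-dihedral ranges of random incremental rotations described above.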

The atomic-level structure of novel peptide-based nanomaterials unravelled by Molecular Dynamics

Nicole Balasco (1), Carlo Diaferia (2), Giancarlo Morelli (2), Antonella Accardo (2), Luigi Vitagliano (1)

1) Institute of Biostructures and Bioimaging (IBB), CNR, Naples (Italy); 2) Department of Pharmacy, Research Centre on Bioactive Peptides (CIRPeB), University of Naples “Federico II” and DFM Scarl, Naples (Italy)

Presenting Author: Nicole Balasco. Email: nicole.balasco@unicampania.it

Keywords: self-assembled peptides, fibers, photoluminescence, molecular dynamics

Self-assembled peptides are an attractive tool for the fabrication of novel nanomaterials, as their physicochemical properties make them suitable for several applications in biology, nanomedicine and nanofabrication. A deep knowledge of the structural organization of these materials is essential for understanding their physicochemical properties and for engineering new materials with enhanced properties. In recent years, we have studied novel self-assembled PEGylated peptides (Phe6, Nal2, (Phe-Tyr)3, Trp4) endowed with interesting photoluminescent properties (Diaferia et al., Chemistry 2016, Sci Rep. 2017, Chemistry 2017a and 2017b). Using modeling and molecular dynamics (MD) techniques, we were able to provide the first atomic-level model for the peptide moiety of these assemblies. In particular, using approaches and models we previously adopted for amyloid-like systems (Esposito et al., PNAS 2006, Biophys J. 2008), we show here that the steric zipper association is highly compatible with the structure of the peptide-based spine of these materials. Our models agree well with the experimental characterization. We also performed replica-exchange MD (REMD) simulations on aggregates of rather limited size. The analysis of the relative stability of these systems provided important information on the structural features of the assemblies formed in the early stages of fiber formation and will have significant implications for the design and development of new peptide-based nanomaterials.

Probing the Interactions of Marine Thio-histidines with an Attractive Pharmaceutical Target for Cancer Therapy

Immacolata Castellano (1), Alfonsina Milito (1), Maria Russo (2), Gian Luigi Russo (2), Michael Lisurek (3)

1) Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Naples, Italy; 2) Istituto di Scienze dell’Alimentazione, CNR, Avellino, Italy; 3) Leibniz-Institut für Molekulare Pharmakologie, Berlin, Germany

Presenting Author: Immacolata Castellano. Email: immacolata.castellano@szn.it

Keywords: 5-thiohistidines, ovothiol, molecular docking, enzyme target, marine drug

Marine thiohistidines are sulphur-containing compounds, which play a key role in maintaining cellular redox homeostasis and in the survival of organisms constantly exposed to environmental constraints. Among them, methyl-5-thiohistidines (ovothiols) are isolated in large amounts in sea urchin eggs¹, and display unique antioxidant properties, thanks to the peculiar position of the thiol group in the imidazole ring of histidine. We previously showed that in marine invertebrates the biosynthesis of such molecules increases before fertilization and larval settlement, and is regulated by environmental stressors, which eggs and larvae encounter in sea water². Recently we have undertaken pioneering biological and biochemical studies to identify potential pharmaceutical applications of these molecules. By pharmacological approaches, we have found that in human cancer cells, ovothiol induces cell proliferation arrest through an autophagic mechanism³.

We have now discovered a direct molecular target of ovothiol, a cell-surface enzyme involved in metabolic and detoxification processes. The expression of this enzyme is significantly elevated in several tumors, making it an attractive pharmaceutical target against cancer. In this study we compared the effect of different thiohistidines on the enzymatic activity and on the induction of autophagy. Finally, we performed in silico docking studies with Schrödinger’s computational technology and molecular modeling with the SYBYL-X software to generate hypotheses on the binding interactions between this class of compounds and the enzymatic target, and to gain insight into the mechanism of inhibition.

Overall, our findings highlight the potential of 5-thiohistidines as marine drugs for cancer therapy.

References:

1. Palumbo et al. Tetrahedron Lett. (1982) 23, 3207–3208.
2. Castellano et al. Scientific Reports (2016) 6, 21506.
3. Russo et al. Marine Drugs (2014) 12, 4069–4085.

SESSION: APPLICATIONS IN GENOMICS

Enabling practical pan-genomics with the variation graph toolkit

Erik Garrison (1), Jouni Sirén (1), Adam M. Novak (2), Glenn Hickey (2), Jordan M. Eizenga (2), Eric T. Dawson (1), Will Jones (1), Michael F. Lin (3), Benedict Paten (2), Richard Durbin (1)

1) Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom; 2) UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA; 3) DNAnexus, 1975 W El Camino Real, Suite 101 Mountain View, CA, USA

Presenting Author: Erik Garrison. Email: erik.garrison@gmail.com

Keywords: resequencing, genome graphs, variation graphs, read alignment, pan-genomics

Reference genomes provide a prior to guide our interpretation of DNA sequence data. However, conventional linear references are fundamentally limited in that they represent only one version of each locus, whereas the population may contain multiple variants. If the reference represents an individual’s genome poorly, it can impact read mapping and introduce bias. To resolve this issue, we explore the use of variation graphs as reference structures. These bidirected DNA sequence graphs compactly represent collections of genomes and variation among them. They may be built from linear reference genomes and variants, whole genome assemblies, or read-based assembly graphs. We have developed VG, a toolkit of computational methods for creating, manipulating, and utilizing these structures as reference systems. VG provides an efficient approach to mapping reads onto variation graphs using generalized compressed suffix arrays, and implements standard downstream applications such as variant calling and genotyping. Through a series of experiments on human and yeast pan-genomes we demonstrate that VG provides marked improvement in alignment accuracy over standard alignment methods in the context of variation while performing equivalently on a linear reference. VG provides this improvement at a modest increase in computational cost relative to methods that are based around a single linear reference genome, thus enabling the use of variation graphs even at the scale of vertebrate genomes.
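To make the idea of a variation graph concrete, the Python sketch below builds a toy graph from a linear reference plus single-nucleotide variants, with each variant forming a two-node bubble, and enumerates the haplotype paths the graph encodes. This is a simplification for illustration only: it ignores the bidirected (reverse-strand) edges and the indels that VG supports, and the function names are hypothetical, not part of VG's API.

```python
def build_graph(ref, snvs):
    """Build a toy, forward-strand-only variation graph.

    ref: linear reference sequence; snvs: list of (0-based position, alt base) with
    distinct positions. Returns (nodes, edges): nodes maps node id -> sequence and
    edges maps node id -> list of successor node ids.
    """
    nodes, edges = {}, {}
    tails, start, nid = [], 0, 0  # tails: the nodes that the next node is linked from

    def add_node(seq):
        nonlocal nid
        nodes[nid] = seq
        for t in tails:
            edges.setdefault(t, []).append(nid)
        nid += 1
        return nid - 1

    for pos, alt in sorted(snvs) + [(len(ref), None)]:
        if pos > start:  # shared reference segment up to the variant (or the end)
            tails = [add_node(ref[start:pos])]
        if alt is None:
            break
        # bubble: reference allele and alternate allele as parallel nodes
        tails, start = [add_node(ref[pos]), add_node(alt)], pos + 1
    return nodes, edges


def haplotypes(nodes, edges):
    """Enumerate the sequences spelled by all source-to-sink paths."""
    targets = {m for succ in edges.values() for m in succ}
    out = []

    def walk(n, prefix):
        seq = prefix + nodes[n]
        if not edges.get(n):
            out.append(seq)
        for m in edges.get(n, []):
            walk(m, seq)

    for n in nodes:
        if n not in targets:
            walk(n, "")
    return sorted(out)
```

With two SNVs the graph holds seven nodes but spells four haplotypes, which shows how a graph compactly represents a collection of genomes.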

Dictionary-based method for pangenomic discovery among distal genomes.

Vincenzo Bonnici (1), Vincenzo Manca (1), Rosalba Giugno (1)

Department of Computer Science, University of Verona

Presenting Author: Vincenzo Bonnici. Email: vincenzo.bonnici@univr.it

Keywords: genomic dictionary, pangenome, gene clustering

Horizontal gene transfer is a widespread mechanism by which microbes exchange genetic material. It occurs between closely related genomes as well as distal ones. This aspect has attracted considerable interest in pangenomic analyses, which aim to study genetic sharing among biological groups.

Current techniques to discover gene families in pangenomic contexts perform well among similar organisms, but they do not scale to wider phylogenetic ranges. The amount of alteration in gene sequences increases with the phylogenetic distance between genomes, so the thresholds and parameters of sequence similarity search must vary accordingly.

We propose a methodology for pan-genome studies able to analyse closely related and distal species in a comprehensive manner. The methodology does not require any a priori knowledge of the phylogenetic relationships among the individuals involved. It is based on a two-step pipeline that provides a fully automated search for gene families. The first step discovers genomes sharing a relatively high amount of genetic material and groups genomes by their genetic content. The second step then performs gene clustering within the grouped genomes.

The proposed approach is shown to outperform existing methodologies when the amount of alteration between homologous genes varies widely. A good example is the Mycoplasma genus, which contains highly heterogeneous species for which classical methods are not able to find any common genes.
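The two-step pipeline can be sketched as follows in Python. As simplifying assumptions, each genome's "dictionary" is taken to be its set of k-mers, similarity is the Jaccard index, and both steps use single-linkage grouping with fixed, illustrative thresholds; the actual methodology may define dictionaries, similarity and parameters differently, and all names here are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def single_linkage(items, sim, threshold):
    """Group items whose pairwise similarity is >= threshold (transitively), via union-find."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


def two_step_pangenome(genomes, k=4, genome_t=0.3, gene_t=0.5):
    """genomes: {genome_id: {gene_id: sequence}}.

    Step 1 groups genomes by the overlap of their k-mer dictionaries;
    step 2 clusters genes only within each genome group.
    """
    dicts = {g: set().union(*(kmers(s, k) for s in genes.values()))
             for g, genes in genomes.items()}
    genome_groups = single_linkage(list(genomes), lambda a, b: jaccard(dicts[a], dicts[b]), genome_t)
    families = []
    for group in genome_groups:
        genes = {(g, name): kmers(seq, k) for g in group for name, seq in genomes[g].items()}
        families.append(single_linkage(list(genes), lambda a, b: jaccard(genes[a], genes[b]), gene_t))
    return genome_groups, families
```

Restricting step 2 to genomes already grouped in step 1 is what lets gene-level thresholds be applied among genomes of comparable divergence.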

Natural selection at the lipid transporter ABCA12 gene

Roberto Sirica (1)*, Marianna Buonaiuto (1)*, Valeria Petrella (2)$, Lucia Sticco (3)$, Donatella Tramontano (2), Dario Antonini (2), Caterina Missero (2), Ombretta Guardiola (1), Yali Xue (4), Qasim Ayub (4,5), Chris Tyler-Smith (4), Marco Salvemini (2), Giovanni D’Angelo (3), Vincenza Colonna (1)

1) Institute of Genetics and Biophysics, National Research Council, Naples, Italy; 2) Department of Biology, University of Naples, Federico II, Napoli, Italy; 3) Institute of Protein Biochemistry, National Research Council, Naples, Italy; 4) Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; 5) Monash University Malaysia, Selangor Darul Ehsan, Malaysia

*: equal contribution; $: equal contribution

Presenting Author: Vincenza Colonna. Email: vincenza.colonna@igb.cnr.it

Keywords: natural selection, ABCA12, ancient DNA, UV radiation

Natural selection increases the frequency of variants responsible for functions that are favorable in a certain environment. A genome-wide scan for positive selection in contemporary humans identified a signal of positive selection in Europeans and Asians at rs10180970 C/T, located in the second intron of ABCA12; however, the functional consequences of genetic variation at this locus are unknown.

We investigated the genomic region surrounding rs10180970 and confirmed positive selection. We extended the set of individuals used to investigate genetic variation and natural selection at rs10180970, also including DNA sequences from ancient samples. We investigated linkage between rs10180970 and nearby genetic variants and, through functional assays, the function of the two alleles of rs10180970, in particular in relation to UV radiation.

Our work has three main findings. First, we reconstructed the demographic history of the T allele, discovering that it is specific to Homo sapiens and was very frequent in humans migrating out of Africa. Second, while rs10180970 might act as a sentinel for the signal of selection, the signal extends over 35 kb, including the first intron, the first two exons and the transcription start site. Third, and related to this, we observed a trend of differential gene expression associated with genotypes at rs10180970. If confirmed as significant, an effect on ABCA12 expression could be one of the possible consequences of the selection event.

Parallel genome annotation versions: the need for data reconciliation in genomics

Luca Ambrosino (1), Chiara Colantuono (1), Francesco Monticolo (2), Maria Luisa Chiusano (1, 3)

1) Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Napoli, Italy; 2) CREA; 3) Department of Agriculture, University of Naples “Federico II,” Portici (Napoli), Italy

Presenting Author: Maria Luisa Chiusano. Email: chiusano@unina.it

Keywords: genome annotation version, sequence similarity, data reconciliation

The advent of next generation sequencing technologies is driving the flourishing of genome sequencing efforts and the production of massive related collections. The spreading of novel and low cost technologies, however, is not always followed by proper coordination and integration of the available data. The fast release or update of genome assemblies, followed by parallel annotation versions, often led by community based efforts, is dangerously affecting the establishment of reliable reference resources and their proper exploitation by the whole scientific community.

With the aim of cross-linking data from independent consortia with those from reference databases, we compared parallel genome annotations of six economically important plant species, investigating their differences and performing sequence similarity analyses of their gene products against reference repositories of protein sequences and functions. Our results highlight the impact of these discrepancies on associated biological studies. In addition, to ensure that fundamental biological information can be properly exploited not only by expert users, we produced and released six unified genome annotations, resulting from the reconciliation of the different annotation versions currently available for the same genome assemblies.

Our approach can be useful in all cases in which a unique reference genome annotation is still far from being established.

GBS-derived SNP catalogue unveiled genetic diversity of Italian olive cultivars

Nunzio D’Agostino (1), Francesca Taranto (2), Salvatore Camposeo (3), Giacomo Mangini (4), Valentina Fanelli (2), Susanna Gadaleta (2), Monica Marilena Miazzi (4), Stefano Pavan (4), Valentina di Rienzo (2), Wilma Sabetta (2), Samanta Zelasco (5), Enzo Perri (5), Cinzia Montemurro (2,4).

1) CREA Research Centre for Vegetable and Ornamental Crops, Pontecagnano Faiano, Italy; 2) SINAGRI S.r.l. – Spin Off of the University of Bari “Aldo Moro”, Bari, Italy; 3) Department of Agricultural and Environmental sciences, University of Bari “Aldo Moro”, Bari, Italy; 4) Department of Soil, Plant and Food Sciences, University of Bari “Aldo Moro”, Bari, Italy; 5) CREA Research Centre for Olive, Citrus and Tree Fruit, Rende, Italy

Presenting Author: Nunzio D’Agostino. Email: nunzio.dagostino@crea.gov.it

Keywords: Olea europaea, olive, genetic diversity, genotyping-by-sequencing, SNPs

In light of the new environmental conditions imposed by climate change and of new or resurgent pests and diseases, investigating the widest possible nucleotide variability across germplasm collections, knowing the population genetic structure and having information on genotype–phenotype associations are essential to undertake efficient olive breeding programs. Within this scenario, we performed genotyping-by-sequencing on a panel of 94 cultivars representative of Italian olive germplasm. A reference-based and a reference-independent SNP calling pipeline generated 22,088 and 8,088 high-quality SNPs, respectively. Both datasets were used to model population structure via parametric and non-parametric clustering. Although the two pipelines yielded an almost 3-fold difference in the number of SNPs, both unveiled wide genetic variability and allowed cultivars to be clustered based on fruit morphology- and size-related traits and on the geographical area of cultivation. These findings allowed us to identify three main gene pools within the population under investigation: I1 includes most of the cultivars of Magno-Greek origin; I2 represents most of the Italiote cultivars with admixed ancestry; and I3 consists of cultivars of Arab-Catalan origin. In addition, based on allele frequency estimates, we distinguished known and novel cases of synonymy. Finally, a preliminary GWAS identified candidate SNP-trait associations for three bio-agronomical traits.
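To make the idea of spotting synonymies from SNP data more concrete, the sketch below computes a simple identity-by-state (IBS) similarity between two cultivars. It assumes genotypes coded as 0, 1 or 2 copies of the alternative allele, with `None` for missing calls; the function name and the coding are illustrative assumptions, and the abstract does not state that IBS was the measure used (it mentions allele frequency estimates).

```python
def ibs_similarity(geno_a, geno_b):
    """Identity-by-state similarity between two genotype vectors.

    Genotypes are coded 0/1/2 (copies of the alternative allele); None marks
    a missing call. Each SNP called in both samples contributes 2 - |a - b|
    shared alleles out of 2.
    """
    shared = 0
    compared = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip SNPs missing in either cultivar
        shared += 2 - abs(a - b)
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs called in both samples")
    return shared / (2 * compared)


# Two hypothetical cultivars identical at every jointly called SNP
print(ibs_similarity([0, 1, 2, None], [0, 1, 2, 2]))  # 1.0
```

Pairs scoring close to 1 would be candidate synonyms; in practice the cut-off has to allow for genotyping error and missing data.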

SESSION: APPLICATIONS IN MEDICINE

Patient Stratification in Cancer using Survival-based Bayesian Clustering

Ashar Ahmad (1), Holger Fröhlich (1,2)

1) Bonn Aachen International Center for Information Technology, University of Bonn, 53127 Bonn, Germany; 2) UCB Biosciences GmbH, 40789 Monheim, Germany

Presenting Author: Ashar Ahmad. Email: ashar@bit.uni-bonn.de

Keywords: Personalized Medicine, Bayesian Hierarchical Modelling, Cancer Subtype identification, Omics data integration

Discovery of clinically relevant disease sub-types is of great importance in personalized medicine. Previously, this has been approached within an unsupervised machine learning paradigm: patients are clustered on the available omics data, and a follow-up analysis then determines the clinical relevance of the sub-types by comparing their disease progression. This methodology, however, does not guarantee that the sub-types are separable by their subtype-specific survival curves. We propose a new algorithm, Survival-based Bayesian Clustering (SBC), which simultaneously clusters heterogeneous omics and clinical end-point data (time to event) in order to discover clinically relevant disease sub-types. For this purpose, we formulate a novel hierarchical Bayesian graphical model. SBC ensures that patients are grouped in the same cluster only when they show similar molecular features across omics data types as well as similar survival curves. We extensively test our model in simulation studies and apply it to cancer patient data from the Breast Cancer dataset and the TCGA repository. Notably, our method not only finds clinically relevant sub-groups but also predicts cluster membership and survival on test data better than competing methods.

Our original article:

https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btx464/4036384/Towards-Clinically-More-Relevant-Dissection-of
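Subtype-specific survival curves, whose separability the abstract says cluster-then-compare pipelines cannot guarantee, are usually inspected with Kaplan-Meier estimates. The sketch below is the textbook product-limit estimator, not part of SBC; it assumes event indicators coded 1 for an observed event and 0 for a censored follow-up, and the example data are made up.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for ti in times if ti >= t)  # still under follow-up at t
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up (months) for two clusters of patients
cluster_1 = kaplan_meier([5, 8, 12, 20], [1, 1, 0, 1])
cluster_2 = kaplan_meier([30, 36, 40, 48], [0, 1, 1, 0])
print(cluster_1)
print(cluster_2)
```

Well-separated clusters give clearly different curves, which a cluster-then-compare pipeline would typically check afterwards (for example with a log-rank test); SBC instead builds survival into the clustering model itself.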

From phenotypes to molecular mechanisms and pathways

Giulia Babbi, Giuseppe Profiti, Pier Luigi Martelli, Rita Casadio

Bologna Biocomputing Group, University of Bologna

Presenting Author: Giulia Babbi. Email: giulia.babbi3@unibo.it

Keywords: Phenotypes, Biological Pathways, Diseases

Advanced sequencing technologies make it possible to study and unravel the genetic components of phenotypic traits. The goal is to highlight the molecular pathogenic mechanisms underlying disease development in different human phenotypes. Here, we analyze gene-phenotype relations focusing on biological pathways. Taking advantage of the human disease classification of the Human Phenotype Ontology [1] and of the OMIM Clinical Synopsis [2], we associate a set of related genes with each cluster of diseases. Mapping is provided by eDGAR [3], our resource of gene-disease associations with annotated features, and our NET-GE [4] algorithm enriches each phenotype with GO terms and KEGG and REACTOME pathways. In this way, we built a platform for comparing different phenotypes, which returns the pathways, biological processes and molecular mechanisms shared by different phenotypes and symptoms. With our resource, researchers and physicians can start from human phenotypes and easily associate the phenotype appearance (symptoms) with the genes involved and the characterizing biological pathways/processes, for an immediate understanding of molecular mechanisms and possible cures.

References:

[1] Köhler S et al. (2017) Nucleic Acids Res; 45(D1):D865-76.
[2] Amberger JS et al. (2015) Nucleic Acids Res; 43:D789-98.
[3] Babbi G et al. (2017) BMC Genomics; Suppl 5:554.
[4] Bovo S et al. (2016) Bioinformatics; 32(22):3489-3491.

SESSION: METAGENOMICS

AMPS: A pipeline for screening archaeological remains for pathogen DNA

Ron Hübler, Felix M Key, Christina Warinner, Kirsten Bos, Johannes Krause, Alexander Herbig

Max Planck Institute for the Science of Human History, Jena

Presenting Author: Ron Hübler. Email: huebler@shh.mpg.de

Keywords: aDNA, Metagenomics, Pathogens, Screening

Second-generation DNA sequencing enables large-scale metagenomic studies of archaeological remains, providing insights into host-bacterial relationships throughout human history. Here we present AMPS (Ancient Metagenomic Pathogen Screening), an automated bacterial pathogen screening pipeline for ancient DNA sequence data that provides straightforward and reproducible information on species identification and authenticates their ancient origin by evaluating common criteria of aDNA authenticity.

AMPS consists of (1) a customized version of MALT, (2) MaltExtract, a Java tool that evaluates aDNA authenticity criteria for a list of target species, and (3) customizable post-processing scripts to identify candidate hits. We evaluated AMPS with DNA sequences obtained from archaeological samples and with simulated aDNA data from 33 bacterial pathogens of interest spiked into diverse metagenomic backgrounds. AMPS successfully identified all target pathogens. In addition, we used these data to assess and compensate for biases resulting from the contents and structure of the reference database. Finally, we compared the performance of AMPS with two other methodologies for microbiome characterization: a marker-gene-based approach with MIDAS and k-mer matching with Kraken. Overall, AMPS provides a versatile and fast pipeline for high-throughput pathogen screening of archaeological material that aids in the identification of candidate samples for further analysis.
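One commonly used aDNA authenticity criterion is an elevated rate of C-to-T substitutions at the 5' ends of reads, caused by post-mortem cytosine deamination. The sketch below computes that position-wise rate from already aligned read/reference pairs; it is an illustration of the criterion only, not MaltExtract's implementation, and it assumes gap-free alignments oriented 5' to 3' along the read.

```python
def ct_damage_profile(alignments, positions=5):
    """Fraction of reference C positions read as T at each 5' read position.

    alignments -- iterable of (ref, read) string pairs, gap-free, equal length,
                  both oriented 5'->3' along the read (assumption of this sketch)
    positions  -- how many positions from the 5' end to report
    """
    c_total = [0] * positions
    c_to_t = [0] * positions
    for ref, read in alignments:
        for i in range(min(positions, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [ct / tot if tot else 0.0 for ct, tot in zip(c_to_t, c_total)]


# Toy alignments: the first read carries a C->T change at its 5' end
print(ct_damage_profile([("CAC", "TAC"), ("CCA", "CCA")], positions=3))
# [0.5, 0.0, 0.0]
```

Authentic ancient reads are expected to show this frequency highest at the first position and decaying inward, whereas modern contaminant reads give a flat, low profile.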

SEDE-GPS: Socio-Economic Data Enrichment based on GPS information

Theodor Sperlea (1), Stefan Füser (1), Jens Boenigk (2), and Dominik Heider (1)

1) Department of Mathematics and Computer Science, Hans-Meerwein-Str. 6, D-35032 Marburg, Germany; 2) Biodiversity Department, Center for Water and Environmental Research, University of Duisburg-Essen, D-45141 Essen, Germany

Presenting Author: Theodor Sperlea. Email: theodor.sperlea@staff.uni-marburg.de

Keywords: microbial ecology, machine learning, biodiversity

Changes in biodiversity at the microbial level are regularly used as an indicator of ecosystem instability (Hering et al., 2010; Karimi et al., 2017). The largest contribution to this instability is the impact of human action on the environment (Waters et al., 2016; Isbell et al., 2017). Therefore, it is highly important to monitor these effects and identify the key factors among them. To this end, we developed the tool SEDE-GPS, which automatically gathers socio-economic and geographic data for a user-provided GPS coordinate. Its data sources are public databases such as Eurostat, the Climate Data Center and OpenStreetMap, as well as non-public data sources such as Twitter and Google Maps. In the current study, we used SEDE-GPS to enrich microbial rRNA-SSU next-generation sequencing datasets of 39 Austrian freshwater lakes (Nolte et al., 2010; Grossmann et al., 2016b). We used different machine learning models to find associations between the socio-economic factors from SEDE-GPS and microbial alpha diversity. These analyses suggest that lake elevation is one of the major drivers of microbial diversity, as already described by Grossmann et al. (2016b). The results of this study show that SEDE-GPS is a handy, easy-to-use tool for comprehensive data enrichment in studies in ecology (Grossmann et al., 2016a) and other fields affected by socio-economic factors.
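The abstract does not say which alpha-diversity index was used as the response variable. Assuming the common Shannon index, the sketch below shows how one value per lake could be computed from OTU (or ASV) read counts; the counts in the example are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical OTU read counts for one lake sample
print(round(shannon_index([120, 80, 40, 10]), 3))
```

Per-lake values like this would be the target that the machine learning models relate to the socio-economic and geographic features returned by SEDE-GPS.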

SESSION: TOOLS

tscv package: a novel approach to carry out cross-variance statistical tests on micro-array datasets

Rohmatul Fajriyah (1), Kumar Parijat Tripathi (2), Dedi Rosadi (3), Mario Rosario Guarracino (2)

1) Department of Statistics, Universitas Islam Indonesia, Yogyakarta, Indonesia; 2) Institute for High Performance Computing and Networking, National Research Council, Naples, Italy; 3) Department of Mathematics, Universitas Gadjah Mada, Yogyakarta, Indonesia

Presenting author: Mario Rosario Guarracino. Email: mario.guarracino@cnr.it

Keywords: two-sample t-test, cross-variance, homogeneity of variance, gene expression analysis

The two-sample t-test is one of the most widely used statistical tests in the analysis of gene expression data. Its usefulness in deciding whether there is a significant difference between the mean abundances of a gene in two sets of samples is often limited by the sample size of microarray experiments, which is usually very small. Here we present tscv (two samples cross-variance homogeneity), a new R package based on the concept of cross-variance [1].

It can be used as an alternative approach to test the significance of the difference between two means when the sample size is small. We show how to compute a p-value based on the cumulative probability distribution, for better interpretation of the results.

In its present form, the package requires homogeneity of variance between the two groups, although we will show how to overcome this limitation. To test the usefulness of the package, we first compare the cross-variance method with the standard t-test on a synthetic dataset. Then, we analyze the differential expression of genes with both an established pipeline and the tscv package, and we report the significant results that the tscv package finds beyond those of the established pipeline, highlighting their biological relevance on a real-world microarray dataset.

Reference:

[1] Fajriyah, R., Introducing a cross-variance concept and its application, (2016), 12th IEEE International Conference on Mathematics, Statistics, and Their Applications (ICMSA). doi: 10.1109/ICMSA.2016.7954321
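For reference, the classical baseline that tscv is compared against is Student's two-sample t statistic with pooled variance, which is exactly where the homogeneity-of-variance assumption enters. The sketch below computes only that baseline; the cross-variance statistic itself is defined in [1] and is not reproduced here, and the expression values are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(x, y):
    """Student's two-sample t statistic with pooled variance.

    Assumes equal variances in the two groups (the homogeneity-of-variance
    assumption); returns (t, degrees_of_freedom).
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    s1, s2 = statistics.variance(x), statistics.variance(y)  # sample variances
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, df


# Hypothetical log-expression of one gene in 3 control and 3 treated arrays
t, df = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```

With only four degrees of freedom the reference t distribution has heavy tails, which is the small-sample limitation the abstract refers to.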

The biodb R package: a unified framework to access biological and chemical databases

Pierrick Roger, Alexis Delabrière, Étienne A. Thévenot

CEA, LIST, Laboratory for data analysis and systems’ intelligence, MetaboHUB, France

Keywords: database, R, biology, chemistry, mass spectrometry.

The proliferation of databases in biology and chemistry, together with the multiplicity of access technologies (REST, SOAP, or whole download) and data formats (text, CSV, XML, JSON or HTML), makes it difficult for scientists to access database content programmatically. We have therefore developed an R package that facilitates the development of new database connectors, new output parsers and the parsing of new fields. It currently targets compound and mass spectrometry databases, which are widely used in proteomics and metabolomics. The library complies with each database's policy on the maximum number of requests per second. Fifteen database connectors have been implemented in biodb: ChEBI, Chemspider, HMDB, KEGG, Massbank, … Entries can be searched by identifier, compounds by mass or by name, and mass spectra by peak or MS2 spectra. The package has been successfully used to aggregate data from different databases (BIOMARGIN project), to check compound data tables (MetaboHUB project) and to build a custom MS peak table to be used by an annotation tool (W4M/PhenoMeNal project). Features such as conversion to CSV files, computation of missing fields and a cache system that minimizes database access have also been developed. Future developments will include full integration into the W4M/PhenoMeNal annotation tool, a helper script to check the correctness of a custom compound table and/or complete it, LC-MS matching features, and package availability on CRAN or Bioconductor.
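Two infrastructure features mentioned above, respecting a per-database request rate and caching entries to avoid repeated access, can be illustrated with a generic pattern. biodb itself is an R package; the Python class below is a hypothetical sketch of the idea, not biodb's API, and `fetch` stands in for whatever function actually queries a database.

```python
import time


class ThrottledCachedFetcher:
    """Hypothetical helper: caches entries and caps the request rate.

    fetch          -- function taking an entry id and returning its content
                      (stands in for a real database query)
    max_per_second -- maximum number of real requests per second allowed
    """

    def __init__(self, fetch, max_per_second):
        self._fetch = fetch
        self._min_interval = 1.0 / max_per_second
        self._last_request = None
        self._cache = {}
        self.requests_made = 0

    def get(self, entry_id):
        if entry_id in self._cache:
            return self._cache[entry_id]  # cache hit: no database access
        if self._last_request is not None:
            # Sleep just long enough to respect the per-second limit.
            wait = self._min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        self._last_request = time.monotonic()
        self.requests_made += 1
        result = self._fetch(entry_id)
        self._cache[entry_id] = result
        return result


def fake_query(entry_id):
    # Placeholder for a real web-service call.
    return {"id": entry_id}


fetcher = ThrottledCachedFetcher(fake_query, max_per_second=3)
fetcher.get("entry-1")
fetcher.get("entry-1")  # served from the cache
fetcher.get("entry-2")  # waits about 1/3 s before querying
print(fetcher.requests_made)  # 2
```

Checking the cache before throttling means repeated lookups never consume the request budget, which is what makes the two features complementary.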

RankerGUI: a web application for comparing expression profiles using a rank-based statistical approach

Kumar Parijat Tripathi (1), Amarinder Singh Thind (1), Seetharaman Parashuraman (2), Mario Rosario Guarracino (1)

1) Institute for High-performance Computing and Networking, CNR, Via Pietro Castellino 111, Naples, Italy; 2) Institute of Protein Biochemistry, CNR, Via Pietro Castellino 111, Naples, Italy

Presenting Author: Kumar Parijat Tripathi. Email: parijat.tripathi@icar.cnr.it

Keywords: expression profile, prototype rank list, rank-rank hyper-geometric overlap

Gene expression profiling provides an opportunity to explore the unique characteristics of biological states and phenotypes. Such studies can reveal the underlying transcriptional response and how diseases, drugs and gene perturbations affect complex gene interaction networks and associated pathways. To carry out comparative analyses of gene expression profiles, we developed Ranker, a web-based application employing two rank-based statistical approaches: Rank-Rank Hypergeometric Overlap (RRHO) and Prototype Rank List (PRL) data analysis [1]. RankerGUI is developed using in-house PHP and JavaScript code, whereas the backend has been implemented in R using Bioconductor and Cytoscape JS packages [2]. After a job is processed, the output results are displayed as heat maps, Venn diagrams, and distance and similarity matrices, and are available for download. An option to re-run the job with different parameter values is also provided. To test the usefulness of our tool, we employed RankerGUI i) to analyze cancer datasets and ii) to compare expression profiles from gene perturbation (knock-down or overexpression) experiments. Ranker is a user-friendly web-based application for comparing and characterizing gene expression signatures under different experimental conditions, and is available at http://www-labgtp.na.icar.cnr.it/RankerGui/.

References:

[1] Lecture Notes in Computer Science, vol 9874. Springer, Cham

[2] Omics: a journal of integrative biology 17.2 (2013): 116-118
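RRHO, as generally described, slides a threshold down each ranked list and, for each pair of thresholds, tests with a hypergeometric test whether the two "top" gene sets overlap more than expected by chance; the grid of -log10 p-values is what is drawn as a heat map. The sketch below is a simplified, one-sided version that assumes both lists rank exactly the same genes; RankerGUI's own R implementation is not reproduced here, and the gene names are hypothetical.

```python
import math
from math import comb


def _tail_count(k, total, successes, draws):
    """Number of draws with at least k successes (numerator of the tail)."""
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(k, upper + 1))


def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws)."""
    return _tail_count(k, total, successes, draws) / comb(total, draws)


def rrho_map(rank_a, rank_b, step):
    """Grid of -log10 one-sided overlap p-values for top-i vs top-j genes.

    rank_a, rank_b -- the same gene ids, each sorted by its own ranking statistic
    step           -- spacing of the thresholds i and j along the lists
    """
    n = len(rank_a)
    grid = []
    for i in range(step, n + 1, step):
        top_a = set(rank_a[:i])
        row = []
        for j in range(step, n + 1, step):
            overlap = len(top_a.intersection(rank_b[:j]))
            # Work with exact integer counts in log space to avoid underflow.
            tail = _tail_count(overlap, n, i, j)
            row.append(math.log10(comb(n, j)) - math.log10(tail))
        grid.append(row)
    return grid


# Hypothetical rankings of four genes in two experiments
ranks_a = ["g1", "g2", "g3", "g4"]
ranks_b = ["g2", "g1", "g4", "g3"]
for row in rrho_map(ranks_a, ranks_b, step=2):
    print([round(v, 3) for v in row])
```

High values concentrated in one corner of the grid indicate that the two profiles agree at the top (or bottom) of their rankings; the full method also considers opposite-direction overlaps, which this sketch omits.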

SESSION: OMICS AND MULTI-OMICS

A multilevel comparative genomics approach to check inter- and intra-species relationships and gene prediction quality

Luca Ambrosino (1), Chiara Colantuono (1), Marco Miralto (1), Mara Sangiovanni (1), Maria Luisa Chiusano (1,2)

1) Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Napoli, Italy; 2) Department of Agriculture, University of Naples “Federico II,” Portici (Napoli), Italy

Presenting Author: Luca Ambrosino. Email: luca.bioinfo@gmail.com

Keywords: comparative genomics, orthologs, paralogs

The new era of “omics” sequencing technologies paves the way towards a deep understanding of speciation events, diversification and functional innovation. This can be achieved by investigating the molecular similarities between species through reliable comparative analyses, for instance those allowing the definition of ortholog and paralog genes. However, the fast spread of genome sequencing results, often endowed with still preliminary and unstable annotations, requires suitable bioinformatics approaches to properly deal with the amount of accumulating data. We here present a software package that implements a multilevel comparative analysis to identify genome sequence similarities. Based on this methodology, we are able to identify putative orthologs between two species and paralogs within each species, organizing them into networks; this provides a tool to investigate gene family expansion in each species and leads to the identification of species-specific genes. Moreover, the proposed methodology permits gene annotation quality checks, identifying errors or biases in gene locus predictions.

Beyond providing a reference framework for comparative analysis, this effort also represents a powerful bioinformatics methodology for comparative studies that overcomes the limits of exclusively protein sequence-based analyses. It is especially useful for species with still preliminary gene annotations, which are nowadays spreading dramatically thanks to fast-evolving genome sequencing technologies.
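The abstract does not detail how orthologs are called at each level. As a point of reference, the widely used reciprocal-best-hit criterion can be sketched as below; this is a swapped-in standard technique, not the authors' multilevel method, and the input format (per-query lists of subject/score pairs, for example parsed from similarity-search output) is an assumption.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits -- {query_id: [(subject_id, score), ...]}
    """
    return {q: max(subjects, key=lambda s: s[1])[0]
            for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Pairs (a, b) where b is a's best hit in species B and a is b's best hit in A."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of species A and B
hits_ab = {"a1": [("b1", 100), ("b2", 50)], "a2": [("b1", 80)]}
hits_ba = {"b1": [("a1", 100), ("a2", 80)], "b2": [("a1", 50)]}
print(reciprocal_best_hits(hits_ab, hits_ba))  # [('a1', 'b1')]
```

Here a2 is left out because b1's best match is a1; genes like a2 are the cases where a single pairwise criterion says little and additional levels of comparison can help.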

Gene-mining and omics approaches to study enzymes with biotechnological potential from marine microalgae

Chiara Lauritano, Adrianna Ianora

Integrative Marine Ecology Laboratory, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Napoli, Italy

Presenting Author: Chiara Lauritano. Email: chiara.lauritano@szn.it

Keywords: Transcriptomes, Microalgae, Marine Biotechnology

Microalgae are photosynthetic eukaryotes that constitute one of the major components of marine and freshwater phytoplankton. They form a very diverse group with significant unexplored genetic potential. Recently, new insights have been gained into both the ecology and biotechnology of these important marine species thanks to various genome and transcriptome sequencing projects. The aim of this study was to sequence the full transcriptomes of four marine microalgae (i.e. the dinoflagellates Amphidinium carterae and Alexandrium tamutum, the diatom Cylindrotheca closterium and the green alga Tetraselmis suecica) in order to identify gene clusters/enzymes involved in the synthesis of toxins and bioactive metabolites and/or enzymes with biotechnological applications. The results revealed several interesting enzymes, e.g. polyketide synthases (enzymes that may be involved in the synthesis of toxins/bioactive compounds), L-asparaginase (used for the treatment of leukemia and for acrylamide reduction in the food industry) and cellulase (useful for biofuel production and other industrial applications). Differential expression analyses between control and stressful culturing conditions have shown how stress influences the activation of these enzymes and, consequently, of their products (e.g. bioactive compounds). These results provide new insights into possible biotechnological applications of microalgae.

OscoNet: Detecting oscillatory gene networks using an FDR calibrated non-parametric test

Alexis Boukouvalas (1), Luisa Cutillo (2), Elli Marinopoulou (1), Nancy Papalopulu (1) and Magnus Rattray (1)

1) University of Manchester, Manchester, UK; 2) University of Naples, Napoli, Italy

Presenting author: Alexis Boukouvalas. Email: alexis.boukouvalas@manchester.ac.uk

Keywords: single cell RNA sequencing, oscillatory gene expression, hypothesis test

Oscillatory genes have been shown to play a pivotal role in many biological contexts, including development. Detecting oscillatory genes from snapshot single-cell experiments is a challenging task due to the lack of time information. The recently proposed Oscope approach identifies co-oscillatory groups of genes without requiring time-course data. However, it requires an arbitrary threshold to identify the number of oscillatory genes, and misspecifying this threshold can be detrimental to the effectiveness of the method and to any downstream analysis. We build on this approach by developing a non-parametric hypothesis test that is calibrated with respect to the false discovery rate. We develop a network analysis pipeline on the resulting graph of significantly co-oscillating gene pairs to identify communities of significantly co-oscillating genes. We show significantly improved accuracy of the hypothesis test approach compared to the original Oscope approach on simulated and single-cell RNA sequencing data. We find our approach to be more sensitive than the original Oscope method, as it discovers a larger number of known cell cycle oscillating genes. Lastly, we use our method to identify oscillating genes in single-cell RNA-seq data of neural stem cells harvested from the mouse brain subventricular zone. We experimentally validate the predictions of our method using a qPCR protocol to test for oscillatory gene expression in synchronised cell populations.
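The abstract does not detail how OscoNet calibrates its test to the false discovery rate. As a generic illustration of FDR control over many gene-pair tests, here is the standard Benjamini-Hochberg step-up procedure; it is not OscoNet's own calibration, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at FDR level alpha (BH step-up procedure).

    Finds the largest rank k such that the k-th smallest p-value is
    <= k * alpha / m, and rejects the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values for eight gene pairs
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))  # [0, 1]
```

The retained pairs would form the edges of a co-oscillation graph, on which communities of genes can then be sought.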

On using classification to compile cancer-specific panels of miRNA biomarkers

Shib Sankar Bhowmick (1,2), Indrajit Saha (3), Debotosh Bhattacharjee (1), Filippo Geraci (4)

1) Department of Computer Science and Engineering, Jadavpur University, Kolkata, India; 2) Department of Electronics & Communication Engineering, Heritage Institute of Technology, Kolkata, India; 3) Department of Computer Science and Engineering, National Institute of Technical Teachers’ Training & Research, Kolkata, India; 4) Institute for Informatics and Telematics, National Research Council, Pisa, Italy

Presenting Author: Filippo Geraci. Email: filippo.geraci@iit.cnr.it

Keywords: MicroRNA-based biomarkers, classification, differential expression analysis, population analysis

MiRNAs are small non-coding RNAs that influence gene expression by binding to the 3′ UTR of target mRNAs. Soon after their discovery, miRNA dysregulation was associated with several pathologies. In particular, miRNAs have been reported to be good candidate biomarkers for tumor diagnosis and prediction of therapeutic responses.

With the advent of NGS, measuring the expression level of the whole miRNome at once is now routine. Moreover, collaborative data sharing opens up the possibility of population-level analyses. This motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers.

We formulated the problem of finding biomarkers as a two-class classification task where, given a population of healthy and cancerous samples, we want to find the subset of miRNAs that leads to the best classification. We accomplish this task by leveraging a combination of data mining tools. In particular, we used SVM for classification and PCA/KPCA for miRNA selection.

We identified 10 cancer-specific panels whose classification accuracy is higher than 92%. These panels overlap very little, suggesting that miRNAs are not only predictive of cancer onset but can also be used for diagnostic purposes. In addition, using survival analysis, we showed that miRNAs can be used to evaluate cancer severity. In summary, the results indicate that our panels are promising candidates for subsequent in vitro validation.

Renal TRACERx: Deterministic evolutionary trajectories govern primary tumour growth

Kevin Litchfield (1)*, Samra Turajlic (1, 2)*, Hang Xu (1)*, Andrew Rowan (1)*, Tim Chambers (1)*, Stuart Horswell (1)* and Charles Swanton (1)

1) Translational Cancer Therapeutics Laboratory, The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, United Kingdom; 2) Renal and Skin Units, The Royal Marsden Hospital, London, SW3 6JJ, United Kingdom

Presenting Author: Dr Kevin Litchfield. Email: kevin.litchfield@crick.ac.uk

Keywords: tumour-evolution, genomics, mutational-cooccurrence, evolutionary-trajectory

While the molecular landscape of clear cell renal cell carcinoma (ccRCC) is well characterised, detailed knowledge of the evolutionary trajectories that govern ccRCC development is lacking. TRAcking renal cell Cancer Evolution through Therapy (TRACERx) is a large-scale, multi-centre trial utilising multi-region sampling and molecular profiling to study ccRCC evolution. Here we report analysis of 1209 primary tumour regions from the first 100 renal TRACERx patients, and 169 metastatic regions from a subset of 38 patients with metastatic disease. All samples were subjected to high-throughput sequencing, utilising a combination of WGS, WES and targeted panel sequencing. Mutations were clustered according to their cellular prevalence, with each cluster representing a node on the phylogenetic tree of the tumour. A novel analytical pipeline was employed to study driver event co-occurrence, mutual exclusivity and timing (at tumour clone level), revealing deterministic patterns of clonal evolution and disease progression. Subtype clustering identified seven distinct ccRCC evolutionary subtypes, ranging from homogeneous tumours that achieve high selective fitness within the most recent common ancestor clone and disseminate rapidly, to highly branched tumours with ≥10 driver subclones and extensive parallel evolution that present with oligometastatic disease. Our findings reconcile the variable clinical behaviour of ccRCC and may aid patient stratification approaches.
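Co-occurrence and mutual exclusivity of two driver events are commonly screened with one-sided Fisher exact tests on the 2x2 table of samples carrying each event. The sketch below works at the tumour level on boolean vectors; the TRACERx pipeline works at the clone level and is not described here in enough detail to reproduce, so this only illustrates the basic test, with hypothetical data.

```python
from math import comb


def _hypergeom_pmf(x, total, k_a, k_b):
    """P(both = x) when k_b of `total` samples carry B and k_a carry A."""
    return comb(k_a, x) * comb(total - k_a, k_b - x) / comb(total, k_b)


def cooccurrence_tests(has_a, has_b):
    """One-sided Fisher exact p-values for two events across the same samples.

    has_a, has_b -- equal-length sequences of booleans (event present per sample)
    Returns (p_cooccurrence, p_mutual_exclusivity).
    """
    total = len(has_a)
    k_a, k_b = sum(has_a), sum(has_b)
    both = sum(1 for a, b in zip(has_a, has_b) if a and b)
    lo = max(0, k_b - (total - k_a))  # smallest feasible overlap
    hi = min(k_a, k_b)                # largest feasible overlap
    p_co = sum(_hypergeom_pmf(x, total, k_a, k_b) for x in range(both, hi + 1))
    p_ex = sum(_hypergeom_pmf(x, total, k_a, k_b) for x in range(lo, both + 1))
    return p_co, p_ex


# Hypothetical: two driver events that never occur in the same tumour
a = [True, True, True, False, False, False]
b = [False, False, False, True, True, True]
print(cooccurrence_tests(a, b))
```

Even perfect exclusivity across six tumours only reaches p = 0.05 in this example, which shows how strongly such tests depend on cohort size.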

A new method for defining survival modules using multi omics data aggregation and pathways

Paolo Martini (1), Gabriele Sales (1), Monica Chiogna (2), Enrica Calura (1), Chiara Romualdi (1)

1) Department of Biology, University of Padova; 2) Department of Statistics, University of Padova

Presenting Author: Paolo Martini. Email: paolo.martini@unipd.it

Keywords: Survival analysis, Omics integration

Molecular medicine has grown rapidly thanks to advances in genome-wide assays that can produce expression or methylation profiles for many patients. Each assay is used to drive diagnosis, prognosis or treatment. We devised tools for the topological pathway analysis of expression data (clipper for 2-class comparisons and timeClip for time series). These tools identified a molecular circuit around the lncRNA PVT1 as a prognostic marker for stage I EOC. The new challenge is multi-omics aggregation. A naive approach would treat each omics dimension separately and finally merge the results. Instead, we devised a new method to aggregate multiple omics dimensions and predict patients' survival by shifting from a gene-centric to a module view. Using pathway topology (Reactome from Graphite), we define survival modules as groups of genes closely connected in a pathway. Within the survival modules, we integrate the expression, methylation and somatic mutations of the genes, and we test the predictive power of each module. Looking at the problem from three perspectives (mRNA, SNP and methylation) improves the correct stratification of patients. Switching to survival modules increases the prediction power: a failure of any gene in the module produces a failure of the module. Furthermore, modules make it easier to formulate hypotheses about pathological mechanisms, because the connection between the function of the modules and the phenotype is straightforward.

Integration of multi-omics data for cancer survival prediction

Antonella Iuliano (1), Claudia Angelini (1), Italia De Feis (1), Pietro Liò (2)

1) Istituto per le Applicazioni del Calcolo “Mauro Picone”, CNR, Italy; 2) Computer Laboratory, University of Cambridge, UK

Presenting Author: Antonella Iuliano. Email: a.iuliano@na.iac.cnr.it

Keywords: Cox regression, data integration, multi-omics, network methods, screening techniques, survival prediction

Nowadays, it is well known that information from a single omics layer cannot fully unravel the complexities of cancer biology. Therefore, methods for the integration of multi-omics data are required to give a more comprehensive picture of cancer dynamics. In recent years, thanks to international projects and consortia, genome-wide data at multiple molecular levels have been made available by a variety of high-throughput technologies. Although these large datasets provide many key insights into cancer progression, there is still a strong need for more efficient computational methods able to analyze such amounts of data. Indeed, from a statistical perspective, the most important challenge in integrating multi-omics data is high dimensionality. Taking more levels into account increases the number of biological variables, including unknown parameters that are difficult to estimate. To tackle such problems, we propose a statistical strategy based on network survival analysis. The aims of our study are: (i) to integrate different omics datasets; (ii) to use screening-network methods; (iii) to test the predictive power of our signatures on independent datasets; (iv) to derive network interactions and associated risk pathways; (v) to identify specific mutations using the COSMIC database. In particular, we will discuss novel results on predicting the survival of patients in different breast cancer datasets.

Community structure validation in networks

Luisa Cutillo (1,2), Mirko Signorelli (3)

1) Università degli Studi di Napoli Parthenope; 2) University of Sheffield; 3) Leiden University Medical Center

Presenting Author: Luisa Cutillo. Email: luisa.cutillo78@gmail.com

Keywords: community structure validation, networks, network enrichment

The growing availability of real-world networks has inspired the study of complex networks in the multidisciplinary fields of social, technological and biological networks. What makes networks so attractive? We constantly deal with networks in real life. Networks provide a mathematical model of a real problem: understanding network structure is the key to unravelling the hidden message in data. Community structure is a commonly observed feature of real networks. The term refers to the presence in a network of groups of nodes that are densely connected internally and sparsely connected to each other. Whereas community detection has been widely addressed, validating a partition of nodes remains an open issue. We propose an inferential procedure for community structure validation of network partitions, which relies on network enrichment analysis [1]. We construct a set of community structure validation indices based on the NEAT hypothesis test [1]. The proposed procedure makes it possible to compare the validity of different partitions of nodes as community structures for a given network. It can be employed to assess whether two networks share the same community structure and to compare the performance of different clustering algorithms. We show the application of our overall strategy to the set of 30 tissue-specific gene networks inferred in [2].

References:

[1] Signorelli et al. (2016) BMC Bioinformatics, 17:352

[2] Gambardella et al. (2013) Bioinformatics, 1776-85.

SPECIAL SESSION: COMPUTATIONAL METHODS TO ANALYZE BIOLOGICAL BIG DATA

Genomic big data management and rule-based knowledge extraction from next generation sequencing data of cancer

Emanuel Weitschek

IASI-CNR, Rome, Italy

Data extraction and integration methods are becoming essential to effectively access huge amounts of genomic and clinical data. In this talk, we focus on the Genomic Data Commons, a comprehensive archive of tumor data from Next Generation Sequencing experiments on more than 40 cancer types. In particular, we consider The Cancer Genome Atlas (TCGA) project and propose TCGA2BED, a software tool to download public TCGA data and convert them into the structured BED format. Additionally, we extend TCGA with further data from several other genomic databases (i.e., NCBI Entrez, HGNC, UCSC) and provide an updated repository with all publicly available TCGA copy number, DNA-seq, RNA-seq, miRNA-seq and DNA methylation experimental data and metadata. The use of the BED format reduces the time needed to manage and analyze TCGA data; it makes it possible to efficiently handle huge amounts of cancer data and to search, query and extend them. It helps investigators perform knowledge discovery analyses aimed at aiding cancer treatment. Indeed, we propose to analyze the TCGA data with a supervised approach using CAMUR, a tool able to elicit a large amount of knowledge by computing many rule-based classification models, and therefore to identify most of the clinical and genomic variables related to the predicted cancer class. We apply CAMUR to all publicly available RNA-sequencing datasets from TCGA and also validate its models on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable, equivalent classification models, from which the most frequent genes, their relationships, and their relation to a particular cancer are deduced. The knowledge base and the software tool are released online and are freely accessible to interested researchers. Finally, we provide an overview of other analysis methods and software tools developed by our group that successfully extract new knowledge from genomic big data.
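TCGA2BED's own conversion code is not shown in the abstract. The sketch below only illustrates the coordinate convention that any BED conversion must respect, assuming (hypothetically) that the source records use 1-based, inclusive coordinates, as VCF and MAF files do. The function name and the extra value columns are assumptions, not TCGA2BED's actual output layout.

```python
def to_bed_line(chrom, start_1based, end_1based, name, *values):
    """Format one record as a tab-separated, BED-like line.

    BED coordinates are 0-based and half-open, so a 1-based inclusive
    interval [start, end] becomes [start - 1, end). Any extra values are
    appended as additional columns after the name (hypothetical layout).
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1-based inclusive coordinates with start <= end")
    fields = [chrom, str(start_1based - 1), str(end_1based), name]
    fields.extend(str(v) for v in values)
    return "\t".join(fields)


# Hypothetical record: a gene spanning positions 1001-2000 (1-based) with one expression value.
print(to_bed_line("chr1", 1001, 2000, "GENE1", 3.5))
```

Only the start coordinate shifts; the end stays the same because a half-open interval excludes its end position, so the interval length (end minus start) is preserved.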

SWIM: a computational tool to unveil crucial nodes in complex biological networks

Paola Paci

IASI-CNR, Rome, Italy

Over the last two decades, the biological sciences have undergone a radical transformation through the development of new research technologies that have produced a real explosion in the amount of available data. Consider modern genome sequencing techniques, which have made sequencing the genomes of humans, various animals and plants, and many microorganisms simpler, less expensive and more reliable, with enormous benefits for the diagnosis and treatment of diseases. In the wake of the genome, many other objects representing various biological entities analyzed in their entirety have been defined and studied: from the transcriptome (the complete set of RNA expressed by the cell) to the proteome (the complete set of proteins) to more exotic objects such as the interactome and the metabolome. This huge amount of available data is an immense resource for research, but quantity alone is not enough. Whereas in the past the difficulty lay in collecting genetic data, today the challenge is to give them meaning, and it is therefore essential to use effective informatics solutions capable of managing, analyzing and integrating these biological “Big Data”.

An excellent solution for integrating data to support research was designed by Dr. Paci, who developed SWIM (SWitchMiner), a freely downloadable open-source software released under the GNU GPL license and available at http://www.iasi.cnr.it/new/software.php. The software comes with a wizard-like GUI that greatly simplifies the execution of an otherwise complicated procedure and lets the user interact with the software by executing operations through a series of successive steps. The software, capable of detecting genes responsible for major changes in the phenotype of a cell, has so far been successfully applied in two very different fields: viticulture and winemaking, and oncology.

On the one hand, viticulture is undoubtedly a field of great economic and strategic importance and of great cultural value for Italy. In numbers, the grapevine sector generates a turnover of over 100 billion euros a year. These potential impacts justify applying SWIM to the grapevine genome, a project that led to the identification of key genes in the grape ripening process. Thanks to SWIM, it is now possible to decipher plant responses to particular conditions or stages of development and to control the quality of wine in response to climate change. The results of this study were published in the prestigious scientific journal The Plant Cell (The Plant Cell 2014, 26, pp. 4617-4635) and covered by many national newspapers (La Stampa, Il Gazzettino, Gambero Rosso, Corriere del Veneto, Vinoso, Bere il Vino, Agrinews, VQ-Vite, Vino&Qualità, Trebicchieri). For this publication, Dr. Paci received the SysBio 2014 Award for the best publication of the year from the SYSBIO Center for Systems Biology (http://www.sysbio.it).

On the other hand, oncology is undoubtedly a field of high healthcare, social and economic impact. Cancer is still the second leading cause of death in Italy (30% of all deaths) after cardiovascular disease, with a growing number of people living with tumors. It is estimated that in Italy there are 365,000 new cancer diagnoses per year (excluding carcinomas), over 189,000 (52%) among men and over 176,000 (48%) among women. To these hardly reassuring numbers must be added the ever-increasing costs that the National Health Service must bear for anti-cancer therapies. In Italy, these costs range between 50 and 150 thousand euros per year of care, with an estimated increase of 17% in 2018. The main goal of research in this area is certainly to innovate therapy through the discovery and development of new drugs that can provide incremental benefit both to patients and to the National Health Service in terms of health and costs. These potential impacts justify applying SWIM to about twenty different tumor types, a project that led to the identification of genes with a key role in neoplastic transformation. Thanks to SWIM, it is now possible to identify new potential therapeutic targets for the treatment of different types of cancer. The findings of this study were published in the prestigious journal Scientific Reports of the Nature group (Scientific Reports 2017, 7, Article number: 44797). For this work, Dr. Paci also received the Best Poster Award 2016 from the IEEE Technical Committee on Computational Life Science Society (TCCLS) at the Lipari Computational Microbiology and Microbiome-Based Medicine School.

How SWIM works and how to use it – Getting started and SWIM basics

Giulia Fiscon

IASI-CNR, Rome, Italy

In this section we present the software requirements, the setup, the folder architecture, the algorithm steps and a detailed description of the input and output files.

How SWIM works and how to use it – Usage example

Federica Conte

IASI-CNR, Rome, Italy

In this section we present an application of SWIM to a breast cancer dataset downloadable from The Cancer Genome Atlas.

POSTERS

MOLECULAR SIMULATIONS

Poster n. 1

Can force fields currently used in molecular dynamics simulations reproduce the fine structure of proteins?

Nicole Balasco, Luciana Esposito, Luigi Vitagliano

Institute of Biostructures and Bioimaging, National Research Council, Naples, Italy

Presenting Author: Luigi Vitagliano. Email: luigi.vitagliano@unina.it

Keywords: Protein backbone geometry, Protein conformation, peptide bond planarity

Several statistical and quantum chemical investigations performed in the last two decades have revealed that the geometry of the protein backbone (bond lengths/angles, dihedral angles and carbonyl carbon pyramidalization) displays significant variability that is strongly coupled with the local conformation (Berkholz et al. PNAS 2012; Esposito et al. JMB 2005; Balasco et al. Acta Cryst 2017; Improta et al. Acta Cryst D and Proteins 2015; Karplus Proteins 1996). The identification of this fine structure of the protein backbone has important implications for protein structure prediction, determination and validation. We therefore evaluated the ability of force fields currently used for modeling and molecular dynamics to reproduce these geometric properties. In particular, we performed simulations of the Ala dipeptide using the different force fields implemented in the GROMACS 4.5.5 package. Our results indicate that these force fields only partially reproduce the variability of bond angles involving non-hydrogen atoms (N, C, and O) of the protein backbone. Moreover, they are not able to reproduce the conformation-dependent variability (a) of the peptide bond distortions from planarity, (b) of the carbonyl carbon pyramidalization, and (c) of the angles involving the hydrogen atom.

In this scenario, substantial efforts should be made to improve the accuracy of force fields in reproducing these subtle but important features of protein structure.
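Peptide bond planarity is conventionally measured by the ω dihedral angle defined by the atoms Cα(i), C(i), N(i+1) and Cα(i+1), with the deviation Δω = ω - 180° for a trans peptide. The pure-Python sketch below computes a dihedral from four atomic positions and the corresponding deviation from trans planarity. It illustrates the quantity only; it is not the authors' GROMACS analysis, and the coordinates are made up.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees, in [-180, 180]) defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def omega_deviation(omega):
    """Deviation of omega from ideal trans planarity (180 degrees), in [-180, 180)."""
    return (omega % 360.0) - 180.0


# Made-up coordinates for Ca(i), C(i), N(i+1), Ca(i+1) of an ideal trans peptide.
omega = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(omega, omega_deviation(omega))
```

Working with Δω rather than ω avoids the wrap-around at ±180°: ω = 175° and ω = -175° map to Δω = -5° and +5°, which can then be averaged or binned by backbone conformation.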

Poster n. 2

Structural basis for mutations of human aquaporins associated with genetic diseases

Luisa Calvanese (1,2,3), Gabriella D’Auria (1,2,3), Lucia Falcigno (1,2,3), Romina Oliva (4)

1) CIRPeB, University of Naples Federico II, Napoli; 2) Department of Pharmacy, University of Naples Federico II, Napoli; 3) Institute of Biostructures and Bioimaging, CNR, Napoli; 4) Department of Sciences and Technologies, University Parthenope of Naples, Napoli-Italy

Presenting author: Gabriella D’Auria. Email: gabriella.dauria@unina.it

Keywords: homotetramer, trans-membrane channel proteins, water trafficking

Aquaporins are highly selective homotetrameric channel proteins that facilitate water flux across cellular membranes in a wide diversity of organisms. To date, over 30 non-synonymous single nucleotide polymorphisms (nsSNPs) associated with genetic disorders, such as nephrogenic diabetes insipidus, keratoderma and breast cancer, have been characterized for human aquaporins. Most of these nsSNPs are concentrated in AQP2, the most clinically studied member of the aquaporin family, which is located in the kidney collecting duct, where it is responsible for urine concentration. A few other nsSNPs are instead located in AQP5, which is found in the cellular membranes of the salivary and lacrimal glands, stomach, duodenum and other inner organs and plays a role in the generation of saliva, tears and pulmonary secretions, and in AQP8, which is found in the pancreas and colon and has an important role in spermatogenesis, fertilization, and the secretion of pancreatic juice and saliva.

Herein, to shed further light on the structure-function relationship in this important protein family, we present the results of our investigation of the structural basis of these genetic mutations. We anticipate that most of the mutations we characterized interfere with the folding of functional aquaporins by impairing the correct packing of the trans-membrane helices at the level of either the tertiary or the quaternary structure.

Poster n. 3

Computational analysis of the interactions between arginine and the GALT enzyme to rationalize the activity of a putative chaperone on protein stability

Lucrezia Catapano, Anna Marabotti

Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Italy

Presenting author: Lucrezia Catapano. Email: lucrezia.catapano@gmail.com

Keywords: pharmacological chaperone, stability, classic galactosemia, mutations

Classic galactosemia is a rare genetic metabolic disease that causes life-threatening complications in affected newborns exposed to galactose, as well as many severe long-term issues that can manifest in adolescence despite a galactose-restricted diet (1). This disease is caused by impairment of the enzyme galactose-1-phosphate uridylyltransferase (GALT), which catalyzes the second step of the Leloir pathway of galactose catabolism; the impairment is provoked by more than 330 known variations in the gene coding for this protein. Characterization of the effects of these variations on the enzyme with different “in silico” and “wet” approaches has shown that most of them cause instability in the mutant proteins, which tend to aggregate in solution (2,3). The amino acid arginine is a known stabilizer of aggregation-prone proteins, and a study performed in a prokaryotic model of galactose sensitivity has shown that this compound has a mutation-specific beneficial effect, particularly on the variants NP_000146.2:p.Gln188Arg and NP_000146.2:p.Lys285Asn.

We present a computational study in which we use docking to simulate the binding of arginine to the GALT enzyme, in order to understand how this amino acid can interact with the protein and improve its stability. The results of this study are a starting point for rationalizing the effects of this compound at the molecular level and will constitute the basis for the development of pharmacological chaperones able to counteract the negative effects of mutations on this enzyme.
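The variants above are written in HGVS protein notation: a reference sequence accession, then "p." followed by the reference residue, its position and the new residue. A minimal parser for this pattern could look like the sketch below. It handles only simple missense substitutions written without parentheses, and the function and constant names are illustrative.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches only simple substitutions such as "NP_000146.2:p.Gln188Arg".
MISSENSE = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})$"
)


def parse_missense(description):
    """Split an HGVS protein substitution into (accession, position, ref, alt)."""
    match = MISSENSE.match(description)
    if match is None or match["ref"] not in THREE_TO_ONE or match["alt"] not in THREE_TO_ONE:
        raise ValueError(f"not a simple missense description: {description!r}")
    return match["acc"], int(match["pos"]), match["ref"], match["alt"]


def short_form(description):
    """One-letter shorthand of a missense description, e.g. 'Q188R'."""
    _, pos, ref, alt = parse_missense(description)
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


for variant in ("NP_000146.2:p.Gln188Arg", "NP_000146.2:p.Lys285Asn"):
    print(variant, "->", short_form(variant))
```

Stop codons ("Ter"), frameshifts and predicted consequences written in parentheses are rejected with a `ValueError` rather than silently misparsed.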

References:

1) Viggiano et al., Clin Genet. 2017 Apr 4. doi: 10.1111/cge.13030. [Epub ahead of print]

2) Timson D., Gene. 2016;589:133-41.

3) D’Acierno et al., Hum Mutat. 2018; 39: 52-60.

Poster n. 4

What can bioinformatics do for rare diseases? An example from Fabry disease

Valentina Citro (1), Chiara Cimmaruta (1), Ludovica Liguori (2), Maria Monticelli (1), Gaetano Viscido (1), Giuseppina Andreotti (2) and Maria Vittoria Cubellis (1)

1) Dipartimento di Biologia, Università Federico II, Napoli 80126, Italy; 2) Istituto di Chimica Biomolecolare –CNR, Pozzuoli 80078, Italy

A rare disease is any disease that affects a small percentage of the population. There are more than 6000 diseases that, although individually rare, collectively affect a great number of people. Finding drugs for each rare disease is very expensive, but bioinformatics can provide time-saving and low-cost tools. A successful example of the benefits offered by bioinformatics is Fabry disease. Fabry disease is an X-linked lysosomal storage disorder due to mutations in the GLA gene, which encodes acid alpha-galactosidase (AGAL). The large genotypic and phenotypic spectrum of the disease [1] makes its diagnosis difficult. The severity of missense mutations can be graded by measuring the flexibility of the affected residues in the AGAL structure by molecular dynamics [2]. Moving on to therapy, a novel approach is represented by pharmacological chaperones, which can be used as oral drugs. They are small chemicals that bind AGAL and stabilize some mutant forms. Screening large numbers of molecules to find high-affinity ligands can be preceded by an in silico docking filtering step [3], thus limiting the number of chemicals that must be bought or synthesized for in vitro testing. Once a drug is found, bioinformatic tools can help predict responsive mutations [4] and organize databases to identify eligible patients [5]. Bioinformatics can also help reposition drugs [6]; in this case, the hits can enter clinical practice very rapidly.
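The abstract grades missense mutations by the flexibility of the affected residues in molecular dynamics [2] without naming the measure. A common per-residue flexibility measure is the root-mean-square fluctuation (RMSF), and the sketch below computes it under that assumption. Frames are assumed to be already superimposed on a reference structure (fitting is not done here), and the toy coordinates are invented.

```python
import math


def rmsf(frames):
    """Root-mean-square fluctuation of each atom over a trajectory.

    frames: list of frames; each frame is a list of (x, y, z) tuples, one per
    atom, already superimposed on a common reference.
    Returns one RMSF value per atom, in the units of the coordinates.
    """
    n_frames = len(frames)
    if n_frames == 0:
        raise ValueError("no frames")
    result = []
    for i in range(len(frames[0])):
        # Average position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Two frames, two atoms: the first atom is fixed, the second moves along x.
frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
print(rmsf(frames))  # [0.0, 1.0]
```

In practice one would typically restrict the calculation to Cα atoms, giving one value per residue that can be compared between wild-type and mutant simulations.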

References:

[1] Citro, V., et al., The Large Phenotypic Spectrum of Fabry Disease Requires Graduated Diagnosis and Personalized Therapy: A Meta-Analysis Can Help to Differentiate Missense Mutations. Int J Mol Sci, 2016. 17(12).

[2] Cubellis, M.V., M. Baaden, and G. Andreotti, Taming molecular flexibility to tackle rare diseases. Biochimie, 2015. 113: p. 54-8.

[3] Citro, V., et al., Identification of an Allosteric Binding Site on Human Lysosomal Alpha-Galactosidase Opens the Way to New Pharmacological Chaperones for Fabry Disease. PLoS One, 2016. 11(10): p. e0165463.

[4] Andreotti, G., et al., Therapy of Fabry disease with pharmacological chaperones: from in silico predictions to in vitro tests. Orphanet J Rare Dis, 2011. 6: p. 66.

[5] Cammisa, M., et al., Fabry_CEP: a tool to identify Fabry mutations responsive to pharmacological chaperones. Orphanet J Rare Dis, 2013. 8: p. 111.

[6] Hay Mele, B., et al., Drug repositioning can accelerate discovery of pharmacological chaperones. Orphanet J Rare Dis, 2015. 10: p. 55.

Poster n. 5

Comparative analysis of molecular motions in SIRTUIN2 proteins

S. Dotolo (1,2,3), A. Facchiano (1) and A. Pandini (3)

1) Institute of Food Science (ISA-CNR), Via Roma 64, Avellino (Italy); 2) Department of Biochemistry, Biophysics and General Pathology, University of Campania “L. Vanvitelli”, Via de Crecchio 7, 80138 Naples (Italy); 3) Department of Computer Science, Brunel University London, Uxbridge, Middlesex UB8 3PH (United Kingdom)

Presenting Author: Serena Dotolo. Email: serena.dotolo@isa.cnr.it

Keywords: Sirtuins, coevolution, molecular dynamics, CobB, comparative analysis, allosteric regulation

Sirt2 is an NAD+-dependent protein deacetylase evolutionarily conserved from bacteria to humans. CobB is a bacterial Sirt2 homologue characterized by a large Rossmann-fold domain and a smaller zinc-binding domain (a critical element for substrate recognition in bacteria). It was previously suggested that selective substrate binding in CobB is mediated by distal molecular interactions, whose mechanism is still unknown.

We here present a computational study of the molecular dynamics of CobB aimed at unveiling the SIRTUIN2 mechanism of substrate recognition. The approach is based on the combined application of residue coevolution and molecular simulation analysis, to understand whether mechanisms of allosteric regulation could explain the role of distal molecular interactions. A discussion of the evolutionary properties of interacting residues, functional structural transitions and domain organization is presented. Residue positions identified on CobB have been mapped onto human Sirt2 to evaluate whether the distal molecular interactions involved in recognizing a specific substrate have been conserved. Starting from the profile of the wild-type protein, we plan to investigate the dynamics of pathological mutants and the existence of compensatory mutations. This approach could lead to the identification of putative allosteric sites for novel drug design. Finally, it could be useful to apply this protocol to human Sirt2 as well, to evaluate whether the distal molecular interactions have been conserved.
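The abstract combines residue coevolution with simulations but does not specify the coevolution score. As an assumption, the sketch below uses plain mutual information between two alignment columns, one of the simplest coevolution measures. Practical pipelines usually add sequence weighting, pseudocounts and corrections such as APC, all of which are omitted here; the toy columns are invented.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two columns of a multiple sequence alignment.

    col_a, col_b: equal-length sequences of residues, one entry per aligned sequence.
    Raw frequencies only: no sequence weighting, pseudocounts or APC correction.
    """
    n = len(col_a)
    if n == 0 or len(col_b) != n:
        raise ValueError("columns must be non-empty and of equal length")
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Toy columns: residues in the first two co-vary perfectly; the last pair is independent.
print(round(mutual_information("AACC", "DDEE"), 3))  # 0.693 (= ln 2)
print(mutual_information("AACC", "DEDE"))            # 0.0
```

High-scoring column pairs that are far apart in the CobB structure would then be natural candidates for the distal interactions examined in the simulations.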

Poster n. 6

Hotspot residues involved in potency and selectivity of ligand binding in Nuclear Receptors

D’Ursi P (1), Uggeri M (1), Rovida E (2), Fossa P (3), Milanesi L (1), Orro A (1)

1) Istituto di Tecnologie Biomediche – CNR; 2) Istituto di Ricerca Genetica e Biomedica – CNR; 3) Dipartimento di Farmacia – Università di Genova

Presenting Author: Uggeri Matteo. Email: matteo.uggeri@itb.cnr.it

Keywords: Molecular dynamics, Binding features, Nuclear receptor, Drug design and toxicity prediction

Nuclear receptors (NRs) are members of a large family of ligand-dependent transcriptional regulators of several human biological processes such as development, reproduction and metabolism.

NRs have a ligand binding domain (LBD) involved in ligand binding and they are primary targets of numerous exogenous compounds (drugs and environmental contaminants) with established or potential toxic effects.

Structures of NRs available in the PDB provide information about the interaction of natural ligands with the NR LBD. To obtain a high-level description of these interactions, we developed a computational method based on molecular dynamics and molecular mechanics generalized Born surface area (MM-GBSA) calculations, which yields specific binding features. The analyses identify the hotspot residues and make it possible to determine which of them are involved in ligand potency and which in binding selectivity. X-ray structures do not provide this information, showing only which residues interact with the ligand.

The results reveal that, for 14 of the 22 NRs tested, the residues shown to interact with the ligand in the X-ray structure are not the same as those identified by the in silico analysis as selectivity or potency residues. Moreover, for 15 NRs analysed, we defined an average of 2 selectivity residues per NR, but often only one of them is a potency residue. This could be an important tool for drug design and for toxicity studies of exogenous compounds acting on NRs.
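MM-GBSA estimates binding free energies from simulation snapshots. The sketch below shows the single-trajectory average ΔG_bind = ⟨G_complex - G_receptor - G_ligand⟩ and a simple hotspot filter on per-residue decomposed contributions. The -1.0 kcal/mol threshold, the residue labels and all numbers are hypothetical, entropic terms are ignored, and the sketch does not reproduce the authors' criteria for distinguishing potency from selectivity residues.

```python
def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Single-trajectory MM-GBSA estimate of the binding free energy (kcal/mol).

    Each argument is a list of per-frame free energies for the same frames;
    the estimate is the mean of G_complex - G_receptor - G_ligand.
    """
    n = len(g_complex)
    if n == 0 or len(g_receptor) != n or len(g_ligand) != n:
        raise ValueError("need the same non-zero number of frames for all three species")
    return sum(c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)) / n


def hotspots(per_residue, threshold=-1.0):
    """Residues whose decomposed contribution is at or below threshold, most favourable first.

    per_residue: dict residue label -> contribution (kcal/mol); threshold is hypothetical.
    """
    selected = [res for res, dg in per_residue.items() if dg <= threshold]
    return sorted(selected, key=lambda res: per_residue[res])


# Illustrative numbers only (not results from the study).
print(binding_free_energy([-100.0, -102.0], [-60.0, -61.0], [-20.0, -21.0]))  # -20.0
print(hotspots({"RES_A": -3.2, "RES_B": -1.5, "RES_C": -0.4}))               # ['RES_A', 'RES_B']
```

Comparing such per-residue profiles across ligands of the same receptor is one way to separate residues that contribute strongly for every ligand from those that discriminate between ligands.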

Poster n. 7

The variability of backbone geometry as a new tool for protein structure validation

Luciana Esposito (1), Nicole Balasco (1), Amarinder Singh Thind (2), Mario R. Guarracino (2), Luigi Vitagliano (1)

1) Istituto di Biostrutture e Bioimmagini, C.N.R., Napoli, Italy; 2) Istituto di Calcolo e Reti ad Alte Prestazioni, C.N.R., Napoli, Italy

Presenting Author: Luciana Esposito. Email: luciana.esposito@cnr.it

Keywords: 3D structure, protein structure validation, statistical analyses, protein geometry

Proteins are primary actors in all biological processes, and their function depends on their three-dimensional structure. The complexity of these macromolecules makes the elucidation of their structure a non-trivial task. Given the enormous number of parameters to be determined, the available experimental data are generally not sufficient for effective structure refinement. Protein structure determination is therefore achieved by combining experimental data with a priori information derived from small molecules. In this scenario, assessing the validity/quality of 3D models is a crucial step. Over the years, several widely used validation approaches have been proposed. Although the application of these protocols has enormously increased the reliability and accuracy of protein structures, several needs in this field remain unmet. We and others have highlighted that protein structures are endowed with fine features at the backbone level (variability of bond lengths, bond angles and dihedral angles) that have an important impact on their overall structure as well as on their function. By analyzing the entire structural content of the Protein Data Bank, we here show that the detection of this fine structure is closely correlated with the overall accuracy of individual protein structures. These findings clearly demonstrate that the evaluation of these subtle parameters represents an innovative and valuable tool for protein structure validation.

Poster n. 8

Homology modelling based study of structural properties of Microbial Transglutaminases

Deborah Giordano (1, 2), Angelo Facchiano (1)

1) CNR-ISA Institute of Food Sciences, Avellino, Italy; 2) PhD Program “Innovazione e management di alimenti ad elevata valenza salutistica”, Università di Foggia, Italy

Presenting author: Deborah Giordano. Email: deborah.giordano89@libero.it

Keywords: mTGase, protein structure, large-scale protein modelling, 3D-structure comparison

Transglutaminases (TGases) are a family of proteins, present in all organisms, that catalyze different reactions, such as the transamidation and deamidation of protein glutamines. Biotechnological interest has focused on microbial TGases (mTGases), whose physiological role is still unknown, and since the 1990s Streptomyces mobaraensis TGase has become a tool for industrial applications. In recent years, other mTGases have been investigated to characterize their functional properties.

In the first part of our study, we analyzed thousands of mTGase sequences, defining groups of sequences that differ in amino acid conservation and potentially in 3D structure features. Here, we describe the first results of the homology modelling study performed to obtain reliable models of representative proteins from the different groups, and the differences observed. The most intriguing aspects concern the peculiarity of the catalytic site, which is commonly considered to involve three key amino acids. The triad is positioned differently within the amino acid sequences, and in some groups of sequences the three key amino acids are not conserved. Open points remain to be assessed, for example the correct annotation of these sequences as TGases, the role of the three key amino acids in catalyzing the typical TGase reaction, and the possibility that subtle differences exist in the catalytic mechanism of these bacterial proteins.
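Whether the three key catalytic amino acids are conserved in a group of sequences can be checked on the corresponding columns of a multiple sequence alignment; working on alignment columns also copes with the triad lying at different positions in the individual sequences. The sketch assumes a Cys/Asp/His triad, as commonly described for transglutaminases, and uses hypothetical column indices and a toy alignment.

```python
def triad_conservation(alignment, columns, expected=("C", "D", "H")):
    """Fraction of aligned sequences carrying the expected residue at each triad column.

    alignment: dict sequence name -> aligned sequence (equal lengths, '-' for gaps).
    columns: 0-based alignment columns of the putative catalytic residues, in the
    same order as `expected` (hypothetical positions).
    """
    if not alignment:
        raise ValueError("empty alignment")
    n = len(alignment)
    return [sum(seq[col] == aa for seq in alignment.values()) / n
            for col, aa in zip(columns, expected)]


def non_canonical(alignment, columns, expected=("C", "D", "H")):
    """Names of sequences missing at least one expected triad residue."""
    return sorted(name for name, seq in alignment.items()
                  if any(seq[col] != aa for col, aa in zip(columns, expected)))


# Toy alignment: seq3 carries Ser instead of Cys at the first triad column.
alignment = {"seq1": "ACGDH", "seq2": "ACGDH", "seq3": "AS-DH"}
triad_columns = (1, 3, 4)
print(triad_conservation(alignment, triad_columns))
print(non_canonical(alignment, triad_columns))  # ['seq3']
```

Sequences flagged by `non_canonical` are exactly the ones for which annotation as TGases, and the catalytic mechanism itself, would need closer inspection.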

Poster n. 9

Electrostatic fingerprints in avian influenza hemagglutinins

Irene Righetto (1), Alireza Heidari (2), Francesco Filippini (1)

1) Department of Biology, University of Padua, via U. Bassi 58/B, 35131 Padova, Italy; 2) Department of Comparative Biomedicine and Food Science, University of Padua, viale dell’Università 16, 35020, Legnaro (PD), Italy

Presenting Author: Irene Righetto. Email: irene.righetto@bio.unipd.it

Keywords: molecular evolution, protein surface fingerprints, protein electrostatics

Genome variation is very high in influenza A viruses. However, viral evolution and spread are strongly influenced by immunogenic features and by the capacity to bind host cells, which depend in turn on the two major surface glycoproteins. We performed a comparative structural analysis of hemagglutinin as a model to gain functional insights into surface regions possibly crucial to antigenicity and cell binding. We found that type-specific electrostatic and hydropathy fingerprints can be inferred when comparing the surface properties of hemagglutinin subregions, monomers and trimers. Intriguingly, electrostatic variation at the receptor binding domain surface relates to the branching and evolution of still-circulating H5N1 clades from no-longer-circulating ones. When H9 viruses from wild birds and poultry are compared, electrostatic fingerprints are again found to relate to viral evolution, and in particular to group-specific variation in well-known hemagglutinin sites involved in modulating immune escape and host specificity. More recently, studying further avian influenza viruses, we found that electrostatic signatures that evolved in parallel are shared among avian and mammalian (including human) hosts. This work suggests that integrating structural and sequence comparison may boost the investigation of trends and relevant mechanisms in viral evolution and may provide a tool for shedding light on antigenic drift and pandemic host shift.
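The abstract does not state how electrostatic fingerprints were compared. One common choice, used for example by the PIPSA approach, is the Hodgkin similarity index computed on potentials sampled at matching points around superimposed proteins. The sketch below assumes that choice and uses invented potential values.

```python
def hodgkin_index(phi_a, phi_b):
    """Hodgkin similarity index between two potentials sampled at the same points.

    SI = 2 * sum(phi_a * phi_b) / (sum(phi_a**2) + sum(phi_b**2)),
    which is 1 for identical and -1 for exactly opposite potentials.
    """
    if len(phi_a) != len(phi_b) or not phi_a:
        raise ValueError("potentials must be non-empty and sampled at the same points")
    numerator = 2 * sum(a * b for a, b in zip(phi_a, phi_b))
    denominator = sum(a * a for a in phi_a) + sum(b * b for b in phi_b)
    if denominator == 0:
        raise ValueError("both potentials are zero everywhere")
    return numerator / denominator


# Invented potentials (arbitrary units) at three shared sampling points.
print(hodgkin_index([1.0, -2.0, 0.5], [1.0, -2.0, 0.5]))     # 1.0
print(hodgkin_index([1.0, -2.0, 0.5], [-1.0, 2.0, -0.5]))    # -1.0
print(hodgkin_index([1, 2], [2, 4]))                         # 0.8
```

Unlike a correlation coefficient, the Hodgkin index is sensitive to magnitude: the last example has the same spatial pattern but twice the strength, and scores below 1.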

Poster n. 10

Prediction of the effects of amino acid mutations on protein stability: an analysis of available tools and their reliability

Bernardina Scafuri (1), Angelo Facchiano (1), Anna Marabotti (2)

1) CNR-ISA, National Research Council, Institute of Food Sciences, Avellino, Italy; 2) Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano (SA), Italy

Presenting author: Bernardina Scafuri. Email: bernardina.scafuri@isa.cnr.it

Keywords: database, mutations, protein stability, predictors

In our studies, we are interested in evaluating the effects of amino acid point mutations in proteins involved in human diseases, with the aim of investigating the molecular bases of human diseases, in particular rare diseases. Loss of protein function due to an amino acid variation may be caused either by a direct effect on the functional site or by destabilization of the protein structure; in the latter case, it may be useful to evaluate the effect of the variation on protein stability by means of predictors.

We have recently created a public database named Galactosemia Proteins Database (http://protein-variants.eu/galactosemia/), devoted to galactosemia, a rare disease due to variations affecting the genes coding for the three enzymes involved in galactose metabolism. Our database provides information about the structural features of the wild-type enzymes and of their known missense variations. Among the structural aspects investigated, protein stability is of strong interest for determining whether a variation can affect function. During the preparation of the Galactosemia Proteins Database, we used seven different predictors to infer the effect of each variation on protein stability. By comparing the results obtained for all the variations investigated, we evaluated the predictors' performance, reliability and suitability for the practical needs of our work, and defined a strategy that uses the most reliable ones in a consensus scheme.

Here we present the results of this comparative work and some considerations about the predictors used that could be of general interest for the application of these tools to a specific problem.
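The consensus scheme is not detailed in the abstract. A minimal majority-vote sketch is shown below, assuming all predictor outputs have first been harmonized so that negative ΔΔG means destabilizing (sign conventions differ between tools). Predictor names and values are hypothetical.

```python
def consensus_effect(ddg_by_predictor):
    """Majority vote on the stability effect of one variant.

    ddg_by_predictor: dict predictor name -> predicted ddG (kcal/mol), harmonized
    so that negative values mean destabilizing. A call needs a strict majority
    of all predictors; predictions of exactly zero count towards neither side.
    """
    values = list(ddg_by_predictor.values())
    if not values:
        raise ValueError("no predictions given")
    destabilizing = sum(1 for v in values if v < 0)
    stabilizing = sum(1 for v in values if v > 0)
    if destabilizing > len(values) / 2:
        return "destabilizing"
    if stabilizing > len(values) / 2:
        return "stabilizing"
    return "uncertain"


# Hypothetical predictions for one variant from three harmonized tools.
print(consensus_effect({"tool_1": -1.2, "tool_2": -0.8, "tool_3": 0.3}))  # destabilizing
```

Restricting the vote to the predictors judged most reliable, as the abstract describes, only changes the dictionary passed in; weighting each tool by its measured accuracy would be a natural extension.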

SESSIONS: APPLICATIONS IN GENOMICS AND APPLICATIONS IN MEDICINE

Poster n. 11

Improvement of the Ion Torrent PGM sequencing workflow of a gene panel for next generation sequencing of DNA samples from patients

Giancarlo Castellano (1), Eva Gonzalez (1), Alba Roset (1), Eva Fernandez (1), Ricard Isanta (1), Wladimiro Jimenez (2), Pedro Jares (1)

1) Core de Biologia Molecular, Hospital Clínic Universitari, Barcelona, Spain; 2) Biochemistry and Molecular Genetics Service, Hospital Clínic Universitari, Barcelona, Spain.

Presenting author: Giancarlo Castellano. Email: castellano@clinic.ub.es

Next-generation sequencing is currently used in clinical settings for translational biomarker profiling and in clinical research studies, including those on inherited diseases and cancer. For germline mutation identification, the Ion Torrent PGM NGS technology is an easy-to-use, fast and cheap approach. In our Service we routinely use the Oncomine Focus Assay, which interrogates 22 genes (92 amplicons) to detect genetic variants in tumor samples from patients. The PGM technology uses a DNA barcode system to provide flexibility and high-throughput capability in sequencing and to significantly increase scale while reducing costs, by allowing multiple library preparations to be pooled on a single sequencing chip. We developed an optimized pipeline for analyzing sequencing results that detects genomic variants by combining two different barcodes for the same sample, as in a duplicated assay. This new method increases the sensitivity and specificity of the sequencing test, with a consequent reduction of false positive and false negative variant calls. Moreover, it improves the detection of variants in regions with suboptimal read coverage. We therefore propose an adjusted workflow for clinical diagnostic applications that can help detect genomic variants at low cost and with improved performance with respect to the conventional procedure.
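How the two barcodes are combined is not specified above. Assuming each barcoded library yields its own variant calls and read counts, duplicated-assay logic could keep calls found in both libraries (limiting false positives) and pool the reads of both libraries where coverage is low. The sketch below illustrates both ideas; function names and coordinates are hypothetical, not the authors' pipeline.

```python
def split_calls(calls_a, calls_b):
    """Compare variant calls from two barcoded libraries of the same sample.

    calls_a, calls_b: sets of (chrom, pos, ref, alt) tuples.
    Returns (confirmed, discordant): variants called in both libraries, and
    variants called in only one of them (candidates for manual review).
    """
    return calls_a & calls_b, calls_a ^ calls_b


def pooled_vaf(alt_a, depth_a, alt_b, depth_b):
    """Variant allele fraction after pooling the reads of both libraries."""
    depth = depth_a + depth_b
    if depth == 0:
        raise ValueError("no reads cover this position")
    return (alt_a + alt_b) / depth


# Hypothetical calls from barcode A and barcode B of one sample.
barcode_a = {("chr1", 1000, "G", "T"), ("chr2", 2000, "C", "A")}
barcode_b = {("chr1", 1000, "G", "T")}
confirmed, discordant = split_calls(barcode_a, barcode_b)
print(sorted(confirmed), sorted(discordant))
print(pooled_vaf(5, 100, 15, 100))  # 0.1
```

Pooling doubles the effective depth at a position, which is what can rescue a low-frequency variant that neither library supports on its own.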

Poster n. 12

ClusterScan: a tool to discover and annotate genomic clusters

Massimiliano Volpe (1), Marco Miralto (2), Stefano Gustincich (3), Remo Sanges (1)

1) Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Napoli, Italy; 2) Department of Research Infrastructures for marine biological resources, Stazione Zoologica Anton Dohrn, Napoli, Italy; 3) Area of Neuroscience, SISSA, Trieste, Italy

Presenting Author: Massimiliano Volpe. Email: mas.volpe@gmail.com

Keywords: clusters, co-localization, function, pipeline

A genomic cluster is a broad term used to define a group of genes or other features on a genome that typically share a common characteristic, such as involvement in the same pathway, function, co-expression or co-localization. Annotation is the process of finding and locating individual genes or other features on the genome and attributing a function to them when possible. Knowing exactly where a feature is located on the genome makes it possible to recognize clusters based on feature co-localization. However, the large landscape of bioinformatics tools lacks a program to identify features that are spatially close and share a common function.

Here we present ClusterScan, a tool to search for clusters of features starting from annotations. It parses an annotation file and outputs a BED-formatted file storing cluster coordinates and composition. ClusterScan needs an additional two-column file storing feature IDs and the corresponding functional annotations to drive the cluster search, combining localization with functional information. These annotations can include GO, KEGG, Pfam or any kind of categorical information associable with given genomic intervals. Since the cluster concept is not well defined, neither the number of features in a cluster nor the distance between them is exactly established. For these reasons, ClusterScan is highly flexible and user-configurable, and it implements two different algorithms to call clusters.
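ClusterScan's two algorithms are not described in the abstract. The sketch below is a hypothetical, simplified distance-based approach in the same spirit, not ClusterScan's implementation: features sharing an annotation are chained along each chromosome while the gap to the current cluster stays within a user-set distance. The parameters `max_dist` and `min_size` stand in for the user-configurable settings mentioned above, and the toy features are invented.

```python
from collections import defaultdict


def find_clusters(features, functions, max_dist=10000, min_size=2):
    """Distance-based search for co-localized features sharing an annotation.

    features: list of (feature_id, chrom, start, end), 0-based half-open.
    functions: dict feature_id -> iterable of annotations (e.g. Pfam or GO ids).
    Same-annotation features on a chromosome are chained while the gap to the
    current cluster end is <= max_dist; chains with at least min_size members
    are returned as BED-like tuples (chrom, start, end, annotation, member_ids).
    """
    by_key = defaultdict(list)
    for fid, chrom, start, end in features:
        for ann in functions.get(fid, ()):
            by_key[(chrom, ann)].append((start, end, fid))

    clusters = []
    for (chrom, ann), items in sorted(by_key.items()):
        items.sort()
        group, group_end = [items[0]], items[0][1]
        for item in items[1:] + [None]:  # None flushes the last group
            if item is not None and item[0] - group_end <= max_dist:
                group.append(item)
                group_end = max(group_end, item[1])
                continue
            if len(group) >= min_size:
                clusters.append((chrom, group[0][0], group_end, ann, [g[2] for g in group]))
            if item is not None:
                group, group_end = [item], item[1]
    return clusters


features = [("g1", "chr1", 0, 100), ("g2", "chr1", 150, 300),
            ("g3", "chr1", 50000, 50100), ("g4", "chr1", 200, 400)]
functions = {"g1": {"PF1"}, "g2": {"PF1"}, "g3": {"PF1"}, "g4": {"PF2"}}
print(find_clusters(features, functions, max_dist=1000))
```

Here g1 and g2 form a PF1 cluster, while g3 is too far away and g4 has a different annotation, so neither reaches `min_size` on its own.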

Poster n. 13

Integration of transcriptomic data in genome scale metabolic networks: a computational approach to study complex diseases

Ilaria Granata, Enrico Troiano, Mara Sangiovanni, and Mario R. Guarracino

ICAR-CNR, Via Pietro Castellino 111, 80131, Napoli, Italy

Presenting Author: Ilaria Granata. Email: ilaria.granata@icar.cnr.it

Keywords: Genome scale metabolic models, Data integration, Complex diseases, Transcriptomic data

The vast majority of common diseases are caused by a combination of genetic, environmental, and lifestyle factors. A proper understanding of these complex diseases requires a holistic approach. A widely used practice involves the use of GEnome-scale metabolic Models (GEMs) as scaffolds for constraint-based mathematical methods aimed at predicting metabolic fluxes in organisms. GEMs represent the metabolic structure of a cell/organism in terms of chemical reactions and, given the annotation of gene-protein-reaction (GPR) relationships, they have become a suitable tool for the integration of omics data. Here we propose to use a recently published integrative method, based on the application of a purely data-driven objective, to gain new insights into the mechanisms underlying the relationship between obesity and postmenopausal breast cancer. To this end, we integrated expression data of samples grouped by weight and tumor subtype (GSE78958) into a GEM of the human adipocyte (iAdipocytes1809). From the intersection of the different predicted fluxes, we extracted the dysregulated reactions specific to each group. Most of the reactions with different flux rates were associated with cellular trafficking. It is well known that the intracellular accumulation of certain metabolites might have oncogenic effects. These preliminary findings highlight the importance of data integration to unravel the relationship between genotype and phenotype in complex and interrelated diseases.
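
GPR annotations are what let gene expression be mapped onto reactions. A widely used convention, shown here only as an illustration and not necessarily the rule applied by the integrative method above, takes the minimum over AND-ed subunits (enzyme complexes) and the maximum over OR-ed isozymes. The nested-tuple encoding of rules is a hypothetical representation chosen for brevity.

```python
def reaction_score(rule, expression):
    """Evaluate a GPR rule against gene expression values.

    rule: a gene id (str), or a tuple ("and" | "or", subrule, subrule, ...)
    expression: dict gene id -> expression value
    AND (complex) -> min of subunits; OR (isozymes) -> max.
    Genes absent from `expression` evaluate to None and are skipped.
    """
    if isinstance(rule, str):
        return expression.get(rule)
    op, *children = rule
    values = [v for v in (reaction_score(c, expression) for c in children)
              if v is not None]
    if not values:
        return None
    return min(values) if op == "and" else max(values)


expr = {"G1": 5.0, "G2": 2.0, "G3": 8.0}
# (G1 AND G2) OR G3 -> max(min(5, 2), 8)
print(reaction_score(("or", ("and", "G1", "G2"), "G3"), expr))  # 8.0
```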

Poster n. 14

Circulating microRNAs expression profiles are associated with childhood obesity: results of the I.Family Study

Giuseppe Iacomino (1), Paola Russo (1), Pasquale Marena (1), Fabio Lauria (1), Antonella Venezia (1), Pasquale De Luca (2), Wolfgang Ahrens (3), Ronja Foraita (3), Kathrin Günther (3), Stefaan De Henauw (4), Lauren Lissner (5), Dénes Molnár (6), Luis A Moreno (7), Michael Tornaritis (8), Toomas Veidebaum (9), and Alfonso Siani (1).

1) Institute of Food Sciences, National Research Council, ISA-CNR, Italy; 2) Stazione Zoologica Anton Dohrn, Italy; 3) Leibniz-Institute for Prevention Research and Epidemiology, BIPS, Germany; 4) University of Ghent, Belgium; 5) Sahlgrenska Academy at the University of Gothenburg, Sweden; 6) University of Pécs, Hungary; 7) University of Zaragoza, Spain; 8) Research and Education Institute of Child Health, Cyprus; 9) National Institute for Health Development, Estonia.

Presenting Author: Giuseppe Iacomino. Email: piacomino@isa.cnr.it

Keywords: Circulating miRNAs, obesity, biomarker

About ten years ago, the World Health Organization identified the increasing prevalence of overweight and obesity worldwide as a public health challenge, due to the adverse consequences associated with obesity. Epidemiological studies have established a firm association between elevated BMI and chronic conditions such as diabetes, dyslipidemia, hypertension, heart disease, and cancer. Omics studies have demonstrated that changes in the miRNA profiles of various tissues correlate with obesity. Recent studies have shown a remarkable stability of miRNAs in blood, making them suitable as biomarkers for a variety of diseases. The aim of this research was to characterize the profiles of circulating miRNAs in plasma samples of overweight/obese and normal-weight children from the European cohort of the I.Family project (www.ifamilystudy.eu). Results confirmed that a set of miRNAs was deregulated. Analyses are in progress to investigate differences in expression patterns and in nutritional, anthropometric, and biochemical variables across subgroups. Molecular interactions of obesity-associated miRNAs were predicted using miRPath, which provides advanced pipelines such as hierarchical clustering of miRNAs and pathways based on the levels of their interactions. MiRNA targets were either predicted by the DIANA-microT-CDS algorithm or derived from experimentally validated miRNA interactions in DIANA-TarBase. Predicted or validated interactions were subsequently combined by merging and by using meta-analysis algorithms.
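
The abstract does not say which meta-analysis algorithm combined the predicted and validated results. As one standard option, shown purely for illustration, Fisher's method combines k independent p-values through X = -2 Σ ln p_i, which under the null follows a chi-squared distribution with 2k degrees of freedom. For even degrees of freedom the tail probability has a closed form, so a stdlib-only sketch suffices:

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p) ~ chi-squared with 2k degrees of freedom.
    For even degrees of freedom the survival function is
        P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)^i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total


# e.g. one miRNA-pathway pair scored on predicted and on validated interactions
print(fisher_combine([0.01, 0.03]))
```

A single p-value is returned unchanged, which is a quick sanity check of the formula.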

Poster n. 15

A Genome-Wide Association Study identifies new loci associated with sweet and fat preferences: results from the I.Family study

Fabio Lauria (1), Marco Miele (1), Stefaan De Henauw (2), Carmen Dering (3), Antje Hebestreit (3), Monica Hunsberger (4), Hannah Jilani (3), Jaakko Kaprio (5), Dénes Molnár (6), Luis A. Moreno (7), Teemu Palviainen (5), Toomas Veidebaum (8), Alfonso Siani (1), Paola Russo (1).

1) Institute of Food Sciences, National Research Council, Avellino, Italy; 2) Department of Public Health, Ghent University, Ghent, Belgium; 3) Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany; 4) Department of Epidemiology and Public Health (EPSO), Institute of Medicine, The Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden; 5) Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland; 6) Department of Paediatrics, Medical Faculty, University of Pécs, Hungary; 7) Growth, Exercise, Nutrition, and Development (GENUD) research group, University of Zaragoza, Zaragoza, Spain; 8) Department of Chronic Diseases, National Institute for Health Development, Tallinn, Estonia.

Presenting Author: Fabio Lauria. Email: flauria@isa.cnr.it

Keywords: GWAS, sweet preference, fat preference.

Food preferences influence dietary choices and nutrition, and may play a role in the development of obesity and related disorders. The development of food preferences begins in the perinatal period and continues across the life course, involving a complex interplay of genetic and environmental factors. To identify genes associated with sweet and fat preferences, we conducted a genome-wide association study (GWAS) on the population of children and adolescents participating in the I.Family study. A total of 1,513 participants from six countries (Estonia, Germany, Hungary, Italy, Spain, and Sweden) were genotyped using the Affymetrix UK Biobank Axiom 96-Array. To assess food preferences, a “Food and Beverage Preference Questionnaire” was developed. Scores for sweet and fat preferences were calculated and tested for association with GWAS loci, controlling for age, sex and country of origin. We identified five novel loci associated with sweet preference and one locus associated with fat preference. The newly identified loci explain 9% of the phenotypic variance for sweet preference and 2% for fat preference. However, when SNPs that do not reach genome-wide significance are also included (p-value threshold 5×10^-5), the explained variance increases to 44% and 29%, respectively, indicating that a complex network of genes may be involved in food preferences. We showed for the first time that variants in PARD3B, MYOZ2, and RGS7 are associated with sweet preference and variants in SH3D19 with fat preference.
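
To make "variance explained" concrete, the sketch below computes the increase in R² when SNP dosages (0, 1 or 2 copies of the effect allele) are added to an ordinary least-squares model that already contains the covariates. It is a stdlib-only simplification for illustration, not the study's analysis pipeline: real GWAS software also handles population structure, relatedness and missing genotypes, and the toy data are invented.

```python
def ols_r2(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting (adequate for a handful of non-collinear predictors).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def variance_explained_by_snps(covariates, snps, y):
    """Incremental R^2 of the SNP dosage columns over a covariates-only model."""
    return ols_r2(covariates + snps, y) - ols_r2(covariates, y)


age = [10, 11, 12, 13, 14, 15]        # toy covariate
snp = [0, 1, 2, 0, 1, 2]              # toy dosages
score = [2.0 * s for s in snp]        # toy preference score driven by the SNP
print(variance_explained_by_snps([age], [snp], score))
```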

SESSION: METAGENOMICS

Poster n. 16

Going deeper in deep sequencing: looking for recombinant molecules

Chiara Colantuono (1), Stefano Mazzoleni (2), Maria Luisa Chiusano (1,2)

Presenting Author: Chiara Colantuono. Email: chiara.colantuono@gmail.com

Keywords: metagenomics, NGS, recombinants

Natural recombination and horizontal gene transfer, in eukaryotes and prokaryotes, are well-known mechanisms underlying the genetic variation that leads to species evolution. Accordingly, identifying recombinant molecules may be a key issue. Genomics, metagenomics and transcriptomics analyses offer useful datasets to investigate these phenomena, which may result from technical artifacts or natural events. We set up a bioinformatics pipeline to identify recombinant molecules from genomics, metagenomics or transcriptomics shotgun high-throughput sequencing experiments. The pipeline offers several user-defined options. Specifically, reads are aligned against a reference database, and reads with only partial coverage are selected. These reads, which are generally discarded in canonical metagenomics or transcriptomics analyses, are further analyzed to obtain information on possible components from different source species. We tested our pipeline on an Arabidopsis thaliana rhizosphere metagenomics dataset to investigate possible recombination with the plant genome. Our analysis showed that ~3% of the reads in the sample matched Arabidopsis and, of those, ~43% partially matched the plant genome (coverage >20% and <80%). Moreover, 6% of these reads also matched an alternative species, allowing us to estimate the abundance of the species sharing recombination with the plant DNA.
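
The read-selection logic can be sketched as follows. The 20%/80% coverage bounds follow the abstract; everything else is a hypothetical simplification: alignments are assumed to be already parsed into (read_id, reference, read_length, query_start, query_end) tuples, and a read is flagged when its partial host match is accompanied by a largely non-overlapping match to another reference.

```python
from collections import defaultdict


def candidate_recombinants(alignments, host, low=0.20, high=0.80, max_overlap=0.10):
    """Flag reads that partially match `host` and whose other part matches a different reference.

    alignments: iterable of (read_id, reference, read_length, qstart, qend)
                with 0-based, half-open query coordinates (hypothetical layout).
    Returns {read_id: set of non-host references} for candidate recombinants.
    """
    by_read = defaultdict(list)
    for read_id, ref, length, qstart, qend in alignments:
        by_read[read_id].append((ref, length, qstart, qend))

    hits = {}
    for read_id, alns in by_read.items():
        for ref, length, qstart, qend in alns:
            coverage = (qend - qstart) / length
            if ref != host or not low < coverage < high:
                continue  # keep only partial matches to the host genome
            partners = set()
            for other, _, ostart, oend in alns:
                if other == host:
                    continue
                # A partner segment should barely overlap the host segment on the read.
                overlap = max(0, min(qend, oend) - max(qstart, ostart))
                if overlap / length <= max_overlap:
                    partners.add(other)
            if partners:
                hits.setdefault(read_id, set()).update(partners)
    return hits


alns = [("r1", "Athaliana", 100, 0, 50), ("r1", "BugX", 100, 50, 100),
        ("r2", "Athaliana", 100, 0, 100),   # full match: not a candidate
        ("r3", "Athaliana", 100, 0, 40)]    # partial, but no partner species
print(candidate_recombinants(alns, host="Athaliana"))  # {'r1': {'BugX'}}
```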

Poster n. 17

Metagenomic and metatranscriptomic analysis of bacteria from acid mine ecosystem

Jakub Ridl (1), Lukas Falteisek (2), Jan Paces (1), Hynek Strnad (1), Ivan Cepicka (2), Cestmir Vlcek (1)

1) Department of Genomics and Bioinformatics, Institute of Molecular Genetics of the ASCR, Prague, Czech Republic; 2) Department of Zoology, Faculty of Science, Charles University in Prague, Czech Republic

Presenting Author: Jakub Ridl. Email: ridlj@img.cas.cz

Keywords: metagenomics, metatranscriptomics, acidophilic bacteria, mine ecosystem

Although bacterial cells are not visible to the naked eye, some bacteria can form macroscopic biofilm structures. Long water streamers, gelatinous stalactites or mineralized snottites found in acidic ecosystems around the world have attracted the attention of many researchers in the past decade. Here we present metagenomic and metatranscriptomic analyses of a stalactite biofilm recovered from an extremely acidic (pH _ 3) habitat in the abandoned ore mine of Zlate Hory (Czech Republic). We applied 16S rRNA amplicon sequencing, shotgun DNA sequencing and RNA-seq, followed by bioinformatic data analyses. This combined approach allowed us to reconstruct the taxonomic composition of the microbial community, to assemble nearly complete draft genome sequences of the two most abundant bacterial strains, and to analyze the genes expressed by the bacteria. Our data show that the biofilm was formed by a taxonomically simple community dominated by two bacteria of the genera Ferrovum and Acidithiobacillus. Both bacteria possess a unique combination of features and are fully adapted to the acid mine ecosystem. Both use carbon dioxide fixation as a carbon source and iron oxidation as a source of energy. In addition, the Acidithiobacillus strain can oxidize sulfur to obtain energy and uses urea as an alternative source of nitrogen, enabling partially separated niches. The most highly expressed genes of the bacterial community are involved in iron oxidation and in adaptation to low pH and oxidative stress.

Poster n. 18

Metagenomic approaches to unravel the microbiome of deep-sea loriciferans

Michael Tangherlini (1), Alfonso Esposito (2), Roberto Danovaro (1), Maria Luisa Chiusano (1,3).

1) Stazione Zoologica di Napoli “Anton Dohrn”; 2) CIBIO – Centre for Integrative Biology – Trento; 3) Dipartimento di Agraria, Universita’ degli Studi di Napoli Federico II

Presenting Author: Michael Tangherlini. Email: michael.tangherlini@szn.it

Keywords: metagenomics, bioinformatics, microbiome, metazoa, meiofauna

Metazoans belonging to the phylum Loricifera inhabit different marine benthic ecosystems, including deep-sea hypersaline anoxic basins (DHABs). Organisms in these environments are able to live in anoxic conditions, although the mechanisms underlying this adaptation remain unclear. Cell-like structures were observed during electron microscopy investigations of specimens collected from DHABs, suggesting host-microbiome associations involved in the adaptation. In the present work, five specimens belonging to two different genera (Pliciloricus and Spinoloricus) of Loricifera, from ecosystems characterized by contrasting oxygen concentrations, were investigated using suitable bioinformatic strategies to characterize the components and functions of loriciferan microbiomes. Our results highlighted that loriciferan microbiomes are characterized by oxygen-related and species-specific associations. In particular, members of the Gammaproteobacteria and Basidiomycota were associated with specimens from anoxic environments, whereas members of the class Alphaproteobacteria were associated with specimens belonging to the genus Pliciloricus. Moreover, prokaryotes inhabiting Loricifera from anoxic environments exhibited a higher abundance of proteins related to xenobiotic metabolism. In addition, the microbiomes associated with the specimens were significantly different from those inhabiting the surrounding sediments, indicating host specificity and possible co-evolution.
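
The abstract does not name the dissimilarity measure or test used to compare specimen and sediment microbiomes. As an illustration only, Bray-Curtis dissimilarity is a standard way to compare the taxonomic composition of two communities (significance is then commonly assessed with a permutation test such as PERMANOVA, not shown). The taxon counts below are invented placeholders.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    BC = 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b));
    0 means identical composition, 1 means no shared taxa.
    """
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1.0 - 2.0 * shared / total


specimen = {"Gammaproteobacteria": 60, "Alphaproteobacteria": 40}
sediment = {"Alphaproteobacteria": 40, "Deltaproteobacteria": 60}
print(bray_curtis(specimen, sediment))  # 0.6
```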

Poster n. 19

GLOSSARY: the GLobal Ocean 16S Subunit web Accessible Resource

Tangherlini M. (1), Miralto M. (1), Dell’Anno A. (2), Corinaldesi C. (3), Danovaro R. (1,2), Chiusano M.L. (1,4)

1) Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy; 2) Dipartimento di Scienze della Vita e dell’Ambiente, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy; 3) Dipartimento di Scienze e Ingegneria della Materia, dell’Ambiente ed Urbanistica, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy; 4) Dipartimento di Agraria, University Federico II of Naples, Via Università 100, 80055 Portici (NA), Italy

Metagenomic analyses carried out as part of the TARA Oceans expedition produced massive amounts of sequence data, which were made available as a reference resource for the whole scientific community.

We processed the 16S miTAGs produced by the TARA consortium to obtain value-added information embedded in the SILVA collection (release 123), and organized the data using a next-generation NoSQL “schemaless” database that supports native geo-spatial queries and allows flexible organization of large amounts of data.

The platform, designed to be extended with new collections, currently offers results on the taxonomic affiliations of sequences and on the geo-spatial distributions of clusters of related sequences. These are also accessible through an interactive, user-friendly web application implemented by the BIOINforMA (BIOINformatics for MArine biology) service at the Stazione Zoologica Anton Dohrn.
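
GLOSSARY relies on the native geo-spatial queries of its NoSQL store, whose query syntax is not described here. Purely to illustrate the kind of question such a platform answers ("which samples lie within R km of this point?"), the stdlib sketch below filters hypothetical sample records by great-circle (haversine) distance; the record fields `id`, `lat` and `lon` are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def samples_near(records, lat, lon, radius_km):
    """Records (dicts with hypothetical 'lat'/'lon' keys) within radius_km, nearest first."""
    hits = [(haversine_km(lat, lon, r["lat"], r["lon"]), r) for r in records]
    return [r for d, r in sorted(hits, key=lambda t: t[0]) if d <= radius_km]


stations = [{"id": "A", "lat": 0.0, "lon": 0.0}, {"id": "B", "lat": 10.0, "lon": 0.0}]
print([r["id"] for r in samples_near(stations, 0.0, 0.5, 100)])  # ['A']
```

A production database would answer the same query with a spatial index rather than a linear scan; the scan is used here only to keep the sketch self-contained.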

SESSION: OMICS AND MULTI-OMICS

Poster n. 20

Using deep learning for supervised feature selection in ovarian cancer detection

A. d’Acierno, F. Nazzaro

ISA – CNR, via Roma 64, 83100 Avellino, Italy

Presenting Author: A. d’Acierno. Email: dacierno.a@isa.cnr.it

Keywords: Deep neural networks, ovarian cancer, feature extraction, classification

In mass spectrometry, a large number of peaks is analyzed to build efficient predictive and screening protocols. Mass spectra are high-dimensional data and the risk of overfitting is pervasive, so well-suited feature extraction methods are needed. It is worth noting that if the feature selection step is external to the cross-validation (i.e., the selection procedure uses all the data and cross-validation is used to evaluate only the classification phase), the results obtained may be severely biased by the so-called selection bias effect, and perfect classification could be obtained even on completely fake datasets.
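
To make the selection-bias point concrete: feature selection must be re-run inside every training fold and must never see the held-out fold. The stdlib sketch below assumes samples are plain lists of peak intensities with 0/1 labels and that each training fold contains both classes; the t-like ranking score and the nearest-centroid classifier are hypothetical stand-ins for whatever selector and classifier are actually used.

```python
import random
import statistics


def kfold_indices(n, k, seed=0):
    """Split range(n) into k disjoint, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_top_features(X, y, rows, m):
    """Rank features by |mean difference| / overall sd, computed on `rows` only."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 1]
        b = [X[i][j] for i in rows if y[i] == 0]
        sd = statistics.pstdev(a + b) or 1e-12  # avoid division by zero
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / sd)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:m]


def nearest_centroid_predict(X, y, train, feats, x):
    """Predict the label whose training centroid (on `feats`) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [X[i] for i in train if y[i] == label]
        centroid = [statistics.mean(row[j] for row in members) for j in feats]
        dist = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, k=10, m=5, seed=0):
    """k-fold CV accuracy with feature selection redone inside each training fold."""
    folds = kfold_indices(len(y), k, seed)
    correct = 0
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        feats = select_top_features(X, y, train, m)  # training rows only
        correct += sum(nearest_centroid_predict(X, y, train, feats, X[i]) == y[i]
                       for i in test)
    return correct / len(y)
```

Calling `select_top_features(X, y, range(len(y)), m)` once, before the loop, would reproduce exactly the biased protocol the abstract warns against.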

DANNs (Deep Artificial Neural Networks) are composed of multiple processing layers that learn, typically using supervised learning algorithms, representations of data with multiple levels of abstraction. For these models, which have improved the state of the art in image and signal classification, the key concept is that the features used in the classification layers are automatically extracted by convolutional and subsampling layers.
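
The feature-extraction idea can be illustrated with the two basic operations applied to a 1D spectrum: a convolution (here a "valid" cross-correlation, which is what deep-learning libraries compute under that name) followed by a ReLU and max-pooling subsampling. This stdlib sketch uses a fixed, hand-picked kernel and a toy spectrum; in a trained network the kernel weights are learned by back-propagation, and the actual architecture of the network described here is not reproduced.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """'Valid' 1D cross-correlation: out[i] = sum_k signal[i+k] * kernel[k] + bias."""
    w = len(kernel)
    return [sum(signal[i + k] * kernel[k] for k in range(w)) + bias
            for i in range(len(signal) - w + 1)]


def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling (subsampling); a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


spectrum = [0.1, 0.2, 0.1, 3.0, 0.2, 0.1, 0.3, 2.5, 0.2, 0.1]  # toy intensities
peak_detector = [-1.0, 2.0, -1.0]  # hand-picked kernel that responds to local peaks
features = max_pool(relu(conv1d_valid(spectrum, peak_detector)), 2)
print(features)  # the two sharp peaks give the largest pooled responses
```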

In this work we use a DANN to classify spectra obtained from the analysis of serum. We use a database containing 200 low-resolution spectra divided into 100 cancer and 100 control samples. The network consists of four layers and is trained using a back-propagation learning algorithm. The current overall accuracy obtained with 10-fold cross-validation is 95%.

Poster n. 21

Multiomic statistical approach to celiac disease

Eugenio Del Prete (1,2,3), Angelo Facchiano (2), Pietro Liò (3)

1) Department of Sciences, University of Basilicata, Viale dell’Ateneo Lucano 10, 85100, Potenza (Italy); 2) Institute of Food Sciences, National Research Council, Via Roma 64, 83100, Avellino (Italy); 3) Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD (UK)

Presenting author: Eugenio Del Prete. Email: eugenio.delprete@isa.cnr.it

Keywords: gluten-sensitive enteropathy; gene differential expression; gene ontology; gene set enrichment analysis

Celiac disease is a complex, multifactorial pathology involving nutrition, immunological response and genetic factors. Nowadays, the number of people with celiac disease is markedly increasing, and a gluten-free diet is the only proven treatment, although it is not always successful.

Much evidence points to comorbidities such as type 1 diabetes, irritable bowel syndrome, cardiomyopathy, inflammatory myopathies, and other pathologies directly or indirectly correlated with celiac disease. Furthermore, recent studies have addressed the identification of novel gluten-related antibodies and their relationship with the gut microbiota. A detailed understanding of the disease pathway and the search for biomarkers are important, and bioinformatics tools can help with both tasks.

The aim of this preliminary work is to use Gene Set Enrichment Analysis to find evidence by integrating gene expression data with Gene Ontology terms. A survey of celiac disease data available online is performed, and the comparison of the results points towards the cytokine signaling pathway and the autoimmune response. The entire workflow is implemented in the R environment.
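
The core statistic of Gene Set Enrichment Analysis is a running-sum enrichment score computed while walking down a ranked gene list. The sketch below follows the standard weighted formulation with weight exponent p = 1; permutation-based significance and the normalized score (NES) are omitted, the gene names are placeholders, and the authors' own R workflow is not reproduced.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA running-sum enrichment score.

    ranked_genes: genes sorted by decreasing correlation with the phenotype
    scores: the corresponding ranking-metric values
    Hits step up by |score|^p / (sum of |score|^p over all hits);
    misses step down by 1 / (number of misses). ES is the maximum deviation from zero.
    """
    hit_weights = [abs(s) ** p for g, s in zip(ranked_genes, scores) if g in gene_set]
    hit_total = sum(hit_weights)
    n_miss = len(ranked_genes) - len(hit_weights)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list but not cover all of it")
    running, es = 0.0, 0.0
    for g, s in zip(ranked_genes, scores):
        if g in gene_set:
            running += abs(s) ** p / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


ranked = ["IL2", "IFNG", "TNF", "ACTB"]  # placeholder ranking
metric = [4.0, 3.0, 2.0, 1.0]
print(enrichment_score(ranked, metric, {"IL2"}))   # set at the top: positive ES
print(enrichment_score(ranked, metric, {"ACTB"}))  # set at the bottom: negative ES
```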

A future direction of this work is the comparison of genes and ontology terms across different pathologies, searching for common features that can strengthen the knowledge of comorbidities and help in the screening, diagnosis and treatment of celiac disease.

Poster n. 22

HiCeekR: a novel Shiny app for the analysis of Hi-C data

Lucio Di Filippo (1, 2), Miriam Gagliardi (2), Maria Rosaria Matarazzo (2), Claudia Angelini (3)

(1) Dipartimento di Biologia, Università degli Studi di Napoli Federico II; (2) Institute of Genetics and Biophysics “A. Buzzati Traverso”, Consiglio Nazionale delle Ricerche; (3) Istituto per le Applicazioni del Calcolo “Mauro Picone”, Consiglio Nazionale delle Ricerche

Presenting author: Lucio Di Filippo. E-mail: difilippolucio@gmail.com

The Hi-C technique combines next-generation sequencing technologies with chromosome conformation capture approaches to study 3D chromatin conformation on a genome-wide scale. Although the technique is still quite recent, many tools are already available for pre-processing and analyzing Hi-C data and for identifying chromatin loops, topologically associating domains and A/B compartments. However, only a few of them provide a complete Hi-C data analysis pipeline or allow integration with other omics data types. Moreover, most of the available tools are command-line programs and are therefore not designed for user-friendly data exploration.
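
Compartment calling typically starts by correcting a contact matrix for the strong decay of contact frequency with genomic distance: each entry is divided by the mean of its diagonal (observed/expected). A/B compartments are then commonly read from the sign of the first principal component of the correlation matrix of this observed/expected map. The stdlib sketch below covers only the observed/expected step and assumes a dense, symmetric, already balanced matrix; it does not show how HiCeekR or the packages it integrates implement this.

```python
def observed_over_expected(matrix):
    """Divide each contact count by the mean count at the same genomic distance.

    matrix: dense symmetric list of lists (bins x bins) of normalized contacts.
    The 'expected' value for distance d is the mean of the d-th diagonal.
    """
    n = len(matrix)
    expected = []
    for d in range(n):
        diag = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diag) / len(diag))
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
             for j in range(n)] for i in range(n)]


# A matrix whose contacts depend only on distance carries no compartment signal:
toy = [[10, 4, 1],
       [4, 10, 4],
       [1, 4, 10]]
print(observed_over_expected(toy))  # every entry equals 1.0
```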

HiCeekR is a novel R package that makes it easy to perform a complete Hi-C data analysis through a graphical user interface. It has been implemented using the Shiny package and integrates several other R packages for visualization and Hi-C data analysis. With HiCeekR, the user can apply different analytic strategies and visualize the results through dynamic drag-and-drop plot exploration. For a better understanding of chromatin structure, HiCeekR can integrate Hi-C with data from other omics types, such as transcriptomics (e.g., RNA-Seq) and epigenomics (e.g., ChIP-Seq). Moreover, HiCeekR has a modular structure that is easily expandable and allows the integration of third-party functions. We will illustrate the capabilities of HiCeekR using a case study on a lymphoblastoid cell line for which several Hi-C datasets are available.

Poster n. 23

Characterization of epithelial-mesenchymal transition intermediate/hybrid phenotypes associated to resistance to EGFR inhibitors in non-small cell lung cancer cell lines

Valentina Fustaino (1,2)*, Dario Presutti (1)*, Teresa Colombo (2), Beatrice Cardinali (1), Giuliana Papoff (1), Rossella Brandi (3), Paola Bertolazzi (2,4), Giovanni Felici (2)**, Giovina Ruberti (1)**

1) Institute of Cell Biology and Neurobiology, National Research Council (IBCN-CNR), Monterotondo, Rome, Italy; 2) Institute for Systems Analysis and Computer Science “Antonio Ruberti” National Research Council, (IASI-CNR), Rome, Italy; 3) Genomics facility of the European Brain Research Institute, “Rita Levi-Montalcini” (EBRI), Rome, Italy; 4) SYSBIO Center for Systems Biology, Milan, Italy.

* These authors contributed equally to this work; ** Senior authors contributed equally to this work

Presenting Author: Valentina Fustaino. Email: valentina.fustaino@ibcn.cnr.it

Keywords: drug-resistance, EMT, hybrid phenotypes, gene expression profiles, clustering, Support Vector Machines

Increasing evidence points to a key role played by epithelial-mesenchymal transition (EMT) in cancer progression and drug resistance. In this study, we used wet-lab and in silico approaches to investigate whether EMT phenotypes are associated with resistance to targeted therapy in a non-small cell lung cancer model system harboring activating mutations of the epidermal growth factor receptor. The combination of different analysis techniques allowed us to describe intermediate/hybrid and complete EMT phenotypes in HCC827- and HCC4006-derived drug-resistant human cancer cell lines, respectively. Interestingly, intermediate/hybrid EMT phenotypes, collective cell migration and increased stem-like ability are associated with resistance to the epidermal growth factor receptor inhibitor erlotinib in HCC827-derived cell lines. Moreover, the use of three complementary approaches for gene expression analysis supported the identification of a small EMT-related gene list, which might otherwise have been overlooked by standard stand-alone methods for gene expression analysis.

Poster n. 24

A bioinformatics framework to identify cell subpopulations from bulk gene expression data of cancer samples

Andrea Grilli (1, 3), Sara Castellano (1), Cristina Battaglia (2), Silvio Bicciato (1)

1) Department of Life Sciences, Center for Genome Research, University of Modena and Reggio Emilia, Modena, Italy; 2) Department of Medical Biotechnology and Translational Medicine (BIOMETRA), University of Milan, Via F.lli Cervi 93, 20090, Segrate, Italy; 3) PhD Program of Molecular and Translational Medicine, Department of Medical Biotechnology and Translational Medicine, University of Milan, 20090 Segrate, Italy.

Presenting Author: Andrea Grilli. Email: andrea.grilli2@unimore.it

Keywords: Deconvolution, breast cancer, samples heterogeneity, cellular subtypes, bulk.

Gene expression levels measured in biological samples are affected by their intrinsically heterogeneous cell and tissue composition. Nevertheless, when analyzing transcriptional profiles, each sample is generally evaluated in bulk, without considering the presence of multiple subpopulations. This limitation can be particularly critical when analyzing gene expression profiles from cancer samples, where dissecting the mixed cell population could shed light on intratumoral heterogeneity and on the molecular mechanisms shaping different cancer behaviors.

We built on Cibersort [https://cibersort.stanford.edu/], a recently published deconvolution algorithm (PMID:25822800), to design a framework for the identification of cellular subpopulations from bulk gene expression of cancer samples. After testing the efficacy of the approach on mouse gene expression data, we applied the framework to a set of clinically defined Triple Negative Breast Cancer (TNBC) samples. First, we defined a novel gene signature starting from 55 samples characterized as Luminal A, Luminal B, Her2, and TNBC subtypes by immunohistochemistry. Then, we applied the gene signature to quantify the fractions of each subtype in a set of 357 TNBC samples.
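
Cibersort estimates subpopulation fractions with ν-support vector regression against a signature matrix. As a simpler, swapped-in illustration of the same linear mixing model (bulk profile ≈ signature × fractions), the sketch below solves a non-negative least-squares problem by projected gradient descent and rescales the coefficients to sum to one. This is not Cibersort's algorithm, and the toy signature matrix is hypothetical.

```python
def deconvolve(signature, mixture, iters=2000):
    """Estimate subpopulation fractions f >= 0 such that signature @ f ~= mixture.

    signature: genes x subtypes (list of rows); mixture: bulk expression per gene.
    Projected gradient descent on 0.5 * ||S f - m||^2, then normalized to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / trace(S^T S) never exceeds 1 / (largest eigenvalue of S^T S): a safe step.
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(iters):
        residual = [sum(signature[g][t] * f[t] for t in range(n_types)) - mixture[g]
                    for g in range(n_genes)]
        grad = [sum(signature[g][t] * residual[g] for g in range(n_genes))
                for t in range(n_types)]
        f = [max(0.0, f[t] - step * grad[t]) for t in range(n_types)]  # project onto f >= 0
    total = sum(f)
    return [v / total for v in f] if total > 0 else f


# Hypothetical 3-gene signature for two subtypes; bulk = 30% subtype 1 + 70% subtype 2.
S = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
bulk = [0.3, 0.7, 1.0]
print(deconvolve(S, bulk))  # close to [0.3, 0.7]
```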

Results confirmed that 71.4% of samples were indeed enriched in the TNBC-like cell subpopulation, but also showed that the remaining 28.6% of samples were not TNBC-like. We are currently evaluating the correlation between the detected subpopulation heterogeneity and clinical outcome.

Poster n. 25

Single-sample transcriptional classification of colorectal cancer

Claudio Isella, Jessica Giordano, Enzo Medico

Università degli studi di Torino

Presenting Author: Claudio Isella. Email: claudio.isella@ircc.it

Keywords: genomic classifier, single sample, elastic net

Recently, unsupervised analysis of transcriptomics data has provided new insights into cancer biology, revealing high intra- and inter-tumoral heterogeneity. In the field of colorectal cancer (CRC), we recently identified five CRC Intrinsic Subtypes (CRIS) that can be integrated with clinical parameters to predict disease outcome. However, like current methods, the CRIS classifier assigns each sample to a single subtype and is sensitive to data normalization and cohort composition. These limitations de facto hamper the clinical application of transcriptional classifiers.

To circumvent these constraints, we conceived a pipeline to develop a “single-sample” classifier that evaluates each specimen individually and assigns it a membership score for each CRIS subtype via a logistic regression model on pairwise gene expression profiles.

The pipeline consists of: 1) selecting the genes whose expression is most variable among subtypes; 2) computing pairwise gene expression ratios; 3) modeling the pairwise scores with elastic net to prioritize and weight the most informative gene pairs for predicting CRIS subtypes, yielding a classification algorithm that assigns each sample a prediction score for each subtype; 4) evaluating the reproducibility of the new algorithm against the original CRIS and estimating the clinical information it conveys.
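
The key property behind step 2 is that a within-sample ratio of two genes does not change when the whole sample is rescaled, which is what frees a single-sample classifier from cohort-level normalization. A minimal sketch, assuming positive expression values and using log-ratios; the gene pairs, weights and intercept are hypothetical placeholders for the coefficients that elastic-net fitting would produce.

```python
import math


def pair_features(expression, pairs):
    """Log-ratios log(expr[a] / expr[b]) for each gene pair (a, b) within one sample."""
    return [math.log(expression[a] / expression[b]) for a, b in pairs]


def subtype_score(expression, pairs, weights, intercept):
    """Logistic membership score in (0, 1) for one subtype from pairwise features."""
    z = intercept + sum(w * x for w, x in zip(weights, pair_features(expression, pairs)))
    return 1.0 / (1.0 + math.exp(-z))


pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]   # hypothetical gene pairs
weights, intercept = [1.2, -0.8], 0.1                  # hypothetical coefficients
sample = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 4.0}
scaled = {g: 10 * v for g, v in sample.items()}        # e.g. a different library size
print(subtype_score(sample, pairs, weights, intercept))
print(subtype_score(scaled, pairs, weights, intercept))  # identical to the line above
```

In a full classifier one such score would be computed per CRIS subtype, each with its own fitted pairs and weights.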

This approach provides a tool for bringing novel, biologically and clinically relevant molecular taxonomies towards clinical application.

Poster n. 26

miRTissue: a web service for characterizing the type of miRNA-target interaction in specific tissues

Antonino Fiannaca, Massimo La Rosa, Laura La Paglia, Alfonso Urso

CNR-ICAR, National Research Council of Italy, Palermo, Italy

Presenting Author: Massimo La Rosa. Email: massimo.larosa@icar.cnr.it

Keywords: miRNA, miRNA-target interaction, tumor tissue, web service

The regulatory role of microRNAs (miRNAs) is evident in both physiological and pathological processes, such as tumors. Bioinformatics approaches, such as miRNA-target interaction prediction tools, greatly contribute to understanding the molecular mechanisms involved in these pathologies. However, there is a lack of approaches defining these molecular interactions in a tissue-specific context. We present a web service, called miRTissue, based on validated miRNA-target interactions, designed to: characterize the most significant interactions related to a specific tissue; and understand the mechanism of action of each interaction, i.e., messenger degradation or translation inhibition. Expression values of both miRNAs and mRNAs, belonging to 15 types of tumor tissue, were downloaded from the TCGA database. miRNA-target interactions are characterized with respect to a particular tissue by computing, with the global test, the correlation between the expression of each miRNA and that of its target mRNA. The sign of the correlation can indicate whether an interaction is present in a specific tissue: anti-correlation suggests the likely existence of an interaction (degradation). In case of positive correlation, we considered the protein expression value (if available) of the target mRNA, because an anti-correlation between gene and protein expression is likely related to translation inhibition of the target. The web service is available at http://tblab.pa.icar.cnr.it/mirtissue.htm
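
The decision logic described above can be sketched as follows, with two simplifications: a plain Pearson correlation stands in for the global test used by miRTissue, and the correlation `cutoff` is a hypothetical parameter rather than a threshold taken from the service.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def interaction_type(mirna, mrna, protein=None, cutoff=0.3):
    """Classify a validated miRNA-target pair in one tissue (simplified).

    miRNA vs mRNA anti-correlated -> likely mRNA degradation.
    Positively correlated, and mRNA vs protein anti-correlated -> likely translation inhibition.
    Anything else -> undetermined.
    """
    r = pearson(mirna, mrna)
    if r <= -cutoff:
        return "degradation"
    if r >= cutoff and protein is not None and pearson(mrna, protein) <= -cutoff:
        return "translation inhibition"
    return "undetermined"


print(interaction_type([1, 2, 3, 4], [4, 3, 2, 1]))                # degradation
print(interaction_type([1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]))  # translation inhibition
```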

Poster n. 27

The ciliate Tetrahymena thermophila as a case study for the construction of a meta-model multi compartment resource and the analysis of free-living ciliate metabolism.

Alessio Mancini (1), Filmon Eyassu (2), Maxwell Conway (3), Annalisa Occhipinti (3), Sandra Pucciarelli (1), Claudio Angione (2), Pietro Liò (3)

1) School of Biosciences and Veterinary Medicine, University of Camerino, Italy; 2) School of Computing, Teesside University, UK; 3) Computer Laboratory, University of Cambridge, UK

Presenting Author: Alessio Mancini. Email: bio.mancini@gmail.com

Keywords: systems biology, eukaryotic unicellular organism, metabolic network

Ciliates are complex unicellular eukaryotes of presumably monophyletic origin, with a phylogenetic position equidistant from plants and animals. Ciliates possess a cell compartmentalisation that provides an optimal environment for each specific reaction and allows them to perform, within a single cell, all the functions of a multicellular organism, such as locomotion, feeding, digestion, and sexual processes. The cellular complexity of ciliates and the large set of reactions they perform call for a new, meaningful approach to describe their multi-omic metabolism. Here we describe the state of the art, challenges and potential of ciliate metabolic modelling, and we present a first draft of an open ciliate metabolism project, named MetCiliates. In particular, we report the metabolic network reconstruction of the freshwater ciliate Tetrahymena thermophila and provide the methodological basis for extending the T. thermophila reaction network to other ciliates. The metamodels that can be obtained from this resource may represent a new, meaningful approach to describe the multi-omic metabolism of all ciliates. An improved understanding of ciliate metabolism could be of great importance for the study of genotype-phenotype relationships, bioremediation, environmental markers and population genetics, and for the study of cilia- and BBSome (Bardet–Biedl syndrome)-related diseases.
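
In constraint-based reconstructions, compartmentalisation is usually encoded by tagging each metabolite with its compartment, so that transport between compartments becomes an explicit reaction in the stoichiometric matrix S, and steady state requires S·v = 0. A minimal sketch with a hypothetical four-reaction toy network (not taken from the T. thermophila reconstruction):

```python
def stoichiometric_matrix(reactions):
    """Build S (metabolites x reactions) from {reaction_id: {metabolite: coefficient}}.

    Metabolite ids carry a compartment suffix, e.g. 'glc[e]' vs 'glc[c]'.
    Negative coefficients are substrates, positive ones products.
    """
    rxn_ids = list(reactions)
    mets = sorted({m for stoich in reactions.values() for m in stoich})
    S = [[reactions[r].get(m, 0) for r in rxn_ids] for m in mets]
    return S, mets, rxn_ids


def is_steady_state(S, fluxes, tol=1e-9):
    """True if S . v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(abs(sum(c * v for c, v in zip(row, fluxes))) <= tol for row in S)


toy = {
    "EX_glc":   {"glc[e]": 1},                 # exchange: supplies extracellular glucose
    "T_glc":    {"glc[e]": -1, "glc[c]": 1},   # transport: extracellular -> cytosol
    "GLC_to_X": {"glc[c]": -1, "x[c]": 1},     # cytosolic conversion
    "SINK_x":   {"x[c]": -1},                  # drain for the product
}
S, mets, rxns = stoichiometric_matrix(toy)
print(is_steady_state(S, [2.0, 2.0, 2.0, 2.0]))  # True
print(is_steady_state(S, [2.0, 1.0, 1.0, 1.0]))  # False: glc[e] accumulates
```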

Poster n. 28

A Sparse Learning-Based Approach for class-specific feature selection

Ornella Affinito (1,4), Angelo Ciaramella (2), Sergio Cocozza (1), Gennaro Miele (3), Antonella Monticelli (4), Davide Nardone (2), Domenico Palumbo (1), Antonino Staiano (2)

1) Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Università degli Studi di Napoli Federico II; 2) Dipartimento di Scienze e Tecnologie, Università degli Studi di Napoli Parthenope; 3) Dipartimento di Fisica, Università degli Studi di Napoli Federico II; 4) Istituto di Endocrinologia ed Oncologia Sperimentale (IEOS) “Gaetano Salvatore”, CNR, Napoli

Presenting Author: Davide Nardone. Email: antonino.staiano@uniparthenope.it

Keywords: class-specific feature selection, sparse dictionary learning, microarray data

Feature selection (FS) plays a key role in computational biology, making it possible to work with models that have fewer variables, which are easier to explain and might speed up experimental validation by providing valuable insight into the importance and role of the variables.

Here, we propose a novel two-step FS procedure. First, a sparse-coding-based learning technique is used to find the best subset of features for each class of the training data. In doing so, it is assumed that a class is represented by a subset of features, called representatives, such that each sample in a specific class can be described as a linear combination of them. Second, the discovered feature subsets are fed to a class-specific feature selection scheme to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built by training one classifier per class on that class's own feature subset (i.e., the one discovered in the previous step), and a suitable decision rule is adopted to compute the ensemble response.
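
The second step (one classifier per class, each trained on that class's own feature subset and combined by a decision rule) can be sketched as below. The sparse-coding step that discovers the subsets is not reproduced: the subsets are given by hand, and the centroid-distance score and arg-max decision rule are hypothetical stand-ins for the classifiers and rule used in the work.

```python
import math


def _centroid(rows, feats):
    """Mean of the given rows, restricted to the feature indices in `feats`."""
    return [sum(r[j] for r in rows) / len(rows) for j in feats]


def _dist(x, feats, c):
    """Euclidean distance between x (on `feats`) and centroid c."""
    return math.sqrt(sum((x[j] - cj) ** 2 for j, cj in zip(feats, c)))


def train_class_specific(X, y, subsets):
    """For each class c, store centroids of class c and of the other classes on subsets[c]."""
    model = {}
    for c, feats in subsets.items():
        inside = [x for x, lab in zip(X, y) if lab == c]
        outside = [x for x, lab in zip(X, y) if lab != c]
        model[c] = (feats, _centroid(inside, feats), _centroid(outside, feats))
    return model


def predict(model, x):
    """Decision rule: pick the class whose one-vs-rest score is highest.

    Score = distance to the 'rest' centroid minus distance to the class centroid,
    both measured on that class's own feature subset.
    """
    def score(c):
        feats, c_in, c_out = model[c]
        return _dist(x, feats, c_out) - _dist(x, feats, c_in)
    return max(model, key=score)


X = [[5, 0], [6, 1], [0, 5], [1, 6]]
y = ["A", "A", "B", "B"]
model = train_class_specific(X, y, {"A": [0], "B": [1]})  # hand-picked subsets
print(predict(model, [5.5, 0.5]))  # A
```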

To assess the effectiveness of the proposed FS approach, a number of experiments were performed on benchmark microarray data sets, comparing its performance with several FS techniques from the literature. In all cases, the proposed FS methodology shows convincing results, often outperforming its competitors.

Poster n. 29

A Bioinformatic Framework to Assess the Transcriptomic Response of Species to Environmental Toxins

Siavash Nazari (1), Mehrdad Hajibabaei (2)

1) Biodiversity Institute of Ontario, University of Guelph, Canada; 2) Department of Integrative Biology & Biodiversity Institute of Ontario, University of Guelph, Canada

Presenting Author: Siavash Nazari. Email: snazari@uoguelph.ca

Keywords: Bioinformatics, Transcriptomics, Computational Biology, Metagenomics, Comparative Genomics

Current ecotoxicological approaches to identifying sites of biological concern rely on measures such as median lethal dose and lethal concentration analysis, which do not link the effects of toxins to any other level of biological organization. These methods cannot reveal the presence of hazardous chemicals until significant environmental damage has already been done. Motivated by the idea that essential pathways and genes are conserved across taxa, we present a framework that uses transcriptomic data from resident species to monitor the environment. The framework mines chemical-gene interaction data from publicly available databases such as the Comparative Toxicogenomics Database, retrieving an initial set of affected genes in the input organisms. Next, it retrieves the translated protein sequences of these genes, performs homology-based searches in the genomes of the target organisms, and clusters the results into homolog groups. Finally, it retrieves supplementary data, such as pathways and Gene Ontology terms, and learns a Bayesian network that reveals meaningful correlations between shared pathways and input chemicals. We ran the pipeline on a group of phylogenetically distant taxa, namely Mus musculus, Danio rerio, Drosophila melanogaster and Caenorhabditis elegans, with heavy metals and dioxins as input chemical groups. The most affected pathways identified by our pipeline agree with previous studies on the toxic effects of these chemicals.
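The abstract does not name the homology search tool or the clustering algorithm used to build homolog groups. Assuming the search output has already been parsed into (query, subject, identity) tuples, a simple stand-in is single-linkage clustering: link two sequences whenever a hit passes a threshold and report the connected components, computed here with union-find. The sequence identifiers (species prefix plus gene) and the `min_identity` cutoff are hypothetical.

```python
def homolog_groups(hits, min_identity=0.5):
    """Cluster sequences into homolog groups as connected components.

    hits: iterable of (query_id, subject_id, identity) tuples, identity in [0, 1],
          assumed to be parsed from a homology search.
    min_identity: hypothetical cutoff; weaker hits do not link sequences.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, identity in hits:
        find(query)    # register both sequences even if the hit is filtered out
        find(subject)
        if identity >= min_identity:
            union(query, subject)

    groups = {}
    for seq in parent:
        groups.setdefault(find(seq), set()).add(seq)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


hits = [
    ("Mm:g1", "Dr:g1", 0.62),
    ("Dr:g1", "Dm:g7", 0.55),
    ("Mm:g2", "Ce:g3", 0.41),  # below the cutoff: Ce:g3 stays on its own
    ("Mm:g2", "Dr:g4", 0.70),
]
print(homolog_groups(hits))
# → [['Ce:g3'], ['Dm:g7', 'Dr:g1', 'Mm:g1'], ['Dr:g4', 'Mm:g2']]
```

Single linkage is transitive, so Mm:g1 and Dm:g7 end up together through Dr:g1 even without a direct hit; stricter schemes (for example reciprocal best hits or graph-based clustering) would change the groups.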

Poster n. 30

A strategy combining single-variant and gene-based approaches in multiplex families to detect rare variants associated with Rheumatoid Arthritis

Maëva Veyssiere (1), Javier Perea (1), Laetitia Michou (2), Anne Boland-Auge (3), Vincent Meyer (3), Jean-François Deleuze (3), François Cornelis (4), Elisabeth Petit-Teixeira (1), Valérie Chaudru (1)

1) GenHotel, Univ-Evry, Université Paris-Saclay, 91025, Evry, France; 2) Division of Rheumatology, CHU de Québec, Department of Medicine, Québec, QC, Canada; 3) Centre National de Génotypage, Genomic Institute, CEA, Evry, France; 4) GenHotel-Auvergne, EA4679, Auvergne University, Genetic Department, CHU Clermont-Ferrand, Clermont-Ferrand, France.

Presenting author: Maëva Veyssiere. Email: maeva.veyssiere@univ-evry.fr

Keywords: Rheumatoid arthritis, Whole Exome Sequencing, Rare variants, Burden test, Family-based association study

The genetic component of Rheumatoid Arthritis (RA) is not fully defined. Identifying rare variants through NGS analysis of multiplex pedigrees could help explain part of the missing heritability.

To identify rare RA-associated variants (defined as allele frequency ≤1% or absent from databases), we analyzed exome sequences from 22 affected and 8 unaffected subjects belonging to families with at least 4 RA cases and/or other autoimmune diseases, together with 45 CEU subjects from the 1000 Genomes Project. After alignment, variant calling and quality filtering, we combined 2 approaches to detect RA-associated SNVs: a variant-centered method identifying variants carried by all and only the cases of the same family, and a gene-based method using the pVAAST software, which computes a score based on linkage and association tests. Variants showing evidence of association with RA by at least one of these methods were ranked by predicted effect, using SnpEff and the CADD C-score, to select the most deleterious ones. These variants were then validated by genotyping in extended pedigrees (65 cases and 45 unaffected subjects).

We focused our analysis on 154,592 high-quality variants and selected 112 SNVs showing evidence of RA association and a potentially severe effect for the validation stage: 51 family-specific and 61 shared by several cases in one or more pedigrees. Among the genes affected by these variants, some are involved in T-cell differentiation and migration.

Poster n. 31

A Comparison Study between Filter Feature Selection Algorithms for Protein Sequence Classification

Naoual Guannoni (1), Faouzi Mhamdi (2), Mourad Elloumi (3)

1) Faculty of Sciences of Tunis (FST), Tunis El-Manar University, Tunisia; 2) Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE) ENSIT; 3) Laboratory of Technologies of Information and Communication and Electrical Engineering ENSIT, University of Tunis, Tunisia

Presenting author: Faouzi Mhamdi. Email: faouzi.mhamdi@ensi.rnu.tn

Keywords: Filter method, feature selection, classification, Protein sequence

Biological data are growing exponentially in both volume and complexity. Feature selection is therefore an important step that mitigates the curse of dimensionality and improves prediction performance in classification systems. In this paper we focus on protein sequence classification. We present a comparative study of filter feature selection algorithms aimed at identifying relevant, non-redundant features that improve predictive capacity. We use four classifiers to measure classification precision. The results obtained with the different feature selection methods are compared and discussed, and they show the effectiveness of this approach. The final aim of this study is to identify the combination of filter method and classifier that best improves the accuracy of protein classification.
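The abstract does not list the filter algorithms that were compared. To illustrate what a filter method does, the sketch below scores each feature independently of any classifier using information gain, one widely used filter criterion, and keeps the top k. The binary features (for example, presence or absence of a hypothetical motif in each sequence) and the class labels are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(X, y, k):
    """Filter selection: score each column on its own and keep the k best."""
    scores = [(information_gain([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    scores.sort(key=lambda t: (-t[0], t[1]))  # high gain first, ties by index
    return [j for _, j in scores[:k]]


# Rows are sequences, columns are hypothetical binary motif features.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = ["kinase", "kinase", "other", "other"]
print(rank_features(X, y, 2))  # → [0, 1]
```

Information gain measures relevance only and does not penalise redundant features; criteria such as mRMR (minimum redundancy, maximum relevance) add that step.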