Choosing the Best Method for Gene Expression log-log Linear Models Using Multiple CART Trees

Authors

  • Martín Castillo* University of Chile
  • Rodrigo Assar University of Chile

DOI:

https://doi.org/10.11145/443

Abstract

Microarray and RNA-Seq techniques are used to identify genes that are differentially expressed under treatment conditions, by analyzing log-log linear models that relate expression under treatment to expression under the control condition. Because of cost and technical limitations, experiments usually have small samples and high contamination; choosing the estimation method for the coefficients of such models is therefore a challenge [1]. Herein, we simulate microarray and RNA-Seq experiments and analyze a log-log linear model with contamination in both conditions, varying key features: the sample size n, the contamination type (light-tailed or heavy-tailed), the contamination proportion p, and the error variance. For each configuration of these features we computed the accuracy of each method among least absolute deviations, ordinary least squares, and Huber M-estimators. Using this information, we built a machine-learning procedure that, based on CART classification trees [2], automatically decides the best method from the answers to simple questions. ...
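
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions made only for illustration: contamination is applied to the treatment response only, light-tailed contamination is a normal with inflated variance, heavy-tailed contamination is Cauchy, accuracy is measured as the mean absolute error of the estimated slope, and least absolute deviations is approximated by iteratively reweighted least squares. All function names are hypothetical.

```python
import numpy as np

def simulate(n, p, heavy_tailed, sigma, rng, beta=(0.0, 1.0)):
    """Simulate log control expression x and log treatment expression y.

    Assumption: a fraction p of the errors is contaminated, either with
    Cauchy noise (heavy-tailed) or inflated-variance normal noise (light-tailed).
    """
    x = rng.normal(8.0, 2.0, size=n)          # assumed scale of log expression
    eps = rng.normal(0.0, sigma, size=n)
    bad = rng.random(n) < p
    k = int(bad.sum())
    if heavy_tailed:
        eps[bad] = sigma * rng.standard_cauchy(k)
    else:
        eps[bad] = rng.normal(0.0, 5.0 * sigma, size=k)
    return x, beta[0] + beta[1] * x + eps

def fit_ols(x, y):
    """Ordinary least squares fit; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_irls(x, y, weight_fn, iters=30):
    """Iteratively reweighted least squares, started from the OLS fit."""
    X = np.column_stack([np.ones_like(x), x])
    b = fit_ols(x, y)
    for _ in range(iters):
        sw = np.sqrt(weight_fn(y - X @ b))
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return b

def lad_weights(r):
    """Weights 1/|r| make weighted least squares approximate the L1 loss."""
    return 1.0 / np.maximum(np.abs(r), 1e-8)

def huber_weights(r, c=1.345):
    """Huber weights, with the residual scale estimated by the MAD."""
    s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
    return np.minimum(1.0, c * s / np.maximum(np.abs(r), 1e-12))

METHODS = {
    "OLS": fit_ols,
    "LAD": lambda x, y: fit_irls(x, y, lad_weights),
    "Huber": lambda x, y: fit_irls(x, y, huber_weights),
}

def mean_abs_slope_error(n, p, heavy_tailed, sigma, reps, rng, true_slope=1.0):
    """Average |estimated slope - true slope| per method over reps simulations."""
    errs = {name: 0.0 for name in METHODS}
    for _ in range(reps):
        x, y = simulate(n, p, heavy_tailed, sigma, rng, beta=(0.0, true_slope))
        for name, fit in METHODS.items():
            errs[name] += abs(fit(x, y)[1] - true_slope) / reps
    return errs

def best_method(errs):
    """Label a configuration with the method that has the smallest error."""
    return min(errs, key=errs.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errs = mean_abs_slope_error(n=20, p=0.2, heavy_tailed=True,
                                sigma=0.5, reps=200, rng=rng)
    print(errs, best_method(errs))
```

Repeating this over a grid of n, p, contamination type, and error variance would give one best-method label per configuration. Those labelled configurations are the kind of data a CART classifier (for example scikit-learn's DecisionTreeClassifier) could be trained on.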

Author Biography

Rodrigo Assar, University of Chile

Professor Rodrigo Assar. Assistant Professor at the Human Genetics Program (2014), U. Chile. Mathematical Engineer (MSc U. of Chile, 2005), PhD in Computer Science (INRIA Bordeaux Sud-Ouest, U. Bordeaux 1, 2011).

Published

2015-04-15

Section

Conference Contributions