Nationwide Children's
The Ohio State University

Road to Collaboration:
Human Genetics and Genomics Community

Abstracts from Symposium I: 05/20/14

Presentation Abstracts

May 20, 2014
Relationships of bitter taste phenotype, genotype, and oral nicotine replacement use
Karen Ahijevych, PhD, Beverly Tepper, PhD, Margaret Graham, PhD, Christopher Holloman, PhD, William Matcham, MS

Introduction: The recommended dosage of oral nicotine replacement therapy (NRT) products is often not achieved in smoking cessation attempts. Bitter taste phenotype (BTP) may be a risk factor for non-adherence to oral NRT products. There is limited literature on BTP in the context of smoking and none in relation to oral NRT pharmacotherapy. The aims were to: examine the effect of BTP, as determined by PROP taste methodology, on use of NRT products (nicotine inhaler and lozenge for 1 week each) in cigarette smokers during smoking abstinence; characterize the effect of BTP on sensory experiences of these oral NRT products; and investigate, by bitter taster status, differences in use of the two products, comparing continuous (lozenge) versus intermittent (inhaler) exposure. A secondary aim was to determine each individual's taste receptor gene polymorphisms and correlate them with bitter taste phenotype. Methods: The effects of bitter taste status (taster vs. non-taster) on NRT usage and on sensory response to the products were examined. In a cross-over experimental design, 120 participants received a one-week supply of nicotine inhalers and a one-week supply of nicotine lozenges, with order randomly assigned. Mixed effects linear model analyses were conducted. Results: After adjusting for other factors, BTP was not found to affect usage of or sensory response to NRT. Among non-tasters, the average daily number of lozenges was higher than the average daily number of nicotine cartridges. Regardless of product type, participants who smoked menthol cigarettes at baseline reported an average strength-of-sensation score 2.47 points higher on a 7-point scale than smokers of nonmenthol cigarettes (p = .013). Half of baseline non-tasters shifted to the taster phenotype two weeks after smoking cessation or reduction. Taste receptor genotype was related to bitter taste phenotype (Kendall tau = .591, p = .0001). 
Conclusions: Although BTP was not significantly related to NRT usage, this null result may reflect NRT under-dosing and limited variance in the outcome variable. Further research on this and other factors affecting NRT usage is warranted to effectively inform smoking cessation pharmacotherapy.
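The genotype-phenotype association above is reported as a Kendall tau. As a hedged illustration only, the sketch below computes Kendall's tau-b, the tie-corrected variant suited to ordinal codes with many ties. The abstract does not say which variant was used or how genotype and phenotype were coded, so the 0/1/2 codings and the data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences of ordinal codes, corrected for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # pair tied on both variables: excluded from both margins
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # Product of (pairs untied on x) and (pairs untied on y).
    denom = sqrt((concordant + discordant + tied_y_only) * (concordant + discordant + tied_x_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return (concordant - discordant) / denom


# Hypothetical codes: genotype = copies of a "taster" haplotype (0/1/2);
# phenotype = 0 non-taster, 1 medium taster, 2 super-taster.
genotype = [0, 0, 1, 1, 1, 2, 2, 2]
phenotype = [0, 0, 0, 1, 1, 1, 2, 2]
print(round(kendall_tau_b(genotype, phenotype), 3))
```

Tau-b is used here rather than the plain tau-a because genotype and phenotype categories produce many tied pairs; a p-value would additionally require the variance of the statistic under the null, which this sketch omits.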

DNA Methylation as a Biomarker for Preeclampsia
Cindy M. Anderson, Jody L. Ralph, Michelle L. Wright, Bryan Linggi, Joyce E. Ohm

Background. Preeclampsia contributes significantly to pregnancy-associated morbidity and mortality, as well as to future risk of cardiovascular disease in mother and offspring and of preeclampsia in offspring. The lack of reliable methods for early detection limits opportunities for prevention, diagnosis, and timely treatment. Purpose. The purpose of this study was to explore distinct DNA methylation patterns associated with preeclampsia in both maternal cells and fetal-derived tissue that represent potential biomarkers to predict future preeclampsia and its inheritance in children. Method. A convenience sample of nulliparous women (N = 55) in the first trimester of pregnancy was recruited for this prospective study. Genome-wide DNA methylation was quantified in first-trimester maternal peripheral white blood cells and placental chorionic tissue from normotensive women and women who developed preeclampsia (n = 6/group). Results. Late-onset preeclampsia developed in 12.7% of women. Significant differences in DNA methylation were identified at 207 individual CpG sites in maternal white blood cells collected in the first trimester (132 sites with gain and 75 sites with loss of methylation); these were common to approximately 75% of the differentially methylated CpG sites identified in chorionic tissue of fetal origin. Conclusions. This study is the first to identify maternal epigenetic targets, and common targets in fetal-derived tissue, that represent putative biomarkers for early detection and heritable risk of preeclampsia. The findings may pave the way for diagnosing preeclampsia before its clinical presentation and acute damaging effects, and for potentially preventing its detrimental long-term sequelae.

Graphical representation of microbial community subpopulations using penalized Kendall's distance
Deepak Nag Ayyala, Shili Lin

When metagenomic count data are recorded by classifying gene sequences, several methods can be used to visualize the data and uncover hidden community substructures. One approach is to construct a multidimensional scaling model from a measure of dissimilarity between observations; UniFrac is a distance very commonly used for this purpose with metagenomic data. In this work, we construct a multidimensional scaling model that represents the data in a 3-dimensional coordinate system, based on a novel penalized Kendall's τ distance that characterizes dissimilarity between observations. We applied the proposed procedure to a human microbial community dataset of over 800 observations from multiple habitats and body sites. The resulting scaling model exhibits several features of the data set that are not seen in a UniFrac-based model, and clustering based on the model reveals several physiological similarities among the observations within each cluster.
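A minimal sketch of the general idea, assuming an ordinary (unpenalized) Kendall τ distance between count profiles and Torgerson's classical multidimensional scaling. The penalty that defines the authors' measure is not described in this abstract and is not reproduced here, their scaling procedure may differ, and the count table is hypothetical.

```python
from itertools import combinations

import numpy as np


def kendall_distance(u, v):
    """Fraction of taxon pairs ranked in opposite order by two abundance profiles (no penalty)."""
    pairs = list(combinations(range(len(u)), 2))
    discordant = sum(1 for i, j in pairs if (u[i] - u[j]) * (v[i] - v[j]) < 0)
    return discordant / len(pairs)


def classical_mds(dist, k=3):
    """Torgerson's classical MDS: embed a symmetric distance matrix in k dimensions."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))   # negative eigenvalues set to 0
    return eigvecs[:, top] * scale


# Hypothetical count table: 4 samples (rows) x 5 taxa (columns).
counts = np.array([
    [50, 20, 5, 0, 1],
    [45, 25, 4, 1, 0],
    [0, 3, 40, 30, 10],
    [2, 1, 35, 40, 12],
])
dist = np.array([[kendall_distance(a, b) for b in counts] for a in counts])
coords = classical_mds(dist, k=3)  # one 3-D point per sample
```

Pairs tied in either profile, such as two taxa that are both absent from a sample, contribute nothing to this unpenalized distance.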

Preliminary Data: Natural History Studies of Mucopolysaccharidosis III
Kristen Bain, M.D., Kevin Flanigan, M.D., Kim McBride, M.D.

Mucopolysaccharidosis (MPS) type III (Sanfilippo syndrome) is a group of four genetic diseases involving the accumulation of glycosaminoglycans (GAGs) in various tissues of the body. MPS III affects young children who previously appeared healthy. The most devastating effects of the condition result from abnormal accumulation of GAGs in the central nervous system. The disease is known to result in developmental and cognitive plateau and decline, loss of motor function, musculoskeletal problems, and early death. At present there is no effective therapy. This study seeks to better define the natural history of MPS IIIA and MPS IIIB in order to more accurately gauge the effectiveness of therapy in future clinical trials. In our preliminary data on the first 10 patients (eight with MPS IIIA and two with MPS IIIB), we are beginning to note some age-related trends in physical, laboratory, imaging, functional, and neurocognitive measures.

Modeling Digenic Dilated Cardiomyopathy in Zebrafish
Kelly Banks, Ryan Huttinger, Brittany Cox, Duan Li, Ana Morales, Dale Hedges, Debra Wheeler and Ray Hershberger

Familial dilated cardiomyopathy (DCM) is a complex genetic disease with pronounced heterogeneity, variable penetrance, and variable age of onset. Although DCM is typically modeled with a Mendelian paradigm, polygenic inheritance seems likely to be more descriptive in some cases, and may occur more often than previously appreciated. Whole exome sequencing in one family identified two segregating, high-priority variants: one in TNPO3 and one in 'Gene 2', a previously unreported gene. Functional studies of these variants in zebrafish larvae using morpholino knockdown and mutant cRNA overexpression supported their role in DCM. Compared with wild-type TNPO3 cRNA, overexpression of mutant TNPO3 cRNA caused a significant decrease in fractional area change. Knockdown of both orthologs of Gene 2 caused a large reduction in cardiac contractility, whereas knockdown of either ortholog alone had little effect. Finally, although knocking down a single ortholog of Gene 2 had no effect, knockdown of TNPO3 together with Gene 2 caused a larger decrease in fractional area change, and affected a larger proportion of larvae, than TNPO3 knockdown alone. We conclude that both variants are likely causative in this family, supporting the possibility of digenic inheritance in DCM.
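Fractional area change (FAC), the contractility measure used above, is conventionally computed from ventricular areas traced at end-diastole (EDA) and end-systole (ESA) as FAC = 100 x (EDA - ESA) / EDA. The sketch below applies that formula to hypothetical area measurements; it is not the authors' imaging or analysis workflow.

```python
from statistics import mean


def fractional_area_change(end_diastolic_area, end_systolic_area):
    """Ventricular fractional area change (%) from areas traced at end-diastole and end-systole."""
    if end_diastolic_area <= 0:
        raise ValueError("end-diastolic area must be positive")
    if not 0 <= end_systolic_area <= end_diastolic_area:
        raise ValueError("end-systolic area must lie between 0 and the end-diastolic area")
    return 100.0 * (end_diastolic_area - end_systolic_area) / end_diastolic_area


# Hypothetical ventricular areas (EDA, ESA) in arbitrary pixel units for two groups of larvae.
control = [(1000, 600), (950, 560), (1020, 640)]
double_knockdown = [(1100, 900), (1050, 880), (1120, 950)]
control_fac = mean(fractional_area_change(d, s) for d, s in control)
knockdown_fac = mean(fractional_area_change(d, s) for d, s in double_knockdown)
```

A lower group mean FAC corresponds to the reduced contractility described for the knockdown conditions; comparing groups formally would need a statistical test, which is omitted here.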

Regulatory polymorphisms in DBH affect peripheral gene expression and associate with MI
Elizabeth S. Barrie, David Weinshenker, Sarah Pendergrass, Leslie A. Lange, Marilyn Ritchie, James G. Wilson, Joseph F. Cubells, and Wolfgang Sadee

Dopamine beta-hydroxylase (DBH) synthesizes norepinephrine, a critical neurotransmitter both centrally and peripherally. Previous studies have associated DBH variants with a spectrum of clinical phenotypes and have shown substantially lower serum DBH levels in carriers of the promoter variant rs1611115, by uncertain mechanisms. In the present study, analysis of mRNA in human tissues confirmed high levels in the brain (locus coeruleus, LC) and adrenals, but also, unexpectedly, in sympathetically innervated organs (liver > lung > heart), suggesting that DBH mRNA may be transported to local nerve terminals. Allele-specific mRNA expression revealed small (<2-fold) effects in LC and adrenals but pronounced allelic differences in the liver (2- to 11-fold), indicating the presence of multiple regulatory variants. Scanning of the DBH locus identified two variants, promoter rs1611115 (MAF 21%) and splice site rs1108580 (MAF 46%, in partial LD with rs1611115), associated with significantly reduced DBH mRNA expression in liver and lung, but not in brain or adrenals. In The Jackson Laboratory's database of inbred mouse strains, DBH mRNA levels in the liver correlated positively with cardiovascular risk phenotypes, as expected from increased DBH, and hence norepinephrine, activity. Because the molecular genetic effects were observed in unexpected tissue types, a PheWAS analysis was performed to identify other phenotypes affected by these SNPs. Testing the combined effect of the minor alleles of rs1611115 and rs1108580 suggested protection against angina pectoris and myocardial infarction (MI). We replicated these findings in the Marshfield EMR dataset: using the interaction model, we again observed protective effects of these SNPs against MI and angina pectoris. Finally, in an African-American cohort, the Jackson Heart Study, there was a significant association between MI and the two SNPs considered together. 
Combining molecular genetics in human tissue, mouse databases, and clinical association studies, we find evidence that frequent DBH variants modulate risk for cardiovascular disease and affect sympathetic activity. Supported by NIGMS U01092655.

High-throughput Sequencing and Bioinformatic Analysis in Familial Congenital Heart Disease
Donald Corsmeier, Sara Fitzgerald-Butt, Gloria Zender, Vidu Garg, Kim McBride, Peter White

Congenital heart disease (CHD) is the most common type of birth defect, occurring in eight of every thousand live births. CHDs are a leading cause of death in infants and an increasing cause of adult morbidity and mortality. There is strong evidence for a large genetic component in the etiology of non-syndromic CHDs, so whole exome and whole genome sequencing will be invaluable in elucidating the underlying causal variants. In collaboration with investigators from the Center for Cardiovascular and Pulmonary Research, we have sequenced 95 samples from 32 families to date on the Illumina HiSeq platform. Bioinformatic analysis was performed using Churchill, the pipeline developed in the White lab for the discovery of human genetic variation, which uses the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK) for sequence alignment and variant calling. For downstream filtering and prioritization of variants, we used traditional heuristic methods based on existing variant databases (dbSNP, 1000 Genomes, NHLBI Exome Sequencing Project, HGMD). We also explored other widely used methods, including the Variant Annotation, Analysis & Search Tool (VAAST), a probabilistic search tool for identifying damaged genes and disease-causing variants. These different methods yield highly consistent result sets, from which we have identified several putative causal variants in these families and marked them for further validation and functional studies. These findings will provide a better understanding of the pathogenesis of CHD and can lead to improved diagnosis and novel preventive and therapeutic approaches.
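The heuristic, database-driven filtering described above can be sketched as follows. Everything here is an assumption for illustration: the 1% allele-frequency cutoff, the consequence categories, the requirement that all affected relatives carry the variant, and the field, family-member, and gene names are hypothetical rather than the lab's actual criteria, and VAAST's probabilistic scoring is not reproduced.

```python
from dataclasses import dataclass, field

# Hypothetical set of consequence labels treated as protein-altering.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass
class Variant:
    variant_id: str
    gene: str
    consequence: str
    # Allele frequency per reference database; a missing entry means "not observed".
    population_af: dict = field(default_factory=dict)
    carriers: set = field(default_factory=set)  # family members carrying the variant


def is_candidate(variant, affected, max_af=0.01):
    """Rare in every database, protein-altering, and carried by every affected relative."""
    rare = all(af < max_af for af in variant.population_af.values())
    return (variant.consequence in PROTEIN_ALTERING and rare
            and affected <= variant.carriers)


# Hypothetical family and variants.
affected = {"II-1", "III-2"}
variants = [
    Variant("v1", "GENE_A", "missense", {"1000G": 0.0004}, {"II-1", "III-2"}),
    Variant("v2", "GENE_B", "missense", {"1000G": 0.12, "ESP": 0.10}, {"II-1", "III-2"}),
    Variant("v3", "GENE_C", "synonymous", {}, {"II-1", "III-2"}),
    Variant("v4", "GENE_D", "frameshift", {}, {"II-1"}),
]
candidates = [v.variant_id for v in variants if is_candidate(v, affected)]
```

Here v2 fails the frequency filter, v3 is not protein-altering, and v4 does not segregate with both affected relatives, leaving v1.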

James Fitch, Benjamin Kelly, Peter White

Churchill is a software pipeline designed to analyze the extremely large data sets produced by current next-generation sequencing technologies. Using novel parallelization techniques, Churchill can effectively utilize distributed clusters of computers, allowing the processing of data sets that would otherwise be too large to process at a reasonable cost and in a practical timeframe. To demonstrate Churchill's processing speed and ability to run in "the cloud", we analyzed the whole genome sequences of 1,088 samples available from the 1000 Genomes Project on a cluster of Amazon Elastic Compute Cloud (EC2) instances. This produced a set of variant locations in the human genome where each sample differed from the human reference genome, which enabled generation of a Churchill-specific variant frequency dataset that can aid in the identification of rare disease-causing variants in future exome and whole genome sequencing studies. To perform this processing, an Amazon Machine Image (AMI) was created that included both the Churchill software and other necessary supporting software components. In addition, a set of utility scripts was developed to automate the processing steps: virtual machine creation and initialization, copying the raw sequencing data from Amazon's Simple Storage Service (S3) to EC2, running Churchill, and copying the results to persistent storage. Despite the large number of samples, a final list of variants was produced in less than 7 days using 400 EC2 instances, at approximately $11 per genome. This final variant list was then compared with a similar list produced by the 1000 Genomes Project Consortium, and the two variant sets were found to have a high degree of correlation. The ability to produce valid results quickly and cost-effectively, even when processing a large set of samples, demonstrates the superiority of the Churchill pipeline's computational approach.
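For scale, the reported figures imply the following back-of-envelope totals. The per-genome cost is approximate and the 7-day wall-clock time is an upper bound, so these are rough bounds derived here, not reported results.

```latex
\begin{aligned}
\text{total cost} &\approx 1{,}088 \times \$11 = \$11{,}968 \\
\text{instance-hours} &< 400 \times 7 \times 24 = 67{,}200 \\
\text{instance-hours per genome} &< 67{,}200 / 1{,}088 \approx 61.8
\end{aligned}
```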

Offspring Risk Perceptions: Adolescents and Young Adults with Congenital Heart Disease Agree with their Parents (and both are wrong!)
Sara Fitzgerald-Butt, Kevin Fry, Ali Zaidi, Vidu Garg, Cynthia Gerhardt, Kim McBride

As individuals with congenital heart disease (CHD) now typically survive into adulthood, the risk of CHD in their offspring is a pertinent concern. Parents are the typical source of medical information through childhood, but adolescents and young adults (AYA) must have sufficient knowledge to assume responsibility for their own medical care. We therefore examined offspring risk perceptions using both categorical (below average, average, above average) and continuous (0-100%) measures, as well as their associations with demographic characteristics, knowledge of the heart defect name, CHD diagnosis complexity, perceived CHD severity, and general genetic knowledge in AYA and their parents. Participants included 196 AYA, 15-25 years old with structural CHD (mean age = 19.0 years, 54% male, 85% Caucasian), and 179 parents (mean age = 47 years, 34% male, 94% Caucasian), recruited from an outpatient cardiology clinic (85% AYA consent rate). All participants were asked the name of their (or their child's) CHD and completed measures of demographics, genetic knowledge, perceived CHD severity, and perceived risk for an offspring with CHD. CHD complexity was rated as simple, moderate, or great. Categorical perceptions of risk were similar in AYA and parents: only about a third of participants endorsed higher risk (34% AYA, 35% parents), while two-thirds endorsed average or lower risk. The perception of being in the high-risk category was associated with higher genetic knowledge (p<.001 and p=.018 for AYA and parents, respectively) and higher perceived CHD severity in a hypothetical baby (p=.039 and p=.001, respectively) in both AYA and parents, whereas Caucasian race (p=.008) and higher median household income (p=.001) were associated only in AYA, and perceived future heart defect severity (p=.016) was associated only in parents. 
The ratings of risk on a continuous scale were extremely variable and remarkably high among both the AYA and parents, ranging from 0-100% with a mean of 36.8% (SD=24.3%) and 34.1% (SD=23.5%), respectively, with a mode of 50% and increased selection of 25% and 75% in both groups. The majority of AYA and parents have an inaccurate categorical risk perception and their continuous risk perception may represent poor numeracy skills and either lack of or inaccurate knowledge of risk. These results highlight the need to provide accurate offspring risk information to both AYA and parents, possibly while also providing additional genetic and numeracy education.

Interacting Susceptibility Genes for Colorectal Cancer
Madelyn M. Gerber, Matthew C. Cianciolo, Amanda E. Toland

Colorectal cancer (CRC) causes nearly 50,000 deaths in the United States each year and is the third leading cause of cancer-related deaths. Identifying the genetic risk factors underlying CRC will have immense value, both for identifying individuals with increased predisposition to this cancer and for highlighting potential new therapeutic targets. In mice, the gene Ptprj maps to a genomic region linked to colon cancer susceptibility. The Ptprj locus interacts synergistically with two other CRC susceptibility loci mapped in the mouse (Scc5 and Scc13) to further increase CRC risk, and the Scc5 locus also independently interacts reciprocally with Scc4 to augment susceptibility. Importantly, the susceptibility genes at Scc4, Scc5, and Scc13 have not yet been identified. Our goals are (1) to identify which of the multiple genes at these Scc loci are responsible for modifying CRC risk, and (2) to understand how the combination of susceptibility genes at these regions increases CRC risk. To achieve these goals, RNA-Seq was used to identify genes whose genetic variants or expression levels differ between the CRC-resistant and CRC-susceptible mouse strains used to map the Scc loci. Next, SNPs in the human orthologs of these candidate susceptibility genes were tested in 194 pairs of normal and colon tumor DNA from human CRC patients for evidence of allele-specific gains or losses. SNPs that showed statistical trends toward gains or losses were genotyped in 296 additional DNA pairs. These studies revealed that 75 of 950 tested SNPs reproducibly show preferential gains or losses in colon tumor DNA compared with matched normal colon DNA. From these studies, Epas1 (Scc4), Csnk1a1 (Scc5), and Prdm5 (Scc13) emerged as leading candidates at the loci of interest. 
Preliminary studies to uncover interactions among these candidate genes using in vitro models have thus far shown that the transcription factor Epas1 may regulate expression of Csnk1a1 and Ptprj, while the epigenetic regulator Prdm5 may also exert regulatory effects at these loci. Furthermore, Epas1 may promote Wnt signaling in colon epithelial cells and therefore function in an oncogenic capacity. Future work will delve deeper into the exploration of this complex network of gene-gene interactions by manipulating levels and isoforms of the candidate genes in cell lines and assessing effects on cell growth, death, and other cancer-relevant phenotypes.
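The allele-specific gain/loss screen above compares allele ratios in each tumor with its matched normal sample. A minimal sketch, assuming per-allele signals (for example read counts or genotyping intensities) are available for heterozygous pairs: the imbalance ratio is (tumor A/B) / (normal A/B), and a two-sided sign test asks whether the same allele is favoured across tumors more often than chance. The 2/3 and 1.5 cutoffs and the data are illustrative choices for this sketch, not values from the abstract.

```python
from math import comb


def allelic_imbalance(tumor_a, tumor_b, normal_a, normal_b):
    """Ratio of the tumor allele A/B signal to the matched normal A/B signal."""
    return (tumor_a / tumor_b) / (normal_a / normal_b)


def favoured_allele(ratio, lower=2 / 3, upper=1.5):
    """'A' or 'B' if one allele is relatively gained (or the other lost); None if balanced."""
    if ratio >= upper:
        return "A"
    if ratio <= lower:
        return "B"
    return None


def sign_test_p(k, n):
    """Exact two-sided binomial (sign) test for k 'A' calls among n imbalanced tumors, p = 0.5."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical (tumor A, tumor B, normal A, normal B) signals for one SNP in five tumor/normal pairs.
pairs = [(60, 40, 50, 50), (70, 30, 48, 52), (55, 45, 50, 50), (65, 35, 51, 49), (80, 20, 50, 50)]
calls = [favoured_allele(allelic_imbalance(*p)) for p in pairs]
imbalanced = [c for c in calls if c is not None]
p_value = sign_test_p(imbalanced.count("A"), len(imbalanced))
```

A SNP whose imbalanced tumors consistently favour one allele is the kind of "preferential gain or loss" signal described above; with only four imbalanced tumors, even a unanimous result gives p = 0.125, which is why replication in additional pairs matters.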

Collection and Extraction of Saliva DNA for Next Generation Sequencing
Michael R. Goode, Soo Yeon Cheong, Ning Li, William C. Ray, Christopher W. Bartlett

The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, because these sources yield large quantities of high-quality DNA. However, saliva can also yield high-quality DNA with little to no degradation or fragmentation that is suitable for a variety of DNA assays; it requires no phlebotomist and can even be collected through the mail. At present, however, no saliva DNA collection and extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes the parameters of saliva collection, storage, and DNA extraction to yield DNA of sufficient quality and quantity for the most demanding DNA assays, including microarray genotyping and next generation sequencing.

Exome Sequencing of 419 Individuals with Dilated Cardiomyopathy: A Complex Picture Emerging from a 'Simple' Mendelian Disorder
Dale Hedges, Ana Morales, Daniel Kinnamon, Debra Wheeler, Ray Hershberger

Dilated cardiomyopathy (DCM) is characterized by weakening of the left ventricular heart muscle, accompanied by enlargement of the ventricle. Recent estimates place the prevalence of DCM at approximately 1 in 250 individuals. Although the phenotype may arise secondary to other conditions and exposures, including tachycardia, myocardial infarction, and chemotherapy, DCM also has a substantial genetic component, with over 40 genes implicated to date. Rare variants in these genes are thought to account for roughly 40-50% of hereditary DCM. To reveal additional genetic risk loci for DCM and to gain insight into the relative contribution of established genes, we conducted whole exome sequencing of 419 DCM cases (284 independent probands), including both familial and non-familial cases. In the first phase of our analysis, we are closely examining protein-coding variants within a set of 40 previously implicated DCM genes. Variants are evaluated using multiple strategies to prioritize those most likely to be pathogenic, and prioritized candidates then undergo a detailed clinical adjudication process to assess the likelihood of pathogenicity. Depending on the prioritization strategy, 130-175 prioritized variants were detected across the 40 genes. Consistent with previous reports, we observe substantial overlap in variant distribution between familial and non-familial cases. A total of 97 cases harbored hits in more than one surveyed DCM gene, providing additional evidence that a subset of DCM cases harbor pathogenic variants in more than one DCM gene and suggesting that a more complex genetic architecture could be at work in some individuals. 
Using a carefully selected control exome set from NHLBI's exome sequencing project, we are currently evaluating whether the observed variant distribution can be reconciled with the prevailing one-hit autosomal dominant model for DCM genetics.
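Tallying cases with prioritized hits in more than one surveyed gene, as in the count of 97 cases above, is a simple group-by. The sketch assumes a hypothetical list of (case, gene) pairs; the gene symbols are well-known DCM genes used only as labels, and the same tally could be run on control exomes for comparison.

```python
from collections import defaultdict


def genes_per_case(prioritized_hits):
    """Map each case ID to the set of surveyed genes in which it carries a prioritized variant."""
    genes = defaultdict(set)
    for case_id, gene in prioritized_hits:
        genes[case_id].add(gene)
    return genes


def multi_gene_cases(prioritized_hits):
    """Sorted case IDs with prioritized variants in more than one gene."""
    return sorted(case for case, g in genes_per_case(prioritized_hits).items() if len(g) > 1)


# Hypothetical (case, gene) pairs; two variants in the same gene count once for that gene.
hits = [("case1", "TTN"), ("case1", "LMNA"), ("case2", "TTN"), ("case2", "TTN"),
        ("case3", "MYH7"), ("case3", "TTN"), ("case4", "RBM20")]
multi = multi_gene_cases(hits)
# Fraction among cases with at least one hit (not among all sequenced cases).
fraction = len(multi) / len({case for case, _ in hits})
```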

KELVIN: A tool for modeling genetic architecture for complex disorders
Yungui Huang, Sang-Cheol Seok, William H. Valentine-Cooper, John M. Burian, Veronica J. Vieland

KELVIN is a comprehensive statistical genetics software package for measuring evidence under the PPL (posterior probability of linkage) framework, a unique statistical framework developed by the Vieland lab. It currently supports more than 20 types of analyses, including marker-to-marker and trait-to-marker analysis; dichotomous trait, quantitative trait, and QT threshold models; linkage and trait-marker linkage disequilibrium (association) analysis; and gene × gene interaction models. KELVIN supports both the Lander-Green (LG) and Elston-Stewart (ES) exact likelihood algorithms, and also uses MCMC sampling of inheritance vectors to make likelihood calculation possible on very large pedigrees. KELVIN provides unique flexibility: it can simultaneously analyze different data structures, from case-control data to extended pedigrees, and can apply different marker maps and/or pedigree peeling algorithms to different pedigrees within a single analysis (for example, a hybrid of LG and MCMC). Analyses can run on a single node or be distributed across a cluster. For high throughput, KELVIN uses a client-server architecture together with a database to support distributed computing. Likelihood clients define the trait space, request results through the database, assemble the results, and produce PPL-like statistics, while likelihood servers compute likelihoods for the requested trait models using the chosen algorithm. This design allows considerable flexibility: different algorithms can be accommodated, and servers can be allocated and de-allocated dynamically. KELVIN is continuously being updated and refined as a tool for modeling the genetic architecture of complex disorders.
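In the PPL framework, evidence is reported as a posterior probability. Its final step is Bayes' theorem applied to a likelihood ratio integrated over the trait-model parameters (a Bayes ratio, BR). The sketch below shows only that final step under stated assumptions: the integration KELVIN performs over trait space is not described in the abstract and is not reproduced, and the 0.02 default prior probability of linkage is an assumption (a value often quoted for the PPL, but treat it here as illustrative).

```python
def posterior_probability_of_linkage(bayes_ratio, prior=0.02):
    """Bayes' theorem step: P(linkage | data) from an integrated likelihood ratio and a prior.

    The default prior of 0.02 is an assumption for illustration.
    """
    if bayes_ratio < 0:
        raise ValueError("the Bayes ratio must be non-negative")
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    return prior * bayes_ratio / (prior * bayes_ratio + (1 - prior))


# A BR of 1 (data uninformative) leaves the prior unchanged; a BR of 49 gives 0.5 with prior 0.02.
uninformative = posterior_probability_of_linkage(1.0)
even_odds = posterior_probability_of_linkage(49.0)
```

Because the output is on the probability scale, it can be read directly as evidence for or against linkage, which is one motivation for reporting PPL-like statistics rather than raw likelihood ratios.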

From Single Sample Clinical Analysis to Population Genomics, Churchill is an Ultra-Fast Computational Approach to Human Variant Discovery No Matter the Scale
Ben Kelly, James Fitch, Don Corsmeier, David Newsom, and Peter White

Next generation sequencing (NGS) has revolutionized genetic research, enabling dramatic increases in the discovery of new functional variants. The technology has been widely used by the research community and is now being rapidly adopted in the clinic, driven by recognition of its diagnostic utility and by improvements in the quality and speed of data acquisition. Together with declining sequencing costs, the resulting exponential growth in data generation has created a bioinformatics bottleneck. Churchill is a computational approach that overcomes these challenges by fully automating the analytical process that takes raw sequencing instrument output through the complex and computationally intensive steps of alignment, post-alignment processing, local realignment, recalibration, and variant calling. Through novel parallelization techniques, we have dramatically reduced the analysis time for whole human genome resequencing from weeks to hours, without the need for specialized analysis equipment or supercomputers. As increasing numbers of molecular diagnostic laboratories implement NGS in clinical settings, Churchill provides a solution to the data analysis challenges these laboratories will immediately face. Compared with alternative analysis pipelines, Churchill is simpler, faster, deterministic, and capable of running on all popular Linux environments. Furthermore, Churchill optimizes utilization of available compute resources and scales in a near-linear fashion, enabling complete human genome resequencing analysis in ten hours on a single server, three hours on our in-house cluster, and under two hours on a larger HPC cluster or in the cloud. Given appropriate computational resources, this level of speedup makes Churchill capable of population-scale analysis in a fraction of the time. Churchill eliminates the NGS bioinformatics overhead and is a prime candidate to overcome the bottleneck that even faster sequencing will create.

Valid permutation tests for genetic case-control studies with missing genotypes
Daniel D. Kinnamon, Eden R. Martin

Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. We developed a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We began by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we established that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We applied these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also showed that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.
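A minimal sketch of a Monte Carlo permutation test that shuffles labels only within strata of individuals sharing the same covariate value, the setting in which the abstract reports the test is exact level α when missingness depends only on recorded covariates. Permuting case/control labels within strata is equivalent to shuffling genotype rows among individuals in the same stratum, and these within-stratum permutations form a group under composition. The test statistic (difference in mean genotype score among non-missing genotypes), the data, and the stratum labels are hypothetical choices for illustration.

```python
import random


def mean_score_difference(scores, status):
    """Mean genotype score in cases minus controls, skipping missing genotypes (None)."""
    cases = [s for s, d in zip(scores, status) if d == 1 and s is not None]
    controls = [s for s, d in zip(scores, status) if d == 0 and s is not None]
    if not cases or not controls:
        return 0.0
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def stratified_permutation_test(scores, status, strata, n_perm=2000, seed=1):
    """Monte Carlo p-value, shuffling case/control labels only within covariate strata."""
    rng = random.Random(seed)
    observed = abs(mean_score_difference(scores, status))
    members = {}
    for i, stratum in enumerate(strata):
        members.setdefault(stratum, []).append(i)
    exceed = 0
    for _ in range(n_perm):
        permuted = list(status)
        for idx in members.values():
            labels = [status[i] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                permuted[i] = label
        if abs(mean_score_difference(scores, permuted)) >= observed:
            exceed += 1
    # Counting the observed labelling as one permutation keeps the Monte Carlo p-value valid.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: genotype scores (None = missing), status (1 = case), and a covariate stratum.
scores = [2, 1, None, 0, 1, 2, 0, None, 0, 1, 0, 0]
status = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
strata = ["batch1", "batch2"] * 6
p_value = stratified_permutation_test(scores, status, strata)
```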

Novel eQTLs Associated with CNTNAP2 in Diverse Cognitive Processes
Ning Li, Samuel L. Wolock, Stephen A. Petrill, Judy F. Flax, Anne S. Bassett, Linda M. Brzustowicz, Christopher W. Bartlett

Background: Language is a uniquely human trait. Learning and improving speech and language ability is crucial for children to communicate with others and for their future academic and social life. CNTNAP2 has previously been associated with several disorders, including specific language impairment (SLI) and autism, as well as with normal language development. Here we study the apparent pleiotropy of CNTNAP2 in a collection of families multiplex for SLI. Studies to date have focused on SNPs across CNTNAP2 without regard to their functional potential. Methods: (1) 454 individuals from 21 families were genotyped separately on three Illumina genotyping platforms. After quality control and imputation, genotypes around CNTNAP2 were used in a family-based linkage and association study of multiple language traits. (2) 1,177 individuals from five publicly available brain eQTL datasets were used for CNTNAP2 eQTL analysis. (3) 106 formalin-fixed, paraffin-embedded (FFPE) brain samples were collected from NCH and OSU; real-time PCR and genotyping were performed for the top CNTNAP2 SNPs from the discovery analyses to replicate the eQTL findings. Results: (1) Linkage evidence was found around 146.49 Mb under multiple phenotypes (lang_ss, f_explang, f_rapid). (2) Two of the top five eQTLs in CNTNAP2 (rs76487286 and rs2727626) were consistent across multiple brain expression probes/datasets (PPLD = 49-67%) and overlapped the linkage peak. (3) After QC, 45 FFPE human brain samples were used to replicate the top hits from the previous analyses; rs4725767 (P = 0.040775) is an eQTL in the in-house brain samples. Discussion: This study used three approaches: SLI pedigree linkage and association analysis, brain-database eQTL meta-analysis, and eQTL analysis of FFPE post-mortem brain samples. Together they provide evidence that CNTNAP2 may take part in multiple aspects of language as well as in other brain functions. 
SNPs identified in this study represent logical targets for future studies of language related traits.
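The eQTL analyses above test whether expression varies with genotype. A minimal sketch of the simplest version, assuming an additive model, is the least-squares slope of expression on allele dosage. The abstract does not state the model, covariates, or expression scale, and the PPLD statistic used for the database eQTLs is not reproduced; the dosages and expression values (imagined here as something like -ΔCt from real-time PCR) are hypothetical.

```python
from statistics import mean


def additive_eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1, or 2 copies of the coded allele)."""
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("the SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


# Hypothetical relative expression values for nine samples.
dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expr = [1.1, 0.9, 1.0, 1.4, 1.6, 1.5, 2.1, 1.9, 2.0]
beta = additive_eqtl_slope(dosage, expr)  # expression change per copy of the coded allele
```

In practice the slope would be accompanied by a standard error and covariates such as tissue source or RNA quality, which matter especially for FFPE material.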

Linkage Analysis of Diabetes Microvascular Complications Reveals Strong Evidence of Loci for Retinopathy and Nephropathy
Lipner EM, Tomer Y, Noble JA, Monti MC, Lonsdale JT, Corso B, Greenberg DA

Using data from 415 families, we conducted linkage analyses aimed at identifying susceptibility loci for microvascular complications of type 1 diabetes (T1D). We used 402 SNP markers spanning chromosome 6. We focused on chr. 6 because our previous studies showed evidence of a chr. 6 locus for complications. The phenotypes for the analysis were retinopathy and nephropathy. (All subjects had T1D.) Analysis of the retinopathy phenotype yielded two linkage peaks: one, as expected, was at the HLA region but another, novel locus appeared telomeric to HLA (Hlod=5.0). The nephropathy phenotype analysis showed a peak centromeric to HLA (lod=2.0) but the retinopathy linkage peak (telomeric to HLA) was absent in the nephropathy analysis. Because of the association of T1D with DRB1*03:01 and DRB1*04:01, we stratified our analyses based on families whose probands were positive for DRB1*03:01 or DRB1*04:01. When analyzing the DRB1*03:01-positive retinopathy families, in addition to the novel locus telomeric to HLA, one centromeric to HLA emerged at the same location as the nephropathy peak (Hlod=3.4). When we stratified on DRB1*04:01-positive families, the HLA telomeric linkage peak remained (Hlod=4.1) but the centromeric peak disappeared. Our findings showed that non-HLA loci on chromosome 6 are involved in the expression of T1D complications. In addition, the stratification results indicate that specific HLA alleles interact with those loci to influence the expression of complications.
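The heterogeneity LOD (HLOD) values reported above come from the admixture model, in which only a fraction α of families is linked: HLOD = max over α (and position) of Σ_i log10[α·10^(LOD_i) + 1 - α]. The sketch below maximizes over α by grid search at a single fixed position, using hypothetical per-family LOD scores; the software and settings the authors used are not stated.

```python
from math import log10


def _family_term(alpha, lod):
    """log10 of one family's admixture likelihood ratio."""
    value = alpha * 10 ** lod + (1 - alpha)
    return log10(value) if value > 0 else float("-inf")


def hlod(family_lods, grid_size=1001):
    """HLOD at one position: max over alpha in [0, 1] of the summed admixture terms."""
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for step in range(grid_size):
        alpha = step / (grid_size - 1)
        score = sum(_family_term(alpha, lod) for lod in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


# Hypothetical per-family LODs: one linked-looking family, one unlinked-looking family.
score, alpha = hlod([2.0, -2.0])  # the plain summed LOD would be 0.0
```

Because families that look unlinked are down-weighted through α, the HLOD can be well above the summed LOD when only some families are linked, which is the situation stratification by HLA allele is probing.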

Processing Speed, Attention, and Reading: Generalist Genes or Distinct Pathways of Risk?
Sarah L. Lukowski, Stephen A. Petrill
Contact email:

Poor attention and slower processing speed are correlated with reading difficulties. In addition, behavioral genetic studies have suggested that processing speed, attention, and reading are all highly heritable, with additional significant non-shared environmental influences. The present study extended this work to examine whether processing speed, attention, and reading share genetic and environmental influences, or whether processing speed and attention make independent contributions to the genetic and environmental etiology of reading. Participants from the Western Reserve Reading and Math Project (105 identical and 147 same-sex fraternal twin pairs; mean age 12.20 years) completed home visits that included standardized measures of reading from the Woodcock Reading Mastery Tests. In addition, children completed WISC Symbol Search to measure processing speed, and mothers provided ratings of attention (SWAN). Multivariate estimates indicated that processing speed shared significant additive genetic influences with attention and all reading achievement measures. Furthermore, attention had no significant genetic overlap with reading beyond that shared with processing speed. Non-shared environmental influences on attention overlapped with decoding, whereas non-shared influences on processing speed were shared with reading comprehension. The results of the present study suggest that generalist genes underlie processing speed, attention, and reading achievement. Furthermore, significant non-shared environmental overlap between attention and decoding, as well as between processing speed and reading comprehension, may suggest shared child-specific influences within these domains.
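The study used multivariate twin models; the underlying logic is easiest to see in Falconer's univariate approximations, which estimate variance components from identical (MZ) and fraternal (DZ) twin correlations. This is a back-of-envelope sketch, not the authors' model, and the correlations in the example are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate ACE variance proportions from MZ and DZ twin correlations.

    a2: additive genetic variance (heritability)  = 2 * (r_MZ - r_DZ)
    c2: shared (common) environment              = 2 * r_DZ - r_MZ
    e2: non-shared environment plus error         = 1 - r_MZ
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Hypothetical correlations for a reading measure (not values from this study).
a2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.50)
print(round(a2, 2), round(c2, 2), round(e2, 2))  # 0.6 0.2 0.2
```

Multivariate models extend the same idea to cross-twin, cross-trait correlations, which is what lets them separate shared from trait-specific genetic influence.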

Testing Regulatory Variants Affecting Translation by Measuring Allelic mRNA Levels on Polyribosomes: OPRM1, NAT1, HTR2A, and ABCB1
Roshan Mascarenhas, Ryan M. Smith, Amy Webb, Danxin Wang, Audrey C. Papp, Julia K. Pinsonneault, Wolfgang Sadee
Contact email:

mRNA translation into protein is a major target of regulatory processes, but the role of genetic variants in mRNA transcripts has yet to be systematically studied. Here we evaluate a method to assess the effect of polymorphisms on mRNA interactions with actively translating ribosomes. The approach relies on differential loading and progression of mRNA alleles in purified polyribosome fractions, which is expected to alter allelic mRNA ratios relative to those in the cytosol. Equal mixtures of plasmid vectors expressing wild-type and variant alleles of OPRM1 (rs1799971, 118A>G), HTR2A (rs6311, -1438G>A), and ABCB1 (rs1045642, 3435C>T) were transfected into various cell types, while NAT1 (rs1057126, *4>*10) and ABCB1 (3435C>T) were analyzed in heterozygous lymphoblastoid cells expressing these genes natively. Consistent with previous results suggesting altered translation, the minor allele 118G of OPRM1 reduced, whereas the minor allele *10 of NAT1 enhanced, polysome occupancy. Also as expected, no change was observed with HTR2A (-1438G>A). For ABCB1 (MDR1), rare codon usage involving the 3435T allele had been proposed to affect the rate of translation; however, no allelic difference was observed in polysomes compared to cytosol (other than allelic ratios resulting from differences in RNA turnover, demonstrated earlier by our group). These results provide a novel approach to an underexplored area of expression genetics. Comparison of complete RNA-seq profiles of total and polysomal fractions in lymphoblastoid cells provides insight into the transcriptome-wide differential distribution of isoforms between these fractions. Funded by U01 GM092655.
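The polysome comparison reduces to a ratio of ratios: the allelic ratio in the polyribosome fraction divided by the allelic ratio in the cytosol of the same cells. Normalizing to the cytosol cancels unequal plasmid delivery and allele-specific differences in RNA turnover, so a nonzero log ratio points to differential ribosome loading. The sketch assumes allele-specific signals (for example read counts or primer-extension peak areas) are already in hand; the function name is hypothetical.

```python
import math


def polysome_allelic_shift(poly_a, poly_b, cyto_a, cyto_b):
    """log2 of the A/B allelic ratio in polysomes relative to the cytosol.

    > 0: allele A relatively enriched on polysomes (greater ribosome loading)
    < 0: allele A relatively depleted; 0: no allelic difference.
    Inputs are allele-specific signals for each fraction.
    """
    return math.log2((poly_a / poly_b) / (cyto_a / cyto_b))


print(polysome_allelic_shift(60, 40, 50, 50))  # log2(1.5), about 0.585
```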

Exome Sequencing Reveals Possible Role of SCNN1D in Syndrome of Heart Defects, Intellectual Disability, Severe Speech Delay and Brachydactyly.
McBride KL, Nunez C, Soldatova L, Zender G, Fitzgerald-Butt SM, Corsmeier D, Askwith C, Kelly L, El-Hodiri H, White P.
Contact email:

Background: Congenital heart defects (CHD) are common, but the specific etiology of most remains unknown. We used whole exome sequencing to seek the genetic cause of CHD in a consanguineous family in which previous clinical genetic testing (including oligoarray CGH) was normal. Methods: A Kurdish Iraqi family of six children born to a first-cousin union was enrolled. Whole exome sequencing was performed on all eight individuals. Variants were filtered using homozygosity mapping and the predicted functional consequences of coding changes. Functional studies of sodium gating were performed in Xenopus oocytes with various combinations of the subunits (alpha, beta, gamma, and wild-type or variant delta), and electrophysiologic measurements of sodium flux were obtained. RNA probes were made to assess expression profiles in developing Xenopus tadpoles. An additional 96 individuals with isolated coarctation of the aorta (CoA) underwent Sanger sequencing to identify pathogenic variants in our gene of interest. Results: Three children were affected with ventricular septal defect (VSD), CoA, intellectual disability (with severe speech deficit), and dysmorphic features including mild coarsening of facial features with a square face and brachydactyly (primarily of the fifth digit). Three novel homozygous variants, located in introns or in a pseudogene, were identified in all affected children. One rare variant in SCNN1D (g.604C>T; p.R202W; frequency = 0.0001 in the Exome Sequencing Project), which encodes the delta subunit of the epithelial sodium channel (ENaC), was homozygous in all affected children. This change is predicted to be pathogenic by PolyPhen2 and SIFT. Studies in the developing tadpole showed expression in the developing heart and brain, with diffuse expression in somites. Gating studies of ENaC demonstrated that the delta variant reached full potential but had slower flux than wild type. Three novel non-synonymous changes and two novel splice site variants were identified in the isolated CoA and VSD cohort.
Conclusion: A rare variant in SCNN1D was identified in a family with syndromic heart defects. The evidence suggests that it may contribute to pathogenesis in this family, but confirmation will require additional study.
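The variant filtering described in Methods can be sketched as follows, assuming each variant has already been annotated with a population allele frequency (such as the Exome Sequencing Project frequency quoted above) and a combined pathogenicity call (for example from PolyPhen2 and SIFT). The data structure, function name, and 0.001 frequency cutoff are hypothetical, and the homozygosity-mapping step (restricting to runs of homozygosity) is omitted.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    genotypes: dict           # sample ID -> "0/0", "0/1" or "1/1"
    pop_freq: float           # population allele frequency
    predicted_damaging: bool  # combined in silico prediction


def recessive_candidates(variants, affected, unaffected, max_freq=0.001):
    """Rare, predicted-damaging variants homozygous in every affected child
    and not homozygous in any unaffected relative (autosomal recessive model)."""
    hits = []
    for v in variants:
        if v.pop_freq > max_freq or not v.predicted_damaging:
            continue
        all_affected_hom = all(v.genotypes.get(s) == "1/1" for s in affected)
        any_unaffected_hom = any(v.genotypes.get(s) == "1/1" for s in unaffected)
        if all_affected_hom and not any_unaffected_hom:
            hits.append(v)
    return hits
```

Unaffected parents of a consanguineous union are expected to be heterozygous carriers, which is why they are excluded only when homozygous.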

Human Organoids for Cancer, Neurodevelopmental and Neurodegenerative Disorders Research
Susan McKay and Rene Anand
Contact email:

Induced pluripotent stem cells (iPSCs), derived from patient blood or skin cells, are useful in vitro models for studying genetic, molecular, and cellular abnormalities associated with human neurological disorders. These iPSCs can be differentiated and cultured in vitro using specific morphogens, growth factors, and growth conditions to represent many complex cell types of the human body. The underlying premise of our hypothesis is that extrinsic signals that regulate patterning, fate determination and maturation can be used to sculpt the known self-organizing capacity of human pluripotent stem cell-derived ectoderm, endoderm and mesoderm into discrete "organoids". Our laboratory is specifically interested in differentiating these cells into more complex tissue types representing specific regions of the human brain, lung, vasculature and gut. These organoids represent the next generation of stem cell-derived tools for understanding mechanisms of evolution and human disease, and for stem cell therapies, tissue engineering, toxicological screens, and drug discovery. We intend to use these tools for biomedical research on cancer and on neurodevelopmental and neurodegenerative disorders. We are using various experimental strategies for culturing cells, such as specific morphogens and growth factors known to promote development of different human embryonic tissue types. In addition, we are culturing these cells on 3D scaffolds and in spinning-flask bioreactors. These conditions are expected to improve the circulation and exchange of culture nutrients, promoting the generation of more complex and denser tissues that better represent the corresponding regions of the human body. SUPPORT: We gratefully acknowledge pilot funds generously provided to initiate stem cell research in the Anand laboratory by the Wexner Medical Center Fund and the Ingram Fund for Autism Research.

Filtering of Whole Exome Sequencing Data from Siblings based on Identity-By-Descent Sharing to Detect Autosomal Recessive Genetic Variants
Mari Mori, Don Corsmeier, Peter White, Kim McBride
Contact email:

The majority of mutations responsible for Mendelian disorders are located in exons or canonical splice sites. All exons (the exome) comprise ~1% of the human genome. Therefore, whole-exome sequencing (WES), a next-generation sequencing method based on targeted capture, is a cost-effective and efficient way to test for genetically heterogeneous conditions such as developmental delay, intellectual disability and epilepsy. WES is becoming a standard diagnostic test in the genetics clinic, identifying a clinically relevant variant in 10-30% of cases. It has also led to the discovery of novel disease-causing mutations when multiple affected patients and/or parents are available for testing. Current WES data processing pipelines typically generate 100-300K variants after filtering out low-quality calls. Tertiary analysis is the process by which these long variant lists are reduced to a much shorter list of potentially pathogenic target variants. However, tertiary analysis of WES is complicated by false-positive calls and by the many variants predicted to be functionally pathogenic. Pedigree information and expected Mendelian inheritance models can limit the gene search space. When a homozygous or compound heterozygous inheritance model is expected in two or more relatives, identifying chromosomal segments that the relatives inherited from a shared ancestral pair (identity-by-descent, or IBD, segments), inferred from WES genotype data, is a powerful filtering tool. Several open-source and commercial software tools (GERMLINE, PLINK, IBD2, Golden Helix) have been developed for this purpose, with variable accuracy. WES was performed on a sib-pair with epilepsy and intellectual disability. Pathogenic mutations in genes known to be associated with epilepsy were not detected. The aims of the current study are to assess performance differences among these tools, apply them to the WES data, and attempt to identify a potentially pathogenic variant shared by the siblings.
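For full siblings who share a recessive disorder, the causal locus is expected to lie in a region where they share both parental haplotypes (IBD2). A minimal sketch of that filter, assuming IBD2 segments have already been inferred by one of the tools above and exported as (chromosome, start, end) tuples, keeps only variants that are homozygous for the alternate allele in both siblings and fall inside a shared segment. The tuple formats and function name are hypothetical; compound heterozygous pairs, which would require gene-level grouping, are not handled.

```python
def recessive_ibd2_filter(variants, ibd2_segments):
    """Keep variants homozygous-alternate in both siblings inside an IBD2 segment.

    variants: iterable of (chrom, pos, sib1_genotype, sib2_genotype),
              genotypes as "0/0", "0/1" or "1/1"
    ibd2_segments: list of (chrom, start, end) intervals, inclusive
    Returns a list of (chrom, pos) for the surviving variants.
    """
    kept = []
    for chrom, pos, g1, g2 in variants:
        if g1 != "1/1" or g2 != "1/1":
            continue
        # Linear scan over segments; an interval tree would scale better.
        if any(c == chrom and start <= pos <= end for c, start, end in ibd2_segments):
            kept.append((chrom, pos))
    return kept
```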

RNA shape as an indicator of pathogenetics when the disease-causing variant is intronic
Abigail Neininger, David A. Greenberg
Contact email:

One of the critical problems in discovering disease genes is determining what makes an allele pathologic. Mutations in exons are easily noted, but most of the genome is not exonic, and what constitutes a pathologic intronic change is usually unclear. The highly variable nature of intronic sequences in the population often precludes using sequence alone to find disease-related variants. BRD2 is a gene on chromosome 6 that has been linked and associated with juvenile myoclonic epilepsy (JME), the most common form of adolescent-onset epilepsy. However, the gene changes that cause the seizures are not exonic, and the causal intronic changes have not been identified. Nevertheless, a hypervariable region has been associated with the disease. This region lies in the intron that follows a highly conserved, alternatively spliced exon. We asked: is the shape of the spliced variable intron correlated with JME? We used a 3D printer to produce energy-minimized models of the region and compared the shapes, sequences, and Gibbs free energies of 23 JME patients and 24 controls. We found significant differences between patients and controls, specifically in the homozygosity of intron shape. These findings can help identify which sequences cause JME.
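Whether an intronic change alters RNA shape is normally assessed with thermodynamic folding, which minimizes Gibbs free energy under a nearest-neighbor energy model (as in tools such as ViennaRNA's RNAfold). As a much simpler stand-in that shows the underlying dynamic programming, the Nussinov algorithm below maximizes the number of nested base pairs instead of minimizing free energy. It illustrates the idea only; it is not the energy model used in this study, and the minimum hairpin loop of 3 is an assumption.

```python
def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Watson-Crick plus G-U wobble).

    Nussinov dynamic programming: dp[i][j] is the best pair count for seq[i..j].
    Counts pairs rather than minimizing free energy.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # i and j may pair only if j - i > min_loop
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_pairs("GGGAAAUCC"))  # 3: G-C, G-C, G-U around an AAA loop
```

Comparing this score (or, properly, the minimum free energy) between patient and control haplotypes is one way to quantify a shape difference computationally.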

Transcriptome Sequencing: What's Your Question?
Audrey Papp, Amanda Curtis, Leslie Newman, Gloria Smith, Wolfgang Sadee
Contact email:

The Pharmacogenomics Core Laboratory at Ohio State University performs next-generation sequencing of full-length transcripts and of small, micro, and non-coding RNAs on multiple platforms. We use RNA allelic expression analysis to identify cis-acting loci that quantitatively affect expression in a variety of tissues. This combination of evidence informs analysis of gene-gene networks and reveals the interplay of many different regulatory processes. In our research, the ultimate goal of these analyses is to identify factors affecting neurocognitive and cardiovascular function in wellness and disease, and to develop usable biomarkers for informed decisions. Study of the RNA transcriptome provides a window into both the larger effects and the smaller nuances of cell function and regulation. Next-generation RNA sequencing permits quantitation of message expression, identification of novel splice isoforms, information about regulatory coding and non-coding RNAs, and measurement of microRNAs, plus the nucleotide sequence of variants that indicate or potentially affect all of these aspects of RNA function. However, with RNA sequencing, more than with any other sequencing technique, the optimal approach, procedures and practice vary tremendously depending on the biological questions being asked. Challenges abound, from experimental design through data analysis. RNA isolation techniques, tissue availability, quality and input, library preparation methods, depth of sequencing, and sequencing platform are all variables to be considered. Nevertheless, knowledge is power. With appropriate design and analysis, next-generation RNA sequencing provides an effective means to answer many questions, including the causes of disease and potential targets for treatment. The Pharmacogenomics Core Laboratory in The Ohio State University College of Medicine Center for Pharmacogenomics provides a full spectrum of next-generation sequencing and molecular biology services and expertise. We welcome collaborations.
Research funding: NIH Pharmacogenomics Research Network U-01 GM092655

Jincheol Park, Shili Lin
Contact email:

The expression of a gene is usually controlled by regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes such as the human genome, a gene may also be controlled by distal enhancers and repressors. Chromosome conformation capture (3C; Dekker et al. 2002), a molecular technique that uses formaldehyde cross-linking and locus-specific PCR, was shown to detect physical contacts between distant genomic loci, validating the theory that communication between distal elements is achieved through spatial organization (looping) of chromosomes that brings genes and their regulatory elements into close proximity. Coupled with next-generation sequencing (NGS), such techniques enable genome-wide detection of physical contacts between distant genomic loci. In particular, ChIA-PET (Fullwood et al. 2009) and Hi-C (Lieberman-Aiden et al. 2009) are NGS-aided technologies for studying genome-wide spatial interactions. The availability of such data makes it possible to reconstruct the underlying three-dimensional (3D) chromatin structure. In this talk, I will first describe BASIC, a Bayesian statistical model for building a spatial estrogen receptor regulation map that focuses on reducing false-positive interactions. I will then present PRAM, a random effect model for inferring the locations of genomic loci in 3D Euclidean space. The main feature distinguishing BASIC and PRAM from previous methods is that they take correlations among chromatin contact counts into account, thereby achieving greater consistency with observed data. Application of PRAM to Hi-C data shows that it accurately predicts 3D structure, as evaluated against physical distances measured in a fluorescence in situ hybridization (FISH) validation experiment.
Results from ChIA-PET and Hi-C data will also be visualized to illustrate the regulation and spatial proximity of genomic loci that are far apart in their linear chromosomal locations.
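The abstract does not give PRAM's likelihood, so the following is only a generic sketch of the kind of model 3D reconstruction methods build on: contact counts treated as independent Poisson variables whose mean decays as a power of the 3D distance between loci. It scores a candidate configuration; a reconstruction method would search over coordinates (and beta0, beta1) to maximize it, and PRAM additionally models random effects and correlations among counts, which this sketch does not. The parameter names and the power-law form are assumptions.

```python
import math


def poisson_loglik(coords, counts, beta0, beta1):
    """Log-likelihood of a 3D configuration given pairwise contact counts.

    Assumes counts[i][j] ~ Poisson(mu_ij) independently for i < j, with
    log(mu_ij) = beta0 + beta1 * log(d_ij), d_ij the Euclidean distance
    between loci i and j (beta1 < 0: nearby loci contact more often).
    """
    ll = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            eta = beta0 + beta1 * math.log(math.dist(coords[i], coords[j]))
            k = counts[i][j]
            ll += k * eta - math.exp(eta) - math.lgamma(k + 1)  # log Poisson pmf
    return ll
```

Because each Poisson term peaks when its mean equals the observed count, a configuration whose distances reproduce the counts scores higher than one that stretches or compresses them.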

An integrative system for analyzing short genetic variants and their effects on pre-mRNA splicing in humans
Marcin Pawlowski, Veronica Vieland, David A. Greenberg, Andrzej Kloczkowski
Contact email:

Recent studies suggest that most multi-exon genes in the human genome undergo alternative splicing, generating multiple mRNA isoforms. Splicing of pre-mRNA is a highly regulated process involving, among other things, interactions of splicing factors with regulatory motifs, which can be located in both the exons and the introns of the pre-mRNA to be spliced. Numerous studies have shown that short genetic variants (SNPs, small multi-base insertions or deletions, and microsatellite repeats) can affect pre-mRNA splicing even when they are synonymous (i.e., do not change the protein sequence), and as a result such variants are often a cause of human disease. Splicing abnormalities are linked to the destruction and/or creation of pre-mRNA motifs that are recognized by splicing factors. Moreover, genetic variants can also modify pre-mRNA secondary structure, which can affect the presentation of target pre-mRNA sequences and thereby change splicing efficiency. Because the number of short genetic variants in the human genome is estimated at over 10 million, bioinformatics tools are needed to prioritize, aggregate and visualize them. In this work we present a web service for investigating the potential effects of short genetic variants on the splicing of a given gene. It assesses possible effects of genetic variants with respect to (1) changes in pre-mRNA secondary structure and (2) creation or deletion of motifs that bind splicing factors (splicing enhancers or silencers). The information is aggregated and visualized through a user-friendly, interactive interface. The tool was also designed to compare multiple RNA sequences for the same gene, so we believe it will be very useful for researchers who wish to compare RNA sequences of interest obtained from many individuals.
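Step (2), detecting motifs created or destroyed by a variant, can be sketched by scanning the reference and alternate sequences for exact matches to a motif list and taking set differences. The motif list and function names are hypothetical; a real service would use curated splicing enhancer and silencer sets, scored rather than exact matching, and insertion/deletion handling, whereas this sketch covers single-nucleotide substitutions only.

```python
def motif_hits(seq, motifs):
    """Set of (motif, start) exact occurrences, including overlapping ones."""
    hits = set()
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            hits.add((m, start))
            start = seq.find(m, start + 1)
    return hits


def snv_motif_changes(ref_seq, pos, alt, motifs):
    """Motif occurrences destroyed or created by substituting alt at 0-based pos."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    ref_hits = motif_hits(ref_seq, motifs)
    alt_hits = motif_hits(alt_seq, motifs)
    return {
        "destroyed": sorted(ref_hits - alt_hits),
        "created": sorted(alt_hits - ref_hits),
    }


# Hypothetical hexamer motifs, for illustration only.
print(snv_motif_changes("CCGAAGAACC", 4, "C", ["GAAGAA", "GACGAA"]))
```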

An intronic SNP in the estrogen receptor gene ESR1 is associated with mRNA expression in the brain
Julia K. Pinsonneault, Ben Kompa, Roshan Mascarenhas and Wolfgang Sadee
Contact email:

The gene encoding estrogen receptor alpha (ESR1) is exceedingly complex, spanning >450 kb. We have identified frequent allelic expression imbalance (AEI) of ESR1 in multiple human tissues, using 3 polymorphisms in the transcribed portion of the gene. In brain tissue from a small cohort of subjects with bipolar disorder or schizophrenia and controls, some AEI was substantial, suggesting the presence of multiple factors affecting transcription or mRNA processing. SNP scanning and targeted sequencing revealed a single ESR1 SNP significantly associated with large AEI ratios. The SNP is also associated with bipolar disorder in the same cohort. Analysis of the variant in several large-scale clinical studies detected an association with bipolar disorder in female subjects. Functional assays in cell culture suggest that the minor allele of this variant may lead to a modest 1.4-fold increase in gene expression.

Pharmacogenomics: Assessing Disease Risk and Treatment Outcomes
Wolfgang Sadee
Contact email:

Genomics has emerged as an integral part and driver of personalized health care, disease prevention, and therapy. Advances in biomedical technologies, ultra-rapid DNA sequencing among them, reveal ever more details of human genomics; however, known genetic variants fail to account for a large portion of the estimated heritability. Also, drug therapy remains encumbered by substantial adverse reactions and insufficient efficacy. Pharmacogenetics studies have revealed variants that serve as clinical biomarkers guiding therapy; yet, predictive value often remains uncertain and implementation in clinical practice is slow. To define predictive molecular parameters with compelling clinical utility, the College of Medicine Center for Pharmacogenomics focuses on discovery of frequent variants that affect gene expression, RNA biology, and translation, likely the main processes guiding evolutionary patterns. We test the hypothesis that frequent regulatory variants under evolutionary selection pressure have strong phenotypic impact, altering response to therapy. Gene networks derived from RNA expression profiles further guide selection of candidate genes that are then tested in GWAS datasets. Successful biomarker panels will require an understanding of dynamic gene-gene-environment interactions; such multi-factorial tests are still the exception. Examples will be presented where these concepts have led to discovery of biomarkers with probable clinical utility. Supported by NIH grant U01 GM092655.

Regulation of gene expression at the carboxylesterase 1A1 gene locus in human liver.
Jonathan C. Sanford, Haojie Zhu, Danxin Wang, Wolfgang Sadee
Contact email:

Regulatory variants, less well characterized than coding polymorphisms, have the potential to serve as clinically relevant biomarkers of drug efficacy or toxicity. This study defines regulatory variants in the gene encoding carboxylesterase 1A1 (CES1A1), an enzyme responsible for the metabolism of numerous exogenous and endogenous compounds. Drugs metabolized by CES1A1 include the antiplatelet prodrug clopidogrel (prescribed to 25 million people a year), chemotherapeutic drugs (irinotecan, capecitabine), ACE inhibitors, antivirals and more. In the case of clopidogrel, CES1A1 is responsible for inactivation of the prodrug, and coding polymorphisms altering enzyme function have been found to significantly affect active metabolite levels and decrease drug efficacy. I hypothesize that regulatory variation exists in CES1A1 and will significantly affect clinical drug response to clopidogrel and other CES1A1 substrates. CES1 gene expression is highly variable. A previously identified heritable translocation (CES1A1VAR) in the 5' region of CES1A1 is associated with a 2.41-fold decrease in total gene expression. Characterization of the CES1A1VAR allele revealed two distinct sizes: a long translocation (MAF = 0.17) and a shorter, rare form (CES1A1SHORTVAR, MAF ≤ 0.02). CES1A1 allelic expression imbalance (1.3-1.5-fold) was observed in ~2/3 of CES1A1VAR heterozygotes, indicating that cis-acting regulatory variants are present. CES1A1VAR did not associate with protein quantity or activity. Luciferase constructs containing the CES1A1 promoter and 5' UTR indicate that CES1A1VAR does not affect protein levels, though other constructs show significant decreases in activity (CES1A1SHORTVAR and rs3815583, p < 0.05). The role of the various 5' UTRs in transcription initiation is currently being investigated. No significant variants besides CES1A1VAR have been identified.
Although no significant effect of CES1A1VAR is observed on protein activity or quantity, this variant may still require consideration in the context of treatment with CES1 substrate drugs.

Multi-class BCGA-ELM based classifier that identifies biomarkers associated with hallmarks of cancer
Saras Saraswathi, Vasily Sachnev, Andrzej Kloczkowski and Suresh Sundaram
Contact email:

The neural network-based Extreme Learning Machine (ELM) is combined with a Binary Coded Genetic Algorithm (BCGA) to select a small set of 92 genes that simultaneously classify 14 different types of cancer with high accuracy. IPA analysis of the selected genes reveals that over 60% of them are related to many of the cancers being classified. In addition, over 30 molecules related to these genes have been experimentally identified as biomarkers for the prognosis, diagnosis and treatment of cancer.
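An Extreme Learning Machine is a single-hidden-layer network whose input weights are drawn at random and never trained; only the output weights are fit, in closed form, by least squares via the Moore-Penrose pseudoinverse. In a BCGA-ELM scheme, each genetic-algorithm individual can be read as a binary mask over genes whose fitness is the accuracy of an ELM trained on the selected genes. The NumPy sketch below shows the ELM and such a fitness function; the tanh activation, hidden-layer size, and validation-accuracy fitness are assumptions, and the GA loop (selection, crossover, mutation) is omitted.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine: random hidden layer, least-squares readout."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)  # hidden-layer activations

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Input weights and biases are random and stay fixed.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]  # one-hot targets, one column per cancer type
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def mask_fitness(mask, X_train, y_train, X_val, y_val):
    """GA fitness of a binary gene mask: validation accuracy of an ELM on those genes."""
    cols = np.flatnonzero(mask)
    model = ELM().fit(X_train[:, cols], y_train)
    return float(np.mean(model.predict(X_val[:, cols]) == y_val))
```

Because training is a single pseudoinverse, the ELM is cheap enough to evaluate for every mask in every GA generation, which is the usual motivation for pairing it with a genetic algorithm.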

A Behavioral Genetic Approach to Independent Reading, Home Literacy, and their Connections to Reading Ability
Victoria J. Schenker, Stephen A. Petrill
Contact email:

Purpose: While reading ability is known to be heritable, a large literature suggests that child-driven measures such as independent reading and caregiver-driven measures of the home literacy environment (HLE) are also important to literacy. The current study uses a longitudinal twin design to examine the etiology of these experiences as they relate to reading outcomes. Method: Same-sex twins from 436 families drawn from the Western Reserve Reading and Math Project were assessed in multiple waves over six years, beginning in kindergarten or first grade. Participants were assessed on independent reading (the amount of time the child spends reading alone, reading self-efficacy, and willingness to take on difficult reading material), the home literacy environment (e.g., the amount of time parents read to themselves, literacy materials in the home), and standardized measures of letter knowledge, word-level reading, and reading comprehension. Results: Independent reading was highly heritable across all waves of assessment and was associated with genetic influences related to reading. The home literacy environment was mostly attributable to shared environment and was associated with environmental factors, but only in early reading. Conclusion: The present study suggests that the probability of coming into contact with positive or negative reading environments may be shaped by genetic influences associated with reading, particularly once children gain control over their own reading. The experience of independent reading may therefore be more vulnerable to genetic risks for poor reading.

Variants in Hdac9 intronic enhancer at Skts5 impact Twist1 expression
Tyler Siekmann, Amanda Ewart Toland
Contact email:

The susceptibility to skin tumorigenesis 5 (Skts5) locus on mouse chromosome 12 was mapped through linkage analysis of skin-tumor-susceptible Mus musculus (NIH/Ola) and skin-tumor-resistant outbred Mus spretus (outbred spretus) mice. Expression and sequence analysis of genes at Skts5 identified Hdac9 as a potential candidate for Skts5; Hdac9 shows both amino acid variation and differential expression in skin between the strains. Furthermore, variants in human HDAC9 show allele-specific imbalance in human cutaneous squamous cell carcinomas (cSCC), suggesting a role for this gene in human cSCC. Interestingly, studies by others identified an exonic/intronic enhancer in human HDAC9 that affects expression of the neighboring gene TWIST1. From these data we hypothesized that mouse Hdac9 might also contain an enhancer element and that sequence variants in this region might contribute to differential expression of the oncogene Twist1. To test this hypothesis, we first sequenced mouse Hdac9 intron 8, the region orthologous to the human HDAC9 enhancer, and identified 43 sequence variants between NIH/Ola and outbred spretus mice. We subcloned this region into nine segments; two of these segments differentially affected luciferase expression in vitro. NIH/Ola clones showed 2-fold increased luciferase expression relative to vector alone or to the corresponding region from outbred spretus. Furthermore, transfecting cells with this segment of NIH/Ola intron 8 led to a 2.2-fold increase in Twist1 expression, whereas the same region from outbred spretus produced no up-regulation of Twist1. In silico transcription factor analyses identified a number of transcription factors predicted to bind NIH/Ola and outbred spretus variants differentially. Chromatin immunoprecipitation (ChIP) studies of two transcription factors, Gata3 and Oct1, demonstrated differential binding between NIH/Ola and spretus DNA that fit the in silico predictions.
Together, these studies provide evidence that the mouse region orthologous to a human exonic/intronic HDAC9 enhancer also acts as an enhancer for Twist1. Because Hdac9 intron 8 sequence variants between NIH/Ola and outbred spretus differentially affected luciferase expression, Twist1 expression, and Gata3 and Oct1 binding, they are candidates for the differences in skin tumor susceptibility that map to Skts5.
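In silico prediction of differential transcription factor binding can be illustrated with position weight matrix scoring: each candidate site is scored by the log-odds of its bases under the matrix versus background, and the two strain alleles are compared by their best-scoring sites. The 4-position GATA-like matrix below is invented for illustration (real analyses would use curated matrices, for example from JASPAR, with pseudocounts), only one strand is scanned, and the function names are hypothetical.

```python
import math

# Hypothetical 4-position frequency matrix with a GATA-like consensus;
# rows are positions, values are base probabilities. Illustration only.
PFM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]


def site_score(site, pfm, background=0.25):
    """log2-odds of a site under the matrix versus a uniform background."""
    return sum(math.log2(pfm[i][base] / background) for i, base in enumerate(site))


def best_site_score(seq, pfm):
    """Highest-scoring window on the given strand (reverse complement not scanned)."""
    w = len(pfm)
    if len(seq) < w:
        raise ValueError("sequence shorter than motif")
    return max(site_score(seq[i:i + w], pfm) for i in range(len(seq) - w + 1))


def allele_score_difference(seq_a, seq_b, pfm):
    """Positive when allele A's sequence contains a stronger predicted binding site."""
    return best_site_score(seq_a, pfm) - best_site_score(seq_b, pfm)


print(allele_score_difference("CCGATACC", "CCGCTACC", PFM))  # > 0: variant weakens the site
```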

How much does timing matter?: A genetically-sensitive study of word and nonword reading
Brooke Soden & Stephen A. Petrill
Contact email:

Timed measures of word and nonword reading are frequently used as indicators of children's status and progress in reading. Although timed measures are quick, their additional task demands may tap different skills than untimed measures do, and these demands may also differ for word versus nonword reading. The current study investigated the overlap and independence of genetic and environmental influences on word and nonword reading in timed and untimed measures. Participants were twins from the Western Reserve Reading and Math Project: 154 identical and 222 same-sex fraternal twins. Timed reading was assessed with the Sight Word Efficiency (word) and Phonemic Decoding Efficiency (nonword) subtests of the TOWRE. Untimed measures were the Word Identification (word) and Word Attack (nonword) subtests of the WRMT-R. Cholesky decomposition models were fit to assess genetic and environmental overlap among measures. Heritabilities for timed measures were slightly higher than those for untimed measures. The bulk of the genetic variance in timed and untimed measures was shared, and specific genetic variance was nonsignificant. A similar pattern was observed for word versus nonword measures. Across tasks, shared environmental influences were modest and nonsignificant, and showed a high degree of overlap. Timed and untimed measures of word and nonword reading appear to draw on the same genetic and environmental factors, suggesting that they largely tap the same underlying processes. This finding supports the validity of using timed progress-monitoring measures and lessens the concern that they introduce genetically mediated variance beyond that in untimed measures. Future studies will investigate potential relations with specific measured genes.
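In a bivariate Cholesky model, the additive-genetic covariance matrix A of two measures (for example, a timed and an untimed word-reading score) is written as A = L L^T with L lower-triangular: the first genetic factor loads on both measures, and the second loads only on the second measure. A minimal sketch, assuming A has already been estimated (twin models are typically fit with structural equation modeling software such as OpenMx), shows how the paths give shared and specific genetic variance; the input values are hypothetical.

```python
import math


def bivariate_cholesky(a11, a12, a22):
    """Cholesky paths for a 2x2 additive-genetic covariance matrix.

    Returns the lower-triangular paths (l11, l21, l22), the genetic
    correlation r_g, and the fraction of trait-2 genetic variance shared
    with trait 1 (equal to r_g ** 2).
    """
    l11 = math.sqrt(a11)              # factor 1 -> trait 1
    l21 = a12 / l11                   # factor 1 -> trait 2 (shared path)
    l22 = math.sqrt(a22 - l21 ** 2)   # factor 2 -> trait 2 (specific path);
                                      # raises ValueError if A is not positive semidefinite
    r_g = a12 / math.sqrt(a11 * a22)
    shared = l21 ** 2 / a22
    return (l11, l21, l22), r_g, shared


paths, r_g, shared = bivariate_cholesky(0.50, 0.45, 0.50)
print(round(r_g, 2), round(shared, 2))  # 0.9 0.81
```

A nonsignificant specific path (l22 near zero), as reported above for timed versus untimed reading, means nearly all genetic variance in the second measure is carried by the shared factor.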

A fast and powerful test of independent assortment with implications for the analysis of 'big data'
Stewart, WCL and Hager, V
Contact email:

Over the past two decades, gene-mappers have begrudgingly come to accept that p-values from genome-wide linkage studies are inaccurate, conservative, or computationally demanding. Worse yet, these undesirable attributes are often magnified in big studies, whether they contain hundreds of small families, or a handful of large extended families. Therefore, to facilitate the analysis of 'big data', we developed a new test of independent assortment (TIA) that permits the accurate and rapid computation of p-values (e.g. 250x speed-up relative to competing methods) from modern genome-wide linkage scans. Under the null hypothesis, we show that the limiting distribution of TIA is well approximated by a collection of independent Gaussian processes. Furthermore, under any alternative hypothesis, we prove that TIA and the widely used Kong & Cox lod have the same power.
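TIA itself is not specified in enough detail here to sketch, but its comparator can be. Under the Kong & Cox (1997) linear allele-sharing model, family i contributes a likelihood 1 + delta * gamma_i * Zbar_i, where Zbar_i is the family's expected normalized sharing score given the marker data and gamma_i is a family-specific constant chosen to keep the likelihood non-negative; the lod is the maximum over delta >= 0 of the sum of log10 of these terms. The sketch maximizes this over a grid, assuming Zbar_i and gamma_i are supplied by linkage software and that delta_max keeps every term positive; the function name and grid are our own.

```python
import math


def kong_cox_lod(zbar, gamma, delta_max, steps=10_000):
    """One-sided linear-model allele-sharing lod (Kong & Cox 1997), by grid search.

    zbar: per-family expected normalized sharing scores given marker data
    gamma: per-family constants (assumed supplied by linkage software)
    delta_max: upper bound on delta
    """
    best = 0.0  # delta = 0 (no excess sharing) gives lod 0
    for k in range(1, steps + 1):
        delta = delta_max * k / steps
        terms = [1.0 + delta * g * z for g, z in zip(gamma, zbar)]
        if any(t <= 0.0 for t in terms):
            break  # terms are linear in delta, so they stay non-positive beyond here
        best = max(best, sum(math.log10(t) for t in terms))
    return best
```

Every p-value for this lod requires its null distribution, which is the computational bottleneck the abstract describes for large or extended-family studies.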

Characterization of Functional Polymorphisms Affecting Splicing of Cholesteryl Ester Transfer Protein
Adam Suhy, Katherine Hartmann, Leslie Newman, Audrey Papp, Thomas Toneff, Vivian Hook, Wolfgang Sadee
Contact email:

Cholesteryl ester transfer protein (CETP) exchanges esterified cholesterol for triglycerides between HDL and LDL within the reverse cholesterol transport pathway. CETP acts to increase LDL and decrease HDL. Some evidence suggests that low CETP levels in the presence of statins may correlate with increased risk of mortality. Identifying genetic markers of CETP expression may therefore prove useful in the diagnosis and treatment of coronary artery disease. CETP is alternatively spliced: skipping of exon 9 produces a truncated, non-functional form of the protein that remains within the endoplasmic reticulum of the liver. Initial association studies suggest that the two proposed splicing SNPs (rs5883 and rs9930761) significantly affect the relative amount of splicing observed in liver cDNA. To determine the isolated effects and interaction of both SNPs, we used a minigene construct containing cloned CETP DNA spanning exon 8 to intron 10, inserted into a pCMV-Tag2B expression vector. Using site-directed mutagenesis, we altered single base pairs at one or both suspected splicing SNPs to generate all possible combinations of major and minor alleles. We transfected these constructs into HEK and HepG2 cells to assay the effect of each allele combination on splicing, using real-time PCR to measure the abundance of each splice form. Additionally, Western blots were used to assess the presence or absence of the full-length and truncated forms of the protein in liver and plasma. rs5883 had a significant effect on the amount of Δ9-CETP mRNA, with the minor allele producing a 10-15% increase in Δ9-CETP mRNA levels; the effect of rs9930761 was not significant. Western blots showed full-length protein in both liver and plasma, whereas Δ9-CETP protein was observed only in liver. 
These results support the hypothesis that rs5883 is the primary determinant of alternative splicing in CETP and that Δ9-CETP is not secreted from the liver, rendering it biologically inactive in reverse cholesterol transport. In future studies, rs5883 could serve as a marker of CETP splicing and activity when assessing associations between CETP and clinical outcomes. Supported by U01 GM092655.
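The abstract does not give the formula used to quantify splice forms from real-time PCR. As a hedged illustration only, the sketch below assumes the common 2^(-Ct) approximation (equal, near-100% amplification efficiency for the full-length and exon 9-skipped amplicons) and also enumerates the four allele combinations generated by mutagenesis. The helper name `percent_exon9_skipped` and the example Ct values are hypothetical.

```python
from itertools import product

# The four minigene constructs: every major/minor combination at the two SNPs.
# Tuple order is (rs5883, rs9930761).
CONSTRUCTS = list(product(("major", "minor"), repeat=2))


def percent_exon9_skipped(ct_delta9: float, ct_full_length: float) -> float:
    """Hypothetical helper: percent of CETP transcripts lacking exon 9.

    Assumes both amplicons double every cycle, so relative template
    abundance is proportional to 2 ** (-Ct).
    """
    delta9 = 2.0 ** (-ct_delta9)
    full_length = 2.0 ** (-ct_full_length)
    return 100.0 * delta9 / (delta9 + full_length)


if __name__ == "__main__":
    for construct in CONSTRUCTS:
        print(construct)
    # Exon 9-skipped amplicon crossing threshold 3 cycles after full length.
    print(round(percent_exon9_skipped(23.0, 20.0), 1))  # 11.1
```

Under this assumption, a 3-cycle Ct difference corresponds to an 8-fold abundance difference, so the skipped form makes up 1/9 of the total.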

Lessons learned from the genomic characterization of patient-matched frozen and formalin fixed, paraffin-embedded tissues: progress update
TCGA FFPE AWG; co-chaired by Erik Zmuda and Jorge Reis-Filho
Contact email:

The advent of massively parallel sequencing has led to major advances in our understanding of tumor biology. Studies using this technology have identified new drivers of tumor progression and new therapeutic targets, and have led to the development of a molecular-based cancer taxonomy. As this emerging taxonomy is applied to the clinical testing environment, precision medicine is becoming a clinical reality. Importantly, these founding studies were largely conducted with nucleic acids extracted from frozen tissues, and current genomics platforms have consequently been optimized for the analysis of DNA/RNA extracted from frozen samples. In contrast, the vast majority of specimens used for diagnostic purposes are available only as formalin-fixed, paraffin-embedded (FFPE) tissue. Because FFPE fixation can introduce molecular artifacts, best practices for the analysis of FFPE samples need to be identified and optimized. Defining a signature of FFPE on these platforms will help bridge the gap between frozen and FFPE diagnostic material and facilitate application of the emerging cancer taxonomy to clinical testing environments. Through a collaborative effort between the Biospecimen Core Resource (BCR) at Nationwide Children's Hospital, Memorial Sloan Kettering Cancer Center and 6 genomic characterization centers, The Cancer Genome Atlas (TCGA) has sought to test and implement methods that could enable the use of DNA and RNA extracted from FFPE specimens with current genomics platforms. Our aims were i) to develop an optimal method for nucleic acid extraction from FFPE samples; ii) to assess the performance of exome sequencing, copy number profiling, RNA and miRNA sequencing, and methylation analysis; iii) to define best practices for analysis on these platforms using nucleic acids extracted from FFPE samples; and iv) to provide guidelines for the optimal use of genomic methods for the analysis of FFPE samples. 
To achieve these aims, TCGA procured cancers from 38 patients for whom matched frozen and FFPE tumor samples and germline DNA extracted from blood leukocytes were available. These samples were subjected to whole-exome, RNA and miRNA massively parallel sequencing and array profiling (SNP6 and methylation), and were analyzed using the 'best practice' approaches implemented in the TCGA-affiliated characterization centers. For all comparisons, frozen specimens were adopted as the 'gold standard'. A progress report of the findings from this effort will be presented.
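With frozen specimens as the gold standard, one simple way to summarize FFPE performance for variant calls is to treat the frozen calls as truth and compute the sensitivity and precision of the FFPE calls. The sketch below does this for sets of (chromosome, position, ref, alt) tuples and also reports the fraction of FFPE-only calls that are C>T/G>A changes, the substitution class most often attributed to formalin-induced cytosine deamination. It is an illustration under these assumptions, not the working group's analysis; the function names are hypothetical.

```python
# Substitutions typically associated with formalin-induced cytosine deamination.
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


def concordance(frozen: set, ffpe: set) -> dict:
    """Score FFPE variant calls against frozen calls used as the gold standard.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    shared = len(frozen & ffpe)
    return {
        "shared": shared,
        "sensitivity": shared / len(frozen) if frozen else float("nan"),
        "precision": shared / len(ffpe) if ffpe else float("nan"),
    }


def ffpe_only_deamination_fraction(frozen: set, ffpe: set) -> float:
    """Fraction of FFPE-only calls that are C>T or G>A substitutions."""
    extra = ffpe - frozen
    if not extra:
        return 0.0
    hits = sum(1 for (_, _, ref, alt) in extra if (ref, alt) in DEAMINATION_CHANGES)
    return hits / len(extra)
```

A high deamination fraction among FFPE-only calls would suggest that many of the extra calls are fixation artifacts rather than missed true variants.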

Kelviz: A Graphing Tool for Statistical Evidence in Human Genetics
J. Valentine-Cooper, S. Seok, K. Walters, Y. Huang, and V.J. Vieland
Contact email:

Kelviz is a custom-built graphing application designed to visualize and publish output from Kelvin, a software package implementing the PPL framework for statistical genetic data analyses. The program can also graph any statistic, such as p-values or LODs, in sequence along the human genome. Current options in Kelviz include plotting results genome-wide or in specific regions, creating flexible layouts of multiple plots, reusing data from one plot as input to another, and importing text annotations. Additional features include a flexible set of options to manually edit annotations and comments, zoom, and customize text size, font, and color. Kelviz saves figures in a custom file format, GRX, and exports figures, either as a whole figure or as a tabular collage, to common publication-ready raster and vector image formats such as PNG, TIFF, EPS, SVG, and PDF. Like MATLAB's FIG format, GRX files can be reopened and reused, which makes it easier to collaborate on figure creation. Kelviz is a cross-platform graphical user interface (GUI) application regularly tested on Windows, Mac, and Linux. It is written in Python using the Matplotlib plotting library and allows automation via an optional command-line interface.
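Plotting a statistic in sequence along the human genome requires mapping each (chromosome, position) pair onto a single x axis. The sketch below shows one common way to do this; it is not Kelviz code, and the helper name, the chromosome order (1-22, X, Y) and the use of each chromosome's last plotted position as its length are assumptions. The returned lists could be handed to Matplotlib, for example via `ax.scatter(xs, ys)`, `ax.set_xticks(ticks)` and `ax.set_xticklabels(labels)`.

```python
# Assumed display order; chromosomes outside this list are skipped.
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def genome_layout(records, gap=0):
    """Map (chrom, position_bp, value) records onto one genome-wide x axis.

    Returns (xs, ys, tick_positions, tick_labels). Each chromosome is assumed
    to end at its last plotted position; `gap` adds spacing between chromosomes.
    """
    by_chrom = {}
    for chrom, pos, value in records:
        by_chrom.setdefault(str(chrom), []).append((pos, value))

    xs, ys, ticks, labels = [], [], [], []
    offset = 0
    for chrom in CHROM_ORDER:
        if chrom not in by_chrom:
            continue
        points = sorted(by_chrom[chrom])  # order markers by position
        for pos, value in points:
            xs.append(offset + pos)
            ys.append(value)
        end = offset + points[-1][0]
        ticks.append((offset + end) / 2)  # label at the chromosome's midpoint
        labels.append(chrom)
        offset = end + gap
    return xs, ys, ticks, labels
```

Scatter points (rather than a single connected line) avoid drawing spurious segments between the end of one chromosome and the start of the next.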

Novel methods for measuring genetic evidence using public data repositories change picture of the schizophrenia genome
KA Walters, W Valentine-Cooper, J Burian, Y Huang, VJ Vieland
Contact email:

The Combined Analysis of Psychiatric Studies (CAPS) project has two aims: (1) comprehensive data cleaning and regularization of psychiatric genetic data (genotypes and phenotypes) in a large-scale data repository maintained by the NIMH; (2) reanalysis of repository data using the PPL statistical genetic framework, which is specifically tailored to the analysis of highly heterogeneous multi-site human genetic data sets. We report here on results for the repository's current schizophrenia (SZ) collection. Data cleaning across multiple SZ data sets involved correcting erroneous pedigree structures, aligning IDs, removing improper or undefined clinical codes, importing additional phenotypic information, and updating genomic information. Two results of this data regularization process were striking. First, despite the resulting reduction in the number of multiplex families, linkage signals increased at salient loci, illustrating that focusing on well-defined sub-phenotypes across sites yields a more homogeneous data set. Second, within individual data sets, data cleaning had only small effects; however, standard meta-analysis of the data across sites yielded substantially different results overall. Thus even seemingly minor data changes can have large impacts on our gestalt view of the genome for a complex disorder. PPL analysis of the data produced a picture of the schizophrenia genome that overlaps with previous reports but also includes novel loci, differing both in the specific loci implicated and in the rank ordering of these loci by relative strength of evidence. Several loci not previously reported in the NIMH data but supported by other reports in the literature emerged, as did loci without any previous linkage reports in SZ but with positional support from association studies. By design, genetic data repositories permit analysis of far larger quantities of data than any one study can collect. 
However, our results show that careful attention to both data quality and data analytic methods is required to extract meaningful evidence from such sources. With the advent of affordable sequence data, we predict that revisiting family data already amassed in the repository will provide a cost-effective mechanism not just for discovering linkage peaks, but for fine-mapping these peaks down to the level of the individual gene or variant.
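For readers unfamiliar with the PPL scale, the posterior probability of linkage is obtained from a Bayes ratio (BR) via Bayes' theorem: PPL = π·BR / (π·BR + (1 − π)). The sketch below assumes the 2% prior probability of linkage (π = 0.02) commonly used with the PPL and, as a simplification, combines independent data sets by multiplying already-computed BRs, which may not match how Kelvin actually updates evidence across data sets. Function names are hypothetical.

```python
import math


def ppl_from_bayes_ratio(br: float, prior: float = 0.02) -> float:
    """PPL = prior * BR / (prior * BR + (1 - prior))."""
    return prior * br / (prior * br + (1.0 - prior))


def combine_bayes_ratios(brs) -> float:
    """Simplification: treat per-site BRs as independent and multiply them."""
    return math.prod(brs)


if __name__ == "__main__":
    # No evidence either way (BR = 1): the PPL stays at the prior.
    print(ppl_from_bayes_ratio(1.0))
    # Two sites with BR = 7 each give BR = 49, i.e. PPL = 0.5.
    print(ppl_from_bayes_ratio(combine_bayes_ratios([7.0, 7.0])))
```

The example shows why the PPL scale is easy to read: with a 2% prior, a BR of 49 is needed just to reach even odds of linkage.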

Family-Based Bayesian LASSO for Detecting Association of Rare Haplotypes with Common Diseases
Meng Wang; Shili Lin
Contact email:

In recent years, there has been increasing interest in using common SNPs amassed in GWAS to investigate rare haplotype effects on complex diseases. Evidence suggests that rare haplotypes may tag rare causal single nucleotide variants, making SNP-based rare haplotype analysis not only cost-effective but also valuable for detecting causal variants. Although a number of methods for detecting rare haplotype association have been proposed, they are population based and thus susceptible to population stratification. In this dissertation we propose a family-based logistic Bayesian LASSO (famLBL) for estimating the effects of haplotypes on complex diseases using SNP data. With an appropriate prior distribution, the effect size of an unassociated haplotype is shrunk toward zero, allowing more precise estimation of associated haplotypes, especially rare ones, and thereby greater detection power. We evaluate the type I error and power of famLBL by simulation. Comparison with its population-based counterpart, LBL, highlights famLBL's robustness to population substructure. Further comparison with the traditional family-based association test (FBAT) reveals its advantage for detecting rare haplotype association. Compared with first-order collapsing methods such as Combined Multivariate and Collapsing (CMC) and the single-variant version of FBAT (fbat-v0), as well as second-order collapsing methods such as the sequence kernel association test (SKAT), famLBL detects associations more consistently across different settings. To demonstrate the practical utility of famLBL, we applied it to the Framingham Heart Study data to identify haplotypes associated with high blood pressure. Focusing on common SNVs identified by another method, we located rare haplotypes associated with the trait that potentially tag causal rare variants. 
Future work on famLBL includes adapting the method to extended pedigrees, incorporating environmental factors, and modeling quantitative traits.
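The shrinkage idea behind Bayesian LASSO methods can be seen in a much simpler setting. Under independent Laplace (double-exponential) priors on haplotype log-odds ratios, the posterior mode of a logistic model coincides with an L1-penalized logistic regression estimate, which the sketch below computes by proximal gradient descent (soft-thresholding). This is a stand-in, not famLBL: it ignores family structure, haplotype phase uncertainty and MCMC sampling, treats haplotype indicators as observed, and the penalty `lam`, step size, iteration count, helper names and toy data are illustrative assumptions.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(z: float, thr: float) -> float:
    """Proximal operator of thr * |z|: shrink toward zero, exactly 0 if |z| <= thr."""
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0


def lasso_logistic_map(X, y, lam, step=0.5, iters=2000):
    """Posterior mode of haplotype effects under Laplace priors (hypothetical helper).

    X: rows of haplotype indicators (no intercept column); y: 0/1 disease status.
    The intercept gets a flat prior, so it is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals y - P(y = 1 | x) under the current estimates.
        resid = [
            yi - _sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi)))
            for xi, yi in zip(X, y)
        ]
        # Gradients of the mean negative log-likelihood.
        g0 = -sum(resid) / n
        grads = [-sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        beta = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grads)]
    return b0, beta


def toy_data():
    """Hypothetical data: haplotype 1 is enriched in cases, haplotype 2 is not."""
    X, y = [], []
    for h1 in (0, 1):
        for h2 in (0, 1):
            n_cases = 8 if h1 else 2
            for k in range(10):
                X.append([h1, h2])
                y.append(1 if k < n_cases else 0)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    b0, beta = lasso_logistic_map(X, y, lam=0.1)
    # The unassociated haplotype is shrunk to exactly 0; the associated one is
    # shrunk from about 2.77 (unpenalized) to about 0.81.
    print(beta)
```

A full Bayesian treatment samples from the posterior instead of returning only its mode, which is what provides uncertainty estimates for rare-haplotype effects.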

A Behavioral Genetic Study of the Cognitive Processes Linking Behavioral Problems and Reading Skills
Zhe Wang, Stephen Petrill
Contact email:

Poor reading skills are observed more often in children with higher levels of behavioral problems. Additionally, children with more behavioral problems and those with poorer reading skills exhibit more working memory (WM) deficits. It is therefore possible that WM deficits are the mechanism underlying the observed comorbidity between poor reading and behavioral problems. The current twin study explored this possibility by examining whether and how the genetic and environmental factors contributing to individual differences in WM account for the genetic and environmental relations between children's behavioral problems and their reading skills. Data from 105 monozygotic and 147 dizygotic twin pairs (average age = 12.21 years) were used for the analyses. Genetic influences on WM accounted partially for the genetic associations between inattentiveness and reading, and completely for the genetic associations between oppositional defiant disorder (ODD) symptoms and reading. However, the genetic relations between hyperactivity-impulsivity and reading were not explained by the genetic factors contributing to WM. These results point to genetically influenced individual differences in WM as one potential mechanism underlying the observed comorbidity between poor reading skills and both inattentiveness and ODD. The findings suggest that, to improve reading in children with high levels of behavioral problems, identifying the pathways linking genetic variation to WM and targeting WM through training may prove fruitful.
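The logic of the twin design can be illustrated with Falconer's classic approximations, which split a standardized variance (or, for two traits, a phenotypic correlation) into additive genetic (A), shared environment (C) and non-shared environment (E) parts using MZ and DZ twin correlations. The abstract does not state the exact model fitted, and studies like this typically use formal multivariate structural equation models, so this is only a back-of-the-envelope sketch assuming an ACE model; the function name and example correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float, r_total: float = 1.0) -> dict:
    """Falconer-style ACE decomposition from twin correlations.

    Univariate use: r_mz, r_dz are within-trait twin correlations and
    r_total = 1 (standardized variance).
    Bivariate use: r_mz, r_dz are cross-twin cross-trait correlations
    (e.g. twin 1's working memory with twin 2's reading) and r_total is the
    within-person phenotypic correlation between the two traits.
    """
    a = 2.0 * (r_mz - r_dz)  # MZ twins share all, DZ twins on average half, of additive effects
    c = 2.0 * r_dz - r_mz    # shared environment counts fully in both twin types
    e = r_total - r_mz       # the part that MZ twins do not share
    return {"A": a, "C": c, "E": e}


if __name__ == "__main__":
    print(falconer_ace(0.7, 0.4))  # roughly A = 0.6, C = 0.1, E = 0.3
```

In the bivariate form, a large A component of the WM-reading correlation is the kind of shared genetic overlap the study examines.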

Mining massive SNP data for identifying associated SNPs and uncovering gene relationships
Amy Webb, Aaron Albin, Zhang Ye, Majid Rastegar-Mojarad, Kun Huang, Jeffrey Parvin, Wolfgang Sadee, Lang Li, Simon Lin, Yang Xiang
Contact email:

Our goal is to find SNPs that co-occur in a significant number of samples, especially long-range SNP associations, and to characterize the relationships among these associated SNPs and their corresponding genes. To this end, we have developed a data mining workflow and an efficient algorithm, FCI-RC, for mining SNPs that are associated across the whole genome. By applying our method to both the original SNP data and random chromosome permutation data, we demonstrate that it finds non-random SNP associations across multiple chromosomes. Many of the large number of SNP associations identified by our method span multiple chromosomes, and some suggest novel relationships among the corresponding genes.
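FCI-RC itself is not described here in enough detail to reproduce. As a hedged illustration of SNPs that "coexist in a significant number of samples", the sketch below counts, for every pair of SNPs, the number of samples carrying the minor allele at both (the pair's support) and keeps pairs at or above a minimum support. This is only the first level of a naive Apriori-style search, assuming each sample is represented as the set of SNP IDs at which it carries the minor allele; the function name and SNP IDs are placeholders. Because the number of pairs grows quadratically with the number of SNPs, genome-wide mining calls for a dedicated efficient algorithm such as the one the authors developed.

```python
from collections import Counter
from itertools import combinations


def frequent_snp_pairs(samples, min_support):
    """Return {(snp_a, snp_b): support} for pairs co-occurring in >= min_support samples.

    samples: iterable of sets of SNP IDs at which each sample carries the minor allele.
    """
    counts = Counter()
    for snps in samples:
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(snps), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

The same counts can be recomputed on permuted data to see which supports exceed what chance co-occurrence produces.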

RNA transcript expression and the effects of nicotine in human brain regions
Amy Webb, Amanda Curtis, Leslie Newman, Erica Graziosa, Daquing Wang, Wolfgang Sadee and Audrey Papp
Contact email:

The purpose of this ongoing study is to use next-generation RNA sequencing to identify similarities and differences in RNA transcripts between brain regions and between individuals within those brain regions. An additional goal is to provide a basic overview of how nicotine use affects coding and non-coding RNAs in the ten selected brain regions. The study was performed on brain regions obtained from five confirmed nicotine users (smokers) and five matched non-smoking controls. To understand the complex mechanisms underlying brain function, we used SOLiD (Life Technologies) next-generation RNA sequencing to define and quantitate complete transcriptomes of coding, non-coding and small RNA in all 10 brain regions. The SOLiD Lifescope pipeline was used to align and map the resulting sequence reads. As an initial approach, we considered the differential expression and splicing variation detectable in a single brain region, Brodmann Area 46 (BA46). BA46 is the highest cortical area responsible for motor planning, organization, and regulation. It plays an important role in the integration of sensory and mnemonic information and the regulation of intellectual function and action. edgeR was used to test individual genes for differential expression, applying an exact test based on a negative binomial distribution. At an FDR cutoff of 0.1, edgeR identified 42 genes as differentially expressed with nicotine use. GO term enrichment analysis characterized these genes as involved in nerve impulse/ensheathment and transporter activity. Splice junctions were identified with a custom double-alignment procedure based on the splicing information recorded in the alignment files. This approach detected more junctions than the typical SOLiD/Lifescope splicing pipeline. Through collaboration with LifeTech, Partek was run in parallel to identify differentially expressed genes and differentially spliced isoforms for comparison. 
Analysis of differential RNA expression across brain regions in different individuals affords an opportunity to understand the brain-specific expression patterns that are affected by nicotine use.
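An FDR cutoff such as the 0.1 used above is usually applied to Benjamini-Hochberg (BH) adjusted p-values, which is what edgeR's `topTags` reports by default. The abstract does not say which adjustment was used, so the sketch below only illustrates the BH step-up procedure on a list of per-gene p-values; it does not reproduce edgeR's exact test, and the gene names are placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(genes, pvalues, fdr=0.1):
    """Genes whose BH-adjusted p-value is at or below the FDR cutoff."""
    return [g for g, q in zip(genes, benjamini_hochberg(pvalues)) if q <= fdr]


if __name__ == "__main__":
    print(significant(["g1", "g2", "g3", "g4"], [0.01, 0.04, 0.03, 0.2]))
```

The running minimum ensures that a gene with a smaller raw p-value never receives a larger adjusted value than a gene ranked after it.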

Combine and Conquer: An integrated software suite for finding causal relationships between sequence variants and clinical phenotypes
Peter White*, Veronica J. Vieland#, David Greenberg# and Susan E. Hodge#
Contact email:

Next-generation sequencing (NGS) has revolutionized genetic research, creating new opportunities to identify genomic variants with clinical significance in individual patients and families. The technology has been widely adopted by the research community and is now seeing rapid clinical adoption, driven by recognition of its diagnostic utility and by improvements in the quality and speed of data acquisition. There has been some success in discovering rare protein-coding variants that appear to be causal in orphan diseases. However, as the technology is applied to more common diseases, much remains to be learned before causal links can be reliably established. Challenges remain in three primary areas: (1) NGS and bioinformatic methods, including improving sequence coverage (whole genome vs. exome), reducing sequencing errors, optimizing variant identification and genotyping, developing standardized approaches for processing raw sequence data, and improving annotation of genomic features (from regulatory regions to coding sequences); (2) data analytic methods, including effective ways to focus attention on sequence variation in small regions of the genome and to sort causal from non-causal variants within those regions; and (3) simulation methods, which allow us to design and conduct in silico "experiments" to evaluate our genomic and data analytic methods, obtain realistic estimates of power, and inform follow-up study design and sample size. Our team at NCH has developed three in-house computer programs (Churchill, Kelvin and Caleb), each focused on one of these areas. Here we illustrate how "linking" them for iterative study design and data analysis results in a unique and powerful integrated approach. Our long-term aim is to develop an interface of utilities that will simplify the analysis process for the investigator, enabling "soup to nuts" use of genome sequence in patients and their families. 
*Center for Microbial Pathogenesis, #Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital and Department of Pediatrics, The Ohio State University, Columbus, Ohio.

Linkage Disequilibrium Mapping of the 13q21 Specific Language Impairment Locus Using Epistasis Analysis Models
Samuel L. Wolock, Ning Li, Stephen A. Petrill, Judy F. Flax, Anne S. Bassett, Linda M. Brzustowicz, Christopher W. Bartlett
Contact email:

Background: We present a fine-mapping study of the 13q21 region linked to Specific Language Impairment (SLI), a neurodevelopmental failure to develop normal vocabulary and grammar despite otherwise normal cognition and ability. In previous work, we mapped the SLI3 locus to 13q21 with a written language impairment phenotype in five extended families from Canada, and later replicated the locus using nuclear and extended families from the U.S. While refining the localization, we found that including a memory-associated coding SNP in BDNF as part of a gene-gene interaction with the unidentified 13q21 risk alleles greatly sharpened the localization and strengthened the evidence for the locus. Here we performed additional mapping of the region to identify the gene responsible for the linkage signals. Methods: We assessed linkage/association of Illumina SNP array genotypes with two language phenotypes using the posterior probability of linkage (PPL) and the posterior probability of linkage disequilibrium (PPLD). Analysis was conducted twice: a baseline analysis, and an analysis that incorporated genotypes at the memory-associated BDNF coding SNP into a gene-gene (epistatic) interaction model. The posterior probabilities from the two models are on the same scale and can be directly compared to assess whether a SNP is involved in an epistatic interaction. Results: Several SNPs within an LD block were associated with the categorical written language impairment diagnosis used to map this locus in the original genome scans (maximum PPLD = 41%). We also observed weaker association with the quantitative trait underlying the written language impairment trait (maximum PPLD = 17%). The implicated region overlaps ATXN8OS, an antisense transcript of the KLHL1 gene that contains a trinucleotide repeat that, when expanded, causes a form of spinocerebellar ataxia. 
Additionally, SNPs at another locus, in the neurally expressed gene PCDH9, were associated with the quantitative language trait in the U.S. families (maximum PPLD = 94%). At both loci, modeling the BDNF coding SNP revealed an epistatic effect. Conclusion: SNPs upstream of KLHL1 may be driving the 13q21 linkage signal observed in the Canadian pedigrees, and follow-up haplotype and conditional linkage analyses are warranted. In addition, our findings suggest a role for PCDH9 in language development.
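The PPLD is a Bayesian measure of evidence for linkage disequilibrium between a marker and the trait; it is not the same as the pairwise marker-marker LD statistics used to define LD blocks such as the one mentioned above. For readers less familiar with those statistics, the sketch below computes D' and r² for two biallelic SNPs, assuming phased haplotype counts are available; the function name and counts are hypothetical.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D', r^2) for two biallelic loci from phased haplotype counts.

    D' is returned with its sign; |D'| is what is usually reported.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at locus 2
    d = p_AB - p_A * p_B      # disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if d_max == 0 or denom == 0:
        return 0.0, 0.0       # a monomorphic locus carries no LD information
    return d / d_max, d * d / denom


if __name__ == "__main__":
    # One haplotype (ab... here aB) is absent, so D' = 1, yet r^2 is only about 0.67.
    print(ld_stats(40, 10, 0, 50))
```

The example illustrates why the two measures are reported separately: D' reaches 1 whenever one of the four haplotypes is missing, while r² also reflects how well one SNP predicts the other.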

Detection of Dynamic Effects of Rare Haplotypes and Their Interaction with Environmental Factor on Complex Disease
Shuang Xia, Shili Lin
Contact email:

Rare variants and gene-environment interaction (GXE) are believed to be two important contributors to missing heritability. Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region implicated through single nucleotide polymorphism (SNP) association analysis. Further, the behavior of a gene can be dynamic, so when longitudinal data are available it is important to study gene-environment interactions and the dynamic effects of genes on a trait over time. In this work, we model the effects of both rare and common haplotypes over time using longitudinal data through time-varying coefficients (tvc) based on B-splines, and incorporate environmental factors and their interaction effects within the Logistic Bayesian LASSO (LBL) framework, leading to the LBL-tvc methodology. Since longitudinal data are collected forward in time over a certain period in a cohort of individuals, we formulate the likelihood of our model prospectively. We cast the problem in a Bayesian framework for more precise estimation of the effect sizes of rare haplotypes, and adopt Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution for statistical inference. We carry out extensive simulations to evaluate the properties of LBL-tvc and to assess its robustness to model misspecification. We also apply LBL-tvc to analyze the joint effects of the MAP4 gene on chromosome 3 and smoking on hypertension in data from a Mexican American population, and have identified several haplotypes, including a rare one, that are associated with hypertension with effect sizes varying over the age range of 55-85 years.
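LBL-tvc represents time-varying coefficients with B-splines, so that an effect can be written as β(t) = Σ_k c_k B_k(t). The sketch below evaluates a B-spline basis with the Cox-de Boor recursion and uses it to form such a coefficient; it omits the Bayesian LASSO priors, the haplotype likelihood and MCMC entirely. The cubic degree, the clamped knot vector over ages 55-85 with interior knots at 65 and 75, and the function names are illustrative assumptions, not the authors' settings.

```python
def bspline_basis(knots, degree, t):
    """Values of all B-spline basis functions of `degree` at t (Cox-de Boor recursion)."""
    # Degree 0: indicator of each half-open knot span [k_i, k_{i+1}).
    basis = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    # Include the right endpoint in the last non-empty span.
    if t == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                basis[i] = 1.0
                break
    # Raise the degree one step at a time, skipping 0/0 terms from repeated knots.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left = 0.0
            if knots[i + d] > knots[i]:
                left = (t - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            right = 0.0
            if knots[i + d + 1] > knots[i + 1]:
                right = (knots[i + d + 1] - t) / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1]
            nxt.append(left + right)
        basis = nxt
    return basis


def time_varying_coef(coefs, knots, degree, t):
    """beta(t) = sum_k c_k * B_k(t) for hypothetical spline coefficients c_k."""
    return sum(c * b for c, b in zip(coefs, bspline_basis(knots, degree, t)))


if __name__ == "__main__":
    # Hypothetical clamped cubic knots over ages 55-85 (6 basis functions).
    knots = [55.0] * 4 + [65.0, 75.0] + [85.0] * 4
    coefs = [0.0, 0.2, 0.5, 0.8, 0.6, 0.4]  # illustrative, not estimated
    for age in (55.0, 65.0, 75.0, 85.0):
        print(age, round(time_varying_coef(coefs, knots, 3, age), 3))
```

Because the basis functions sum to one on the knot range, equal coefficients give a constant effect; letting the c_k differ is what allows a haplotype's effect to change with age.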

Imprinting and Maternal Effect Detection Using Partial Likelihood Based on Discordant Sibship Data
Fangyuan Zhang; Shili Lin
Contact email:

Numerous statistical methods have been developed to explore genomic imprinting and maternal effects, which are causes of parent-of-origin patterns in complex human diseases. However, most of them either model only one of these two confounded epigenetic effects, or make strong yet unrealistic assumptions about the population, such as allelic exchangeability and mating symmetry, to avoid over-parameterization. A recent partial likelihood method can detect imprinting and maternal effects simultaneously using case-parents/control-parents families without making unrealistic assumptions. Although the method is robust and powerful, it requires the recruitment of control families, which are difficult to obtain in most studies. In this paper, we develop a partial Likelihood method for detecting Imprinting and Maternal Effects for a Discordant Sib-Pair design (LIME_DSP) that can also accommodate additional affected or unaffected siblings. By matching affected and unaffected probands and stratifying according to their familial genotypes, a partial likelihood component free of nuisance parameters can be extracted from the full likelihood. This removes the need to make assumptions about mating type probabilities, the nuisance parameters. Theoretical analysis shows that the partial maximum likelihood estimators based on the LIME_DSP approach are consistent and asymptotically normally distributed. A simulation study demonstrates the robustness of LIME_DSP and shows that it is a powerful approach that does not require collecting control families. To illustrate its practical utility, LIME_DSP was applied to the Framingham Heart Study data.