Materials and methods. Sequence analyses

Sequence analyses

We searched for annotated TCF4 genes in the Ensembl database (version 56) and the UCSC Genome Browser. Ensembl annotates 47 orthologues of the human TCF4 gene (ENSG00000196628), but we restricted our analysis to 14 representative species: Human (Homo sapiens, assembly GRCh37), Chimpanzee (Pan troglodytes, CHIMP2.1), Orangutan (Pongo pygmaeus, PPYG2), Macaque (Macaca mulatta, MMUL 1.0), Mouse (Mus musculus, NCBI m37), Rat (Rattus norvegicus, RGSC 3.4), Dog (Canis familiaris, CanFam 2.0), Horse (Equus caballus, Equ Cab 2), Anole Lizard (Anolis carolinensis, AnoCar1.0), Platypus (Ornithorhynchus anatinus, Ornithorhynchus_anatinus-5.0), Opossum (Monodelphis domestica, monDom5), Xenopus (Xenopus tropicalis, JGI 4.1), Medaka (Oryzias latipes, HdrR) and Fugu (Takifugu rubripes, FUGU 4.0).
For all analyses we used the canonical protein, which Ensembl defines as the longest deduced transcript for each species. Manual inspection of the predicted orthologues revealed a possible error. The predicted orthologue in Dog is ENSCAFT00000000229, but this gene has only 3 exons, compared with 21 in human. A predicted paralogue of this gene (ENSCAFT00000000232) has 17 exons. Because accepting ENSCAFT00000000229 as the orthologue would require an intron loss event, we consider it more parsimonious that the orthologue of human TCF4 is ENSCAFT00000000232 and that ENSCAFT00000000229 may have been generated by a retrotransposition event, which would explain why this gene has fewer introns. We therefore retained ENSCAFT00000000232 as the orthologue for further analyses.
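The exon-count check described above can be sketched in a few lines. This is only an illustration of the reasoning, not the procedure the authors ran: the exon counts are those quoted in the text, while the function names and the 0.5 cutoff are assumptions introduced here.

```python
# Exon counts quoted in the text; the 0.5 cutoff is an illustrative assumption.
HUMAN_TCF4_EXONS = 21
DOG_CANDIDATES = {
    "ENSCAFT00000000229": 3,   # Ensembl-predicted orthologue
    "ENSCAFT00000000232": 17,  # Ensembl-predicted paralogue
}


def flag_exon_poor(candidates, reference_exons, min_fraction=0.5):
    """Return IDs whose exon count is below min_fraction of the reference."""
    return sorted(
        tid for tid, n_exons in candidates.items()
        if n_exons < min_fraction * reference_exons
    )


def closest_exon_count(candidates, reference_exons):
    """Return the ID whose exon count is nearest to the reference."""
    return min(candidates, key=lambda tid: abs(candidates[tid] - reference_exons))


flagged = flag_exon_poor(DOG_CANDIDATES, HUMAN_TCF4_EXONS)
best = closest_exon_count(DOG_CANDIDATES, HUMAN_TCF4_EXONS)
print(flagged)  # ['ENSCAFT00000000229']
print(best)     # ENSCAFT00000000232
```

A flag of this kind only marks a candidate for manual inspection; the choice of orthologue in the text rests on the parsimony argument, not on a fixed threshold.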
We calculated sequence identity and similarity with BLAST (Altschul et al. 1990). DNA sequence repeats identified by RepeatMasker were extracted from the UCSC tables using the software Galaxy (Blankenberg et al. 2010). Amino-acid alignments were performed with MUSCLE (Edgar 2004) and visualized with Clustal X (2.0.12) (Larkin et al. 2007). The secondary structure analysis was performed using the program ali2d (Biegert et al. 2006).
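As a point of reference for the identity values, percent identity over an existing pairwise alignment can be computed as in the minimal sketch below. It is not BLAST's own calculation: BLAST reports identity over the local alignment it finds, and tools differ in how they treat gapped columns. The function name and the toy sequences are hypothetical, and similarity (which needs a substitution matrix) is not covered.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns with identical residues.

    Columns that are gaps in both rows are ignored; columns with a gap in
    only one row count as mismatches (one of several possible conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both rows
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment (made up): 5 identical columns out of 7 compared.
toy_identity = percent_identity("MKT-LLA", "MKSALLA")
print(round(toy_identity, 1))  # 71.4
```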

To characterize the N-terminal region found in chimpanzee and human, we followed the protocol described by Emanuelsson et al. (2007), which uses a suite of publicly available bioinformatics software to predict the subcellular location of eukaryotic proteins.

TCF4 transcripts and expression

We obtained the alternative transcripts of the human TCF4 gene from NCBI AceView (version April 2007) (Thierry-Mieg and Thierry-Mieg 2006), Ensembl (release GRCh37) and UCSC (Mar. 2006 (NCBI36/hg18) assembly). We next examined TCF4 expression in different tissues using the microarray data integrated in Genevestigator (V3) (Hruz et al. 2008), BioGPS (Wu et al. 2009) and the Sestan Lab Human Brain Atlas Microarrays data set, which is available at the UCSC Browser.
For the experimental gene expression analysis, mRNA was isolated and purified from whole blood with the PAXgene extraction kit (Qiagen) for 106 cases (76 male, 30 female; mean age 40) and 96 controls (42 male, 54 female; mean age 39). RNA was isolated according to the manufacturer's instructions, including an optional DNase digestion step. Total mRNA was quantified with a RiboGreen assay (Invitrogen Quant-iT™ RiboGreen, #R11490). The quality of total RNA was checked using an Agilent 2100 Bioanalyzer. Genome-wide RNA expression profiles were obtained with Illumina HumanRef-12 arrays using Illumina’s standard protocol at the UCLA facility. In short, RNA samples were prepared with the Illumina TotalPrep kit amplification and labeling protocol. A total of 750 ng of amplified, biotin-labeled cRNA was then used for array hybridization. BeadChips were scanned using an Illumina BeadArray reader.
BeadStudio© software version 3.2.3 was used to extract raw data and generate background-corrected gene-expression data. Background correction was performed by subtracting the average value of the negative control beads present on the array. Further pre-processing was done using the lumi package for R (Du et al. 2008). A variance-stabilizing transformation was applied to preserve much of the gene-expression variance. Data were normalized using the robust spline normalization method. Genes were then filtered based on detection values generated by BeadStudio©. The detection p-value threshold was set at 0.01, leaving 26,109 probes for analysis. Chip quality assessment and outlier detection were performed by examining quality statistics and plots (hierarchical clustering, box plots, density distribution plots, pair-wise correlations) before and after transformation and normalization. Differential expression was tested using a linear model (with age and gender as covariates) in the limma R package (Smyth 2004).
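The background-correction and detection-filtering steps can be illustrated with a small sketch. The analysis itself was run in BeadStudio and R; the Python below only mirrors the arithmetic. The 0.01 threshold comes from the text, whereas the function names, the toy values and the rule that a probe is kept when it is detected in at least `min_samples` samples are assumptions, since the text does not state how detection was aggregated across samples.

```python
from statistics import mean

DETECTION_P_THRESHOLD = 0.01  # threshold stated in the text


def background_correct(intensities, negative_controls):
    """Subtract the mean negative-control bead signal from each probe intensity."""
    background = mean(negative_controls)
    return {probe: value - background for probe, value in intensities.items()}


def filter_detected(detection_pvalues, threshold=DETECTION_P_THRESHOLD, min_samples=1):
    """Keep probes whose detection p-value is below threshold in >= min_samples samples.

    The min_samples rule is an assumption made for this sketch.
    """
    kept = []
    for probe, pvalues in detection_pvalues.items():
        n_detected = sum(1 for p in pvalues if p < threshold)
        if n_detected >= min_samples:
            kept.append(probe)
    return kept


# Toy data (made up): two probes on one array, detection p-values in two samples.
corrected = background_correct(
    {"probe_a": 250.0, "probe_b": 95.0},
    negative_controls=[90.0, 100.0, 110.0],
)
detected = filter_detected({"probe_a": [0.001, 0.2], "probe_b": [0.3, 0.5]})
print(corrected)  # {'probe_a': 150.0, 'probe_b': -5.0}
print(detected)   # ['probe_a']
```

Background subtraction can yield negative values, as for `probe_b` here, so any later transformation has to be able to handle them.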
Diagnoses were determined by trained clinicians using standardized psychiatric interviews, either the Comprehensive Assessment of Symptoms and History (CASH) or the Composite International Diagnostic Interview (CIDI). Schizophrenia was defined by a DSM-IV-TR diagnosis of 295.0-295.89 or 298.9. All participants gave written informed consent. This study was approved by the Medical Research Ethics Committee (METC) of the University Medical Center Utrecht, The Netherlands (accredited on November 1st, 1999 pursuant to section 16 of the WMO) (collection in both Utrecht and The Hague) and the Committees on Biomedical Research Ethics for the Capital Region of Denmark. Antipsychotic-free patients had not taken antipsychotics during the six-month period prior to blood sampling.
Controls were excluded based on the presence of psychiatric traits and/or a family history of psychiatric disease.
Inclusion criteria for both cases and controls were Caucasian descent and no relatedness. Questionnaire data on ethnicity and relatedness were available for all subjects.
Controls: 42 males, 54 females. Cases: 76 males, 30 females. The mean age across both groups was 39.5 years.

Analysis of SNPs in the TCF4 gene

We used Galaxy (Blankenberg et al. 2010) to extract all SNPs annotated in dbSNP (build 130) within the TCF4 locus (chr18:49019029-52414148, assembly hg18) and applied a bioinformatics protocol to evaluate whether they lie within possible functional sites. SNPs were also extracted from HapMap (Release #28) and the 1000 Genomes pilot 1 data (Durbin et al. 2010). Functional regions were defined with several UCSC tables (Supplementary Table 8). We used the software SNAP (Johnson et al. 2008) to evaluate the linkage disequilibrium (LD) between SNPs.
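The two operations in this paragraph, locating SNPs relative to annotated regions and measuring LD, can be sketched as follows. This is a simplified illustration, not the Galaxy or SNAP workflow. The locus coordinates are those given in the text; the SNP identifiers, region names, positions and haplotypes are made up, and positions are treated as 1-based inclusive (UCSC tables use 0-based starts, so real data would need converting). The r² formula is the standard one for two biallelic sites with phased haplotypes.

```python
# Locus coordinates quoted in the text (hg18); treated here as 1-based inclusive.
TCF4_LOCUS = ("chr18", 49019029, 52414148)


def in_interval(chrom, pos, interval):
    """True if a position falls inside an inclusive (chrom, start, end) interval."""
    i_chrom, start, end = interval
    return chrom == i_chrom and start <= pos <= end


def annotate_snps(snps, locus, regions):
    """Map each SNP inside the locus to the names of the regions it overlaps."""
    hits = {}
    for snp_id, (chrom, pos) in snps.items():
        if not in_interval(chrom, pos, locus):
            continue  # SNP lies outside the locus of interest
        hits[snp_id] = [
            name for name, iv in regions.items() if in_interval(chrom, pos, iv)
        ]
    return hits


def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes coded as (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical SNPs and functional regions, for illustration only.
toy_snps = {
    "snp_1": ("chr18", 51000000),
    "snp_2": ("chr18", 48000000),  # outside the locus
    "snp_3": ("chr18", 52000500),
}
toy_regions = {
    "toy_enhancer": ("chr18", 50999000, 51001000),
    "toy_utr": ("chr18", 52000000, 52001000),
}
hits = annotate_snps(toy_snps, TCF4_LOCUS, toy_regions)
print(hits)  # {'snp_1': ['toy_enhancer'], 'snp_3': ['toy_utr']}
print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # 1.0
```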

Identification of possible regulatory elements

Multispecies Conserved Sequences (MCS) within the TCF4 locus were identified with the VISTA genome browser (Frazer et al. 2004), and UCSC tracks were examined to investigate the possible function of these regions, based on their overlap with the results of functional genomic experiments deposited in the browser. We searched for conserved predicted miRNA binding sites in the TCF4 3’ UTR using TargetScan DB (Lewis et al. 2005). We assessed whether these miRNAs are associated with human diseases using the miRNA and Disease database (Lu et al. 2008).
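The idea behind seed-based target prediction can be shown with a minimal sketch that scans a UTR for matches to miRNA nucleotides 2-8 (a 7mer-m8 site). It is not TargetScan itself, which also scores other site types and cross-species conservation; the function names and both sequences below are toy inputs chosen for illustration.

```python
# RNA complement table; sequences are written 5'->3'.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA motif complementary to miRNA nucleotides 2-8 (7mer-m8 site)."""
    seed = mirna[1:8]  # nucleotides 2-8 of the miRNA
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement, 5'->3' on the mRNA


def find_seed_sites(mirna, utr):
    """Return 0-based start positions of every 7mer-m8 seed match in the UTR."""
    motif = seed_site(mirna)
    positions = []
    start = utr.find(motif)
    while start != -1:
        positions.append(start)
        start = utr.find(motif, start + 1)  # allow overlapping matches
    return positions


# Toy inputs (illustrative only, not taken from the TCF4 3' UTR).
toy_mirna = "UGGAAUGUAAAGAAGUAUGUAU"
toy_utr = "AAGGCACAUUCCUUGA"
print(seed_site(toy_mirna))                 # ACAUUCC
print(find_seed_sites(toy_mirna, toy_utr))  # [5]
```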

Co-expression and protein-protein interaction networks of TCF4

We used gene co-expression data published for human brain (Oldham et al. 2008) to identify TCF4 co-expression partners. These were analysed using gene-set enrichment analysis with MetaCore (GeneGo, Inc.) and Cytoscape (Cline et al. 2007).


Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3):403-10.

Biegert A, Mayer C, Remmert M, Soding J, Lupas AN. 2006. The MPI Bioinformatics Toolkit for protein sequence analysis. Nucleic Acids Res 34(Web Server issue):W335-9.

Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. 2010. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19:Unit 19 10 1-21.

Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B and others. 2007. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2(10):2366-82.

Du P, Kibbe WA, Lin SM. 2008. lumi: a pipeline for processing Illumina microarray. Bioinformatics 24(13):1547-8.

Durbin RM, Abecasis GR, Altshuler DL, Auton A, Brooks LD, Gibbs RA, Hurles ME, McVean GA. 2010. A map of human genome variation from population-scale sequencing. Nature 467(7319):1061-73.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-7.

Emanuelsson O, Brunak S, von Heijne G, Nielsen H. 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2(4):953-71.

Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. 2004. VISTA: computational tools for comparative genomics. Nucleic Acids Res 32(Web Server issue):W273-9.

Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P. 2008. Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics 2008:420747.

Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O'Donnell CJ, de Bakker PI. 2008. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24(24):2938-9.

Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R and others. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23(21):2947-8.

Lewis BP, Burge CB, Bartel DP. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1):15-20.

Lin SM, Du P, Huber W, Kibbe WA. 2008. Model-based variance-stabilizing transformation for Illumina microarray data. Nucleic Acids Res 36(2):e11.

Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, Cui Q. 2008. An analysis of human microRNA and disease associations. PLoS ONE 3(10):e3420.

Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH. 2008. Functional organization of the transcriptome in human brain. Nat Neurosci 11(11):1271-82.

Smyth GK. 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3:Article3.

Thierry-Mieg D, Thierry-Mieg J. 2006. AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol 7 Suppl 1:S12 1-14.

Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, 3rd and others. 2009. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 10(11):R130.
