http://www.gene-seq.com/biodownl … ologicalCancer_pdf/
Keywords: Oryza sativa; Genetic resources; Genome diversity; Sequence variants; Next generation sequencing
Data description
Purpose of data acquisition
Selection of germplasm
Additional file 1: Table S1A. Information for the 2,466 rice accessions from the International Rice Genebank Collection at the International Rice Research Institute. Table S1B. Information for the 534 rice accessions from the China National Crop Genebank and the CAAS working collections.
Format: XLSX; Size: 389 KB
Figure 1. Geographical distribution of the 3,000 sampled rice accessions from 89 countries (see Additional file 1: Tables S1A and S1B). The number in parentheses after each region is the number of countries in that region.
Data generation and analyses
Read alignment and variant identification
Table 1. Characteristics of the single nucleotide polymorphisms (SNPs) identified in the 3,000 rice genomes when aligned to the reference japonica Nipponbare genome IRGSP–1.0
Figure 2. Classification of 3,000 rice accessions into five distinct varietal groups based on five random sets of 200,000 SNPs each, drawn from the 18.9 million discovered SNP variants.
Availability and requirements
Data availability
Availability of supporting data
The 3,000 rice genomes project: participants and affiliations
Participants by institute
CAAS1
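Researchers at Finding of Rare Disease Genes (FORGE) performed whole-exome sequencing on 246 patients with rare diseases and identified 146 disease-causing mutation sites and 67 abnormal genes. They published the findings in The American Journal of Human Genetics.
Four research institutions (The Centre for Applied Genomics in Toronto, a genomics research and development centre in Vancouver, McGill University, and a genome discovery centre in Quebec) jointly carried out the whole-exome sequencing for the project. The researchers first captured the exons with Agilent's SureSelect target enrichment system and then sequenced them on Illumina's HiSeq 2000 to obtain the exome sequences.
“Most of the disease-causing mutations are linked to the clinically diagnosed disorders, and they can explain some of the clinical presentations in depth,” said Professor Kym Boycott, lead scientist at the Children’s Hospital of Eastern Ontario and FORGE, who was greatly surprised by the finding.
FORGE has now begun collaborative research with CARE for RARE, another international rare-disease research centre. That project will continue to use whole-exome sequencing to find the causative genes of rare diseases and to develop corresponding treatments.
Professor Kym Boycott said in a statement: “When we launched the project, we predicted that on completion it would explain or solve 50 rare disorders, but we can now explain 150.”
Chandree Beaulieu, the FORGE manager and first author of the paper, said in a statement: “For the people taking part in this project, the rewards are many. We return all of the sequencing results to the families participating in the project, and we never lock this information away in laboratories and databases. This practice has greatly encouraged the whole research team.”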
Kym Boycott, Jacques Michaud & Jan Friedman
Children’s Hospital of Eastern Ontario Research Institute
April 1, 2011
September 30, 2012
Genetic diseases in children, while often rare, have, in aggregate, an enormous impact on the well-being of Canadian families. Surprisingly, the majority of genes causing these conditions are still unknown. FORGE Canada (Finding of Rare Disease Genes) is a national consortium of clinicians and scientists using next-generation sequencing technology to identify genes responsible for a wide spectrum of rare pediatric-onset disorders present in the Canadian population.
The Consortium brings together clinicians from all 21 Clinical Genetics Centres representing every province and internationally-recognized Canadian scientists with expertise in gene identification, with the infrastructure of the Genome Canada Science and Technology (GC S&T) Innovation Centres. International collaborations have been established with clinicians in 16 countries. Two nation-wide requests for proposals have resulted in 175 disorders that met FORGE criteria; 70 of these rare disorders have been selected for study over the 18 months of this project. These disorders range from those affecting single families, to disorders with 20+ patients from across Canada and internationally recruited through the FORGE network. Twenty of these disorders were prioritized for analysis in the first quarter; 9 genes have been identified and analysis is still underway for the remaining 11 disorders. We are establishing a national data coordination centre to streamline and improve existing large-scale sequence analysis tools and our GE3LS team is working toward national ethical guidelines for analyzing sequence data from entire genomes and for sharing results with families.
Gene discoveries made by the FORGE Canada Consortium will have immediate and long-term benefits for the health of Canadians through translation to diagnostic tests, including the development of new methodologies and algorithms for the use of this technology. Within the first three months, we have identified 9 genes; 6 of these are novel genes that were previously not linked to human disease, thereby providing insight into the molecular pathogenesis of these disorders. Successful completion of the activities of the FORGE Canada project will yield a coordinated and sustainable Consortium focused on the investigation of the genetic basis of human disease.
OGI supports the development and maintenance of high-impact, publicly available resources emerging from genomics research projects in Ontario. These resources include technology platforms, databases, software, reagents and libraries. Our aim is to provide Ontario researchers with access to leading-edge, enabling technologies and to maintain domestic resources that can aid genomics research around the world.
OGI’s Technology Days are an effort to increase the visibility and usage of resources that have been developed by or in partnership with Ontario researchers.
Technology Platforms
The Centre for Applied Genomics (TCAG)
TCAG provides genomics services to researchers in academic, government, and private sectors all over the world.
Databases
Autism Chromosome Rearrangement Database
This resource consists of hand-curated breakpoints and other genomic features relating to autism that derive from publicly available literature, databases, and unpublished data. It undergoes continuous updating with data from in-house experiments and published research. It welcomes data and feedback from the research community.
Barcode of Life Data Systems (BOLD)
BOLD is an accessible database that aids in collection, management, analysis, dissemination, and searching of DNA barcodes. It is the definitive global DNA barcode database – created and maintained in Ontario, with researchers from over 25 countries contributing DNA samples. It already contains barcode sequences for over 50,000 species. Approximately three quarters of those have been added by Ontario researchers.
BOLD consists of three components: BOLD-MAS (a repository for DNA barcode records and analytical tools), BOLD-IDS (a species-identification tool that determines taxonomic assignment when possible based on submitted DNA sequences), and BOLD-ECS (for web developers and bioinformaticians to build tools and workflows that can become part of the BOLD framework).
Chromosome 7 Annotation Project
This resource comprises a collection of sequence, gene, and other annotations from all databases (e.g., Celera published, Ensembl, NCBI, RIKEN, and UCSC) as well as unpublished data.
Cystic Fibrosis Mutation Database
This database is a collection of mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene that acts as a resource for CF research everywhere. It currently contains more than 1,500 mutations and provides information about individual CFTR mutations and their related phenotypes. This database is augmented and maintained by a research team funded by Genome Canada through OGI.
Database of Genomic Variants (DGV)
DGV provides a comprehensive summary of structural variation resulting from alterations in the human genome. These changes involve segments of DNA larger than 1kb and insertions and deletions in the range 100bp-1kb. The DGV welcomes data on structural variation in the genome from scientific manuscripts.
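The size ranges quoted above can be expressed as a small lookup. The following is only a sketch: the function name `dgv_size_class` and the category labels are hypothetical, and treating a length of exactly 1 kb as part of the 100 bp to 1 kb indel range is an assumption, since the text does not say which side of the boundary it falls on.

```python
def dgv_size_class(length_bp):
    """Classify a variant by length using the size ranges quoted in the text.

    Assumption: exactly 1,000 bp is counted in the indel range, because the
    text only says "larger than 1kb" for the other category.
    """
    if length_bp > 1000:
        return "structural variant (>1 kb)"
    if length_bp >= 100:
        return "indel (100 bp to 1 kb)"
    return "below the quoted size ranges"


# Hypothetical variant lengths in base pairs.
example_lengths = [5000, 1000, 100, 99]
example_classes = [dgv_size_class(n) for n in example_lengths]
```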
Human Genome Segmental Duplication Database
This website contains information about segmental duplications in the human genome. The data come from analysis of the May 2004 Assembly of the Human Genome (also known as NCBI Build 35, or UCSC hg17).
Interologous Interaction Database (I2D)
I2D (formerly OPHID) is an on-line resource for exploring known and predicted mammalian and eukaryotic protein–protein interactions. It contains data for more than 430,000 protein interactions in humans and model organisms (fly, mouse, rat, worm, and yeast).
The Dynactome project has contributed more than 4,100 protein interactions to I2D, which has led to 8,880 more interactions through mapping to other organisms in the database. Further, by exploring texts, reviewing literature, and incorporating other high-throughput data sets, the project has given I2D a further 26,210 interactions and 56,810 interologs.
Non-Human Segmental Duplication Database
This site contains information about segmental duplications in the genomes of chimpanzee, mouse, and rat.
This is a publicly available database of Affymetrix DNA microarray and serial analysis of gene expression (SAGE) expression data from samples of human and mouse stem cells and their derivatives.
Structural Genomics Consortium (SGC) Materials and Methods
SGC is a not-for-profit organization that analyses the three-dimensional structure of proteins. It deposits structures (on average, 200 per year) in the Protein Data Bank (PDB), which releases them into the public domain and makes them freely accessible.
To access the PDB from the SGC homepage click on the “Structures” tab. From the “Structure Gallery” either search or scroll down for your protein of interest. The SGC entry will indicate the “PDB Code,” which will transfer you to the PDB entry, and at the bottom of the page there will be a link to the SGC structure file with the same PDB code. Following this link provides basic background information on each target and the analysis of its structure. There is also a link to reagents for the structure, as well as one to a detailed description of the experimental materials and methods that generated the structure. For some protein structures, an associated iSee data pack provides an animated interpretation of the structure and tabs that include protocols. Alternatively, access the PDB and select the tabs for “Materials & Methods” and “Biology & Chemistry.” There you will find purification and crystallization protocols, diffraction data, and other details about your selected structure.
Toronto Yeast Interaction Database and Toronto Yeast Pathway Database
These resources consolidate publicly available data and feature web-services interfaces that the Yeast Integrative Biology project maintains.
Software
GeneMANIA
GeneMANIA is a comprehensive web-based genomic and data analysis tool intended for simple gene function prediction. The GeneMANIA data warehouse includes over 160 million interactions, from more than 130,000 genes, from six different organisms. GeneMANIA is freely available as open-source software. Additionally, a free online tutorial can be found on the OpenHelix website (http://www.openhelix.com/genemania). GeneMANIA was developed and is being maintained by a research team funded by Genome Canada through OGI.
Cytoscape Web is an online interface that can be used with GeneMANIA to visualize the composite networks that are associated with a set of input genes. This tool is freely available as open-source software and was developed by a research team funded by Genome Canada through OGI. Cytoscape Web is now actively developed as part of the open source Cytoscape project.
Automated Splice Site Analyses
A web-based software tool for the prediction of the effects of sequence changes that alter mRNA splicing in human disease. This tool is used by researchers across Canada and worldwide, resulting in more than 130 citations to date.
eFISH (electronic fluorescence in situ hybridization)
eFISH is a BLAST-based program that facilitates the choice of appropriate clones for FISH and CGH experiments, as well as interpretation of results in which genomic DNA probes are used in hybridization-based experiments.
Network Analysis, Visualization & Graphing TORonto (NAViGaTOR)
NAViGaTOR is a software package for visualization and analysis of protein-protein interaction networks in two or three dimensions (2D or 3D, respectively). It is downloadable free of charge for academic and not-for-profit institutions.
This on-line global community connects businesses, organizations, scientists, water activists, and young people. It informs and engages youths and the broader public on global water issues and their effects on health. It addresses many issues, including ways in which emerging nanotechnology and biotechnology applications can address waterborne and water-related diseases.
Reagents and Libraries
North American Conditional Mouse Mutagenesis (NorCOMM)
NorCOMM develops and distributes a library of lines of mouse embryonic stem (ES) cells that carry single conditional-knockout mutations across the mouse genome. ES cells that it develops become publicly available on a cost-recovery basis. NorCOMM also provides services across Canada in archiving, derivation, genotyping, and phenotyping of mouse ES cells.
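This “copying key” is specific to each protein. “It seems that every mRNA molecule knows how many units of its protein should be made: 10, 100, or 1,000 copies,” explained Professor Bernhard Küster, head of the Chair of Proteomics and Bioanalytics at the Technical University of Munich.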
The availability of human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here we present a draft map of the human proteome using high-resolution Fourier-transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which includes translated pseudogenes, non-coding RNAs and upstream open reading frames. This large human proteome catalogue (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.
Mass-spectrometry-based draft of the human proteome
Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell lines and body fluids enabled estimation of the size of the protein-coding genome, and identified organ-specific proteins and a large number of translated lincRNAs (long intergenic non-coding RNAs). Analysis of messenger RNA and protein-expression profiles of human tissues revealed conserved control of protein abundance, and integration of drug-sensitivity data enabled the identification of proteins predicting resistance or sensitivity. The proteome profiles also hold considerable promise for analysing the composition and stoichiometry of protein complexes. ProteomicsDB thus enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.
NA12878 (child), NA12891 (father), and NA12892 (mother)
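A common sanity check on trio data like this is Mendelian consistency: at each site, the child's genotype should be obtainable by taking one allele from each parent. The following is a minimal sketch under stated assumptions: the genotypes are invented for illustration (0 = reference allele, 1 = alternate allele), they are not read from any real VCF file, and the function name `is_mendelian_consistent` is hypothetical. It ignores de novo mutations, missing calls, and sex chromosomes.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's unphased diploid genotype can be formed
    from one paternal allele and one maternal allele."""
    child_alleles = sorted(child)
    return any(
        sorted((paternal, maternal)) == child_alleles
        for paternal, maternal in product(father, mother)
    )


# Hypothetical genotypes at three sites, keyed by sample ID.
sites = [
    {"NA12878": (0, 1), "NA12891": (0, 0), "NA12892": (1, 1)},
    {"NA12878": (1, 1), "NA12891": (0, 1), "NA12892": (0, 1)},
    {"NA12878": (1, 1), "NA12891": (0, 0), "NA12892": (0, 1)},
]

# One boolean per site: child NA12878 checked against father NA12891
# and mother NA12892.
results = [
    is_mendelian_consistent(s["NA12878"], s["NA12891"], s["NA12892"])
    for s in sites
]
```

In this made-up data the third site is inconsistent, because the father carries no alternate allele while the child is homozygous for it. In practice such sites are usually flagged for review rather than discarded outright.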
NA12891 and NA12878 VCF database
Built on 2013-07-29: download of the YanHuang-1 SNP and indel files in GFF format (hg19 version) and the SNP file in text format (hg19 version):
MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of miRNAs and lncRNAs from ChIP-Seq data.
The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, we constructed TF->miRNA->mRNAs regulatory networks by integrating CLIP-Seq data and ChIP-Seq data. In addition, we constructed expression profiles of human lncRNAs and mRNAs from RNA-Seq data from 22 normal tissues.
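The TF->miRNA->mRNA networks described above amount to joining two edge lists on the shared miRNA. The sketch below shows that join and is not ChIPBase's actual pipeline. It assumes the TF-to-miRNA edges come from ChIP-Seq binding sites and the miRNA-to-mRNA edges from CLIP-Seq target sites. All identifiers (`TF_A`, `miR-1`, `GENE_X`) and the function name `build_regulatory_chains` are hypothetical.

```python
from collections import defaultdict


def build_regulatory_chains(tf_to_mirna, mirna_to_mrna):
    """Join TF->miRNA edges with miRNA->mRNA edges on the shared miRNA,
    returning (tf, mirna, mrna) triples."""
    targets = defaultdict(set)
    for mirna, mrna in mirna_to_mrna:
        targets[mirna].add(mrna)

    chains = []
    for tf, mirna in tf_to_mirna:
        # miRNAs with no known target simply contribute no chain.
        for mrna in sorted(targets.get(mirna, ())):
            chains.append((tf, mirna, mrna))
    return chains


# Hypothetical edges; real ones would be derived from peak and target calls.
tf_mirna_edges = [("TF_A", "miR-1"), ("TF_B", "miR-2"), ("TF_B", "miR-3")]
mirna_mrna_edges = [("miR-1", "GENE_X"), ("miR-1", "GENE_Y"), ("miR-2", "GENE_X")]

chains = build_regulatory_chains(tf_mirna_edges, mirna_mrna_edges)
```

Indexing the miRNA targets in a dictionary first keeps the join linear in the number of edges instead of comparing every pair of edges.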
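Selleckchem: kinase inhibitors, tyrosine kinase inhibitors, enzyme inhibitors, protein inhibitors, protein kinase inhibitors, small molecules, phosphatase inhibitors. Metabolic pathway and inhibitor database download: http://www.gene-seq.com/biodownload/inhibitor_pathway/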
- Target Selective Inhibitor Library
- Anti-cancer Compound Library
- Autophagy Signaling Compound Library
- Ion Channel Ligand Library
- PI3K Signaling Inhibitor Library
- Apoptosis Compound Library
- MAPK Signaling Inhibitor Library
- Protease Inhibitor Library
- Anti-infection Compound Library
- Anti-diabetic Compound Library
- Protein Tyrosine Kinase
- Apoptosis
- Cytoskeletal Signaling
- Cell Cycle
- DNA Damage
- Epigenetics
- Stem Cells & Wnt
- Neuronal Signaling
- GPCR & G Protein
- Endocrinology & Hormones
- Transmembrane Transporters