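Publication frequency: bimonthly
Supervising body: Sichuan Association for Science and Technology
Sponsors: Sichuan Society of Zoology / Chengdu Giant Panda Breeding Research Foundation / Sichuan Wildlife Conservation Association / Sichuan University
Address: College of Life Sciences, Sichuan University, 29 Wangjiang Road, Wuhou District, Chengdu, Sichuan Province
Postal code: 610064
Tel: 028-85410485; 15881112385
Fax: 028-85410485
E-mail: scdwzz@vip.163.com; scdwzz001@163.com
Serial numbers: ISSN 1000-7083; CN 51-1193/Q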
Distribution: publicly distributed in China and abroad
Price: 50 yuan per issue; 300 yuan per year


Clustering Mitochondrial DNA Sequences Experienced Tandem Duplication Based on Alignment-free Comparison in Quasipaa boulengeri
Original Chinese title: 棘腹蛙线粒体局部重复序列非排序聚类
Cao Yue1,2, Xia Yun1, Zheng Yuchi1*
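Author affiliations: 1. Department of Herpetology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041;
2. University of Chinese Academy of Sciences, Beijing 100049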
Keywords: Quasipaa boulengeri; mitochondrial DNA; alignment-free comparison; clustering; duplication region; Robinson-Foulds distance; protein-coding gene; Maximum Likelihood tree
Abstract: After a local tandem duplication, the affected region of an animal mitochondrial genome typically carries multiple gene copies, pseudogenes, and numerous insertions and deletions. Such regions, shaped by tandem duplication and subsequent random loss, are often hypervariable, which makes them difficult to align and to use for gene-tree reconstruction. In theory, alignment-free comparison methods (AFM) can summarize and visually present the relationships and similarities among such sequences, but to our knowledge relevant evaluations and applications are lacking. We evaluated 3 types of commonly used k-mer-based AFM on a system of intraspecific sequence variation in one such region near the origin of light-strand replication. From 19 individuals of the frog Quasipaa boulengeri, sequences of 583 bp to 695 bp were clustered using 28 AFM, with the substring length k set in turn to 4, 6, 8, 10, 12, 14, 16, 18, and 20 bp. A Maximum Likelihood tree built from 1 518 bp of mitochondrial protein-coding sequence from the same individuals served as the reference topology. For each AFM tree, the Robinson-Foulds distance to the reference was calculated and the major topological differences were recorded. Half of the methods, at a specific k value that was typically 8, produced a tree that differed from the reference by only 2 nodes (11.8%). Some methods, however, performed poorly at every k value tested. A small k value of 4 was suitable for resolving relationships among relatively divergent sequences. These results demonstrate that alignment-free clustering of animal mitochondrial duplicated sequences is feasible, and the well-performing combinations of method and k value identified here may suit similar systems. Similar evaluations are recommended for other types of duplication and rearrangement systems.
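The two ingredients named in the abstract, a k-mer-based alignment-free distance and a Robinson-Foulds comparison of topologies, can be sketched in a few lines of Python. This is an illustration only and not the authors' pipeline: the function names, the toy sequences, the squared Euclidean (d2-type) measure, the nested-tuple tree encoding, and the rooted, clade-based form of the Robinson-Foulds count are all assumptions made for the example. The 28 measures evaluated in the study are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(seq_a, seq_b, k):
    """Squared Euclidean distance between two k-mer count profiles
    (an assumed, simple stand-in for a d2-type measure)."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum((a[w] - b[w]) ** 2 for w in set(a) | set(b))


def distance_matrix(seqs, k):
    """Pairwise distances for a dict of {name: sequence}."""
    return {(x, y): d2_distance(seqs[x], seqs[y], k)
            for x, y in combinations(sorted(seqs), 2)}


def clades(tree):
    """Collect the non-trivial clades of a rooted tree given as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is just its own name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)                    # the root clade is shared by every tree
    return found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds count: clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical toy data, not sequences from the study.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
dists = distance_matrix(toy, k=2)          # dists[("s1", "s2")] is 2

reference = ((("A", "B"), "C"), "D")
candidate = ((("A", "C"), "B"), "D")
mismatch = rf_distance(reference, candidate)   # 2 clades differ
```

In practice the distance matrix would be passed to a clustering method such as neighbor joining or UPGMA to obtain a tree for each method and k value, and each resulting topology would then be compared with the reference tree.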
2018, 37(3): 261-267. Received: 2018-01-18
DOI:10.11984/j.issn.1000-7083.20180022
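Classification number: Q959.5
Funding: National Natural Science Foundation of China (31372181, 31572243)
About the first author: Cao Yue (born 1991), female, master's student; research interest: systematics and evolution of amphibians and reptiles; E-mail: caoyue@cib.ac.cn
*Corresponding author: Zheng Yuchi, E-mail: zhengyc@cib.ac.cn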
Copyright © 2018 Editorial Office of 《四川动物》 (Sichuan Journal of Zoology)