This paper aims to improve the speed and complexity of Smith-Waterman (SW) algorithm. For this purpose, the SW algorithm was improved by reducing the complexity and task load of the computation of the scoring matrix without sacrificing the alignment accuracy. Then, the optimized algorithm, denoted as the Opti-SW, was verified through experiment. The results show that the Opti-SW boasts low time complexity, fast computing speed and light computing load. The research findings shed new light on the database search for gene sequences.
gene sequence alignment, smith-waterman (SW) algorithm, optimization, opti-SW
The work is funded in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61462070, the Doctoral research fund project of Inner Mongolia Agricultural University under Grant No. BJ09-44 and the Inner Mongolia Autonomous Region Key Laboratory of big data research and application for agriculture and animal husbandry.
Balaji P., Feng W., Archuleta J., Lin H. (2008). Semantics-based distributed I/O for mpiBLAST. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 293-294. https://doi.org/10.1145/1345206.1345262
Benkrid K., Liu Y., Benkrid A. S. (2009). A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment. IEEE Transactions on Very Large Scale Integration Systems, Vol. 17, No. 4, pp. 561-570. http://dx.doi.org/10.1109/TVLSI.2008.2005314
Cao L., Xu Y., Deng C. (2015). Algorithm for DNA double sequence alignment problem. Application of Computer System, Vol. 24, No. 9, pp. 112-117.
Chen M. (2016). Bioinformatics. Beijing: The Science Publishing Company, pp. 13-18.
Daily J. (2016). Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinformatics, Vol. 17, pp. 124-129. https://doi.org/10.1186/s12859-016-0930-z
Farrar M. (2007). Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics, Vol. 23, pp. 156-161. https://doi.org/10.1093/bioinformatics/btl582
Feng B. (2015). Distributed parallel optimization of needleman-wunsch algorithm for double sequence alignment. Hohhot: Inner Mongolia Agricultural University, pp. 38-41.
Feng B., Gao J. (2016). Distributed parallel Needleman-Wunsch algorithm on heterogeneous cluster system. International Conference on Network and Information Systems for Computers.IEEE, pp. 358-361. https://doi.org/10.1109/ICNISC.2015.145
Gao J., Jiao Y., Zhang W. G. (2014). Review of high throughput sequencing sequence alignment. Life Science Research, Vol. 18, No. 5, pp. 458-464.
Jain C., Kumar S. (2014). Fine-grained GPU parallelization of pairwise local sequence alignment. International Conference on High Performance Computing. IEEE Computer Society, pp. 1-10. http://dx.doi.org/10.1109/HiPC.2014.7116912
Li D. W. (2011). Research of Parallel algorithm for sequence alignment based on dynamic programming. Journal of Jinggangshan University (Natural Science Edition), Vol. 32, No. 3, pp. 80-84.
Liu Y., Wang X., Li J., Mao Y., Zhao D. (2012). Research progress of local sequence alignment algorithm and parallel acceleration. Military Medicine, Vol. 36, No. 7, pp. 556-560. Http://dx.chinadoi.cn/10.3969/j.issn.1674-9960.2012.07.018
Liu Y., Wirawan A., Schmidt B. (2013). CUDASW++3.0: Accelerating Smith–Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics, Vol. 14, pp. 117. https://doi.org/10.1186/1471-2105-14-117
Wang C., Li X., Chen P., Wang A., Zhou X., Yu H. (2015). Heterogeneouscloud framework for big data genome sequencing. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 12, pp. 166-178. https://doi.org/10.1109/TCBB.2014.2351800
Wei G., Ma C., Pei S., Wu B. (2009). The accelerating implementation of BLAST with stream processor. IEEE, International Conference on Computer-Aided Industrial Design & Conceptual Design, Caid & Cd. IEEE, pp. 2245-2250. http://dx.doi.org/10.1109/CAIDCD.2009.5375228
Xu B., Li C., Zhuang H., Wang J., Wang Q., Zhou X. (2017). Efficient distributed Smith-Waterman algorithm based on apache spark. IEEE International Conference on Cloud Computing, pp. 608-615. http://dx.doi.org/10.1109/CLOUD.2017.83
Xue Q. F. (2015). Research and application of parallel algorithm for DNA sequence alignment. Shanghai: Shanghai University, pp. 7-11.
Zhang D. Y. (2015). Bioinformatics. Beijing: The Science Publishing Company, pp. 28-43.
Zhao M., Lee W. P., Garrison E. P., Marth G. T. (2013). SSW library: An SIMD Smith-Waterman C/C++ library for use in genomic applications. Plo S one, Vol. 8, pp. 2138-2142. https://doi.org/10.1371/journal.pone.0082138