本文共 1628 字,大约阅读时间需要 5 分钟。
Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].Java代码:
package com.leetcode;import java.util.ArrayList;import java.util.HashMap;import java.util.HashSet;import java.util.List;import java.util.Map;import java.util.Set;public class Test_73 { private static final MapA = new HashMap<>(); static { A.put('A',0); A.put('C',1); A.put('G',2); A.put('T',3); } private final static int A_SIZE_POW_9 = (int) Math.pow(A.size(), 9); //注意到DNA中只有四种符号,可做编码操作 /* * 例如:AAACCC 编码之后为000111 将四进制数转化为十进制数:4^0*1 + 4^1*1 +4^2*1=21 * */ public static List findRepeatedDnaSequences(String s) { Set res = new HashSet<>(); Set hashes = new HashSet<>(); for (int i = 0, rhash = 0; i < s.length(); i++) { if (i > 9) rhash -= A_SIZE_POW_9 * A.get(s.charAt(i-10)); rhash = A.size() * rhash + A.get(s.charAt(i)); if (i > 8 && !hashes.add(rhash)) res.add(s.substring(i-9,i+1)); } return new ArrayList<>(res); } public static void main(String[] arqs) { String test = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"; List result = findRepeatedDnaSequences(test); for (String s : result) { System.out.println(s); } }}
转载地址:http://vpuni.baihongyu.com/