Java高级特性系列(一):对象序列化

【引言】简单描述呢,序列化就是把对象转成字节序列(通常用于对象持久化或克隆),在Java中,只要需要序列化的对象类型实现了Serializbale接口,就算完成了对序列化的支持,但是序列化具体细节是如何实现的呢?


序列化和反序列化

  在Java语言中,提供了序列化和反序列化机制,所谓序列化就是将对象转换为字节序列的过程(可以理解为做了一个对象快照),反序列化很显而易见就是将字节序列再转换为对象的过程。在序列化的过程中形成的字节序列,包括了该对象的数据内容、对象的类型信息和存储在对象中数据的类型信息,而这些都是保证可以正确的反序列化的关键所在。
  序列化和反序列化的过程都是平台无关的,也就是说只要JVM版本兼容,在任何平台都可以支持这种序列化和反序列化的操作。

为什么要使用序列化?

  第一个场景:我们需要将Java程序运行过程中的某些对象持久化保存(做备份或者转储等等),这时候一般都是通过序列化操作将字节序列直接存到磁盘的某个文件中。
  第二个场景:满足Java对象在网络传输的过程中的需要,因为网络中只能以二进制形式传输数据;比如之前做过一个比较low的爬虫,在境外爬取数据生成Java对象后,序列化成文件保存到磁盘,然后建立一个rsync同步到本地再反序列化导入本地系统。

如何完成序列化?

序列化的关键类

  序列化操作,涉及两个关键的类:ObjectOutputStream和ObjectInputStream;通过类名我们就很容易分辨出来,这两个类一个是输出流一个是输入流。以下两段是java的io包中的源码注释。
  An ObjectOutputStream writes primitive data types and graphs of Java objects to an OutputStream. The objects can be read (reconstituted) using an ObjectInputStream. Persistent storage of objects can be accomplished by using a file for the stream. If the stream is a network socket stream, the objects can be reconstituted on another host or in another process.

序列化深克隆实例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/**
* 通过序列化Clone对象(深克隆)
* - 如果不需要深克隆或类内部不包含非基本数据类型属性的情况可以通过实现Cloneable接口即可完成clone
*
* @param <T> 泛型
* @param t 需要被clone的对象
* @return 复制后的新对象
* @throws Exception IO异常
*/
public static <T> T clone(T t) throws Exception {
ObjectOutputStream out = null;
ObjectInputStream in = null;
try {
// 通过输出流将t进行序列化
ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
out = new ObjectOutputStream(byteOut);
out.writeObject(t);

// 再通过输入流,将t序列化后的byteOut流读入然后转化成返回对象
ByteArrayInputStream byteIn = new ByteArrayInputStream(byteOut.toByteArray());
in = new ObjectInputStream(byteIn);
return (T) in.readObject();
} finally {
IOUtils.closeQuietly(out);
IOUtils.closeQuietly(in);
}
}

序列化流程分析

这里贴出的源码仅供观赏性阅读,不做深入分析

ObjectOutputStream

ObjectOutputStream构造方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
public ObjectOutputStream(OutputStream out) throws IOException {
verifySubclass();
bout = new BlockDataOutputStream(out);
handles = new HandleTable(10, (float) 3.00);
subs = new ReplaceTable(10, (float) 3.00);
enableOverride = false;
writeStreamHeader();
bout.setBlockDataMode(true);
if (extendedDebugInfo) {
debugInfoStack = new DebugTraceInfoStack();
} else {
debugInfoStack = null;
}
}

ObjectOutputStream的构造方法源码解读

  构造函数中首先会把bout对象绑定到底层的字节数据容器,接着会调用writeStreamHeader()方法

1
2
3
4
5
6
7
8
9
10
11
// bout = new BlockDataOutputStream(out);
BlockDataOutputStream(OutputStream out) {
this.out = out;
dout = new DataOutputStream(this);
}

// writeStreamHeader();
protected void writeStreamHeader() throws IOException {
bout.writeShort(STREAM_MAGIC);
bout.writeShort(STREAM_VERSION);
}

  在writeStreamHeader()方法中首先会往底层字节容器中写入表示序列化的Magic Number以及版本号,定义如下

1
2
3
4
5
6
7
8
9
/**
* Magic number that is written to the stream header.
*/
final static short STREAM_MAGIC = (short)0xaced;

/**
* Version number that is written to the stream header.
*/
final static short STREAM_VERSION = 5;

  接下来会调用writeObject()方法进行序列化,具体实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public final void writeObject(Object obj) throws IOException {
if (enableOverride) {
writeObjectOverride(obj);
return;
}
try {
// 实际序列化的write方法
writeObject0(obj, false);
} catch (IOException ex) {
if (depth == 0) {
writeFatalException(ex);
}
throw ex;
}
}

  writeObject内部调用了writeObject0,它内部实现了很多不同的write方法的调用,具体的细节这里就不深入了,通过一系列的write,将对象完成序列化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
private void writeObject0(Object obj, boolean unshared)
throws IOException
{
boolean oldMode = bout.setBlockDataMode(false);
depth++;
try {
// handle previously written and non-replaceable objects
int h;
if ((obj = subs.lookup(obj)) == null) {
writeNull();
return;
} else if (!unshared && (h = handles.lookup(obj)) != -1) {
writeHandle(h);
return;
} else if (obj instanceof Class) {
writeClass((Class) obj, unshared);
return;
} else if (obj instanceof ObjectStreamClass) {
writeClassDesc((ObjectStreamClass) obj, unshared);
return;
}

// check for replacement object
Object orig = obj;
Class<?> cl = obj.getClass();
ObjectStreamClass desc;
for (;;) {
// REMIND: skip this check for strings/arrays?
Class<?> repCl;
desc = ObjectStreamClass.lookup(cl, true);
if (!desc.hasWriteReplaceMethod() ||
(obj = desc.invokeWriteReplace(obj)) == null ||
(repCl = obj.getClass()) == cl)
{
break;
}
cl = repCl;
}
if (enableReplace) {
Object rep = replaceObject(obj);
if (rep != obj && rep != null) {
cl = rep.getClass();
desc = ObjectStreamClass.lookup(cl, true);
}
obj = rep;
}

// if object replaced, run through original checks a second time
if (obj != orig) {
subs.assign(orig, obj);
if (obj == null) {
writeNull();
return;
} else if (!unshared && (h = handles.lookup(obj)) != -1) {
writeHandle(h);
return;
} else if (obj instanceof Class) {
writeClass((Class) obj, unshared);
return;
} else if (obj instanceof ObjectStreamClass) {
writeClassDesc((ObjectStreamClass) obj, unshared);
return;
}
}

// remaining cases
if (obj instanceof String) {
writeString((String) obj, unshared);
} else if (cl.isArray()) {
writeArray(obj, desc, unshared);
} else if (obj instanceof Enum) {
writeEnum((Enum<?>) obj, desc, unshared);
} else if (obj instanceof Serializable) {
writeOrdinaryObject(obj, desc, unshared);
} else {
if (extendedDebugInfo) {
throw new NotSerializableException(
cl.getName() + "\n" + debugInfoStack.toString());
} else {
throw new NotSerializableException(cl.getName());
}
}
} finally {
depth--;
bout.setBlockDataMode(oldMode);
}
}

ObjectInputStream

ObjectInputStream的构造方法源码解读

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
/**
* Creates an ObjectInputStream that reads from the specified InputStream.
* A serialization stream header is read from the stream and verified.
* This constructor will block until the corresponding ObjectOutputStream
* has written and flushed the header.
*
* <p>If a security manager is installed, this constructor will check for
* the "enableSubclassImplementation" SerializablePermission when invoked
* directly or indirectly by the constructor of a subclass which overrides
* the ObjectInputStream.readFields or ObjectInputStream.readUnshared
* methods.
*
* @param in input stream to read from
* @throws StreamCorruptedException if the stream header is incorrect
* @throws IOException if an I/O error occurs while reading stream header
* @throws SecurityException if untrusted subclass illegally overrides
* security-sensitive methods
* @throws NullPointerException if <code>in</code> is <code>null</code>
* @see ObjectInputStream#ObjectInputStream()
* @see ObjectInputStream#readFields()
* @see ObjectOutputStream#ObjectOutputStream(OutputStream)
*/
public ObjectInputStream(InputStream in) throws IOException {
verifySubclass();
bin = new BlockDataInputStream(in);
handles = new HandleTable(10);
vlist = new ValidationList();
serialFilter = ObjectInputFilter.Config.getSerialFilter();
enableOverride = false;
readStreamHeader();
bin.setBlockDataMode(true);
}

  首先会调用readObject()方法进行反序列化,实际整个流程的逻辑和ObjectOutputStream是类似的,具体实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
public final Object readObject()
throws IOException, ClassNotFoundException
{
if (enableOverride) {
return readObjectOverride();
}

// if nested read, passHandle contains handle of enclosing object
int outerHandle = passHandle;
try {
Object obj = readObject0(false);
handles.markDependency(outerHandle, passHandle);
ClassNotFoundException ex = handles.lookupException(passHandle);
if (ex != null) {
throw ex;
}
if (depth == 0) {
vlist.doCallbacks();
}
return obj;
} finally {
passHandle = outerHandle;
if (closed && depth == 0) {
clear();
}
}
}

  再接下来就是核心的一个方法readObject0,它内部提供了不同的read方法调用,最终将序列化后的文件给反序列化为一个对象

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
/**
* Underlying readObject implementation.
*/
private Object readObject0(boolean unshared) throws IOException {
boolean oldMode = bin.getBlockDataMode();
if (oldMode) {
int remain = bin.currentBlockRemaining();
if (remain > 0) {
throw new OptionalDataException(remain);
} else if (defaultDataEnd) {
/*
* Fix for 4360508: stream is currently at the end of a field
* value block written via default serialization; since there
* is no terminating TC_ENDBLOCKDATA tag, simulate
* end-of-custom-data behavior explicitly.
*/
throw new OptionalDataException(true);
}
bin.setBlockDataMode(false);
}

byte tc;
while ((tc = bin.peekByte()) == TC_RESET) {
bin.readByte();
handleReset();
}

depth++;
totalObjectRefs++;
try {
switch (tc) {
case TC_NULL:
return readNull();

case TC_REFERENCE:
return readHandle(unshared);

case TC_CLASS:
return readClass(unshared);

case TC_CLASSDESC:
case TC_PROXYCLASSDESC:
return readClassDesc(unshared);

case TC_STRING:
case TC_LONGSTRING:
return checkResolve(readString(unshared));

case TC_ARRAY:
return checkResolve(readArray(unshared));

case TC_ENUM:
return checkResolve(readEnum(unshared));

case TC_OBJECT:
return checkResolve(readOrdinaryObject(unshared));

case TC_EXCEPTION:
IOException ex = readFatalException();
throw new WriteAbortedException("writing aborted", ex);

case TC_BLOCKDATA:
case TC_BLOCKDATALONG:
if (oldMode) {
bin.setBlockDataMode(true);
bin.peek(); // force header read
throw new OptionalDataException(
bin.currentBlockRemaining());
} else {
throw new StreamCorruptedException(
"unexpected block data");
}

case TC_ENDBLOCKDATA:
if (oldMode) {
throw new OptionalDataException(true);
} else {
throw new StreamCorruptedException(
"unexpected end of block data");
}

default:
throw new StreamCorruptedException(
String.format("invalid type code: %02X", tc));
}
} finally {
depth--;
bin.setBlockDataMode(oldMode);
}
}

补充一些概念

关于transient

  在实际开发过程中,我们常常会遇到这样的问题,这个类的有些属性需要序列化,而其他属性不需要被序列化,打个比方,如果一个用户有一些敏感信息(如密码,银行卡号等),为了安全起见,不希望在网络操作(主要涉及到序列化操作,本地序列化缓存也适用)中被传输,这些信息对应的变量就可以加上transient关键字。换句话说,这个字段的生命周期仅存于调用者的内存中而不会写到磁盘里持久化。java 的transient关键字为我们提供了便利,你只需要实现Serilizable接口,将不需要序列化的属性前添加关键字transient,序列化对象的时候,这个属性就不会序列化到指定的目的地中。

1
2
3
4
5
6
7
8
public class UserSerialize implements Serializable{
private static final long serialVersionUID = 1L;
private String userId;
private String userName;

private transient String sex;
......
}

关于SerialVersionUID

  简单来说,Java的序列化机制是通过在运行时判断类的serialVersionUID来验证版本一致性的。在进行反序列化时,JVM会把传来的字节流中的serialVersionUID与本地相应实体(类)的serialVersionUID进行比较,如果相同就认为是一致的,可以进行反序列化,否则就会出现序列化版本不一致的异常。(InvalidCastException)

关于Externalizable

  它是Serializable接口的子类,当我们不希望序列化那么多内容的时候,可以选择使用这个接口,通过这个接口的writeExternal()和readExternal()方法可以指定序列化哪些属性;

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
public interface Externalizable extends java.io.Serializable {
/**
* The object implements the writeExternal method to save its contents
* by calling the methods of DataOutput for its primitive values or
* calling the writeObject method of ObjectOutput for objects, strings,
* and arrays.
*
* @serialData Overriding methods should use this tag to describe
* the data layout of this Externalizable object.
* List the sequence of element types and, if possible,
* relate the element to a public/protected field and/or
* method of this Externalizable class.
*
* @param out the stream to write the object to
* @exception IOException Includes any I/O exceptions that may occur
*/
void writeExternal(ObjectOutput out) throws IOException;

/**
* The object implements the readExternal method to restore its
* contents by calling the methods of DataInput for primitive
* types and readObject for objects, strings and arrays. The
* readExternal method must read the values in the same sequence
* and with the same types as were written by writeExternal.
*
* @param in the stream to read data from in order to restore the object
* @exception IOException if I/O errors occur
* @exception ClassNotFoundException If the class for an object being
* restored cannot be found.
*/
void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;
}

  Externalizable应用示例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
public class UserSerialize implements Externalizable{
/**
*
*/
private static final long serialVersionUID = 1L;
private String userId;
private String userName;

public String getUserId() {
return userId;
}

public void setUserId(String userId) {
this.userId = userId;
}

public String getUserName() {
return userName;
}

public void setUserName(String userName) {
this.userName = userName;
}

@Override
public void writeExternal(ObjectOutput objectoutput) throws IOException {
objectoutput.writeObject(userId);

}

@Override
public void readExternal(ObjectInput objectinput) throws IOException,
ClassNotFoundException {
userId = (String) objectinput.readObject();

}

}

简单的总结

  • 重点:待序列化的类必须实现 java.io.Serializable对象
  • 重点:待序列化的类的所有属性必须是可序列化的。如果有属性不是需要序列化的,则该属性必须注明是临时的(transient)
  • 重点:序列化时,只对对象的状态进行保存,实际是不管对象的方法的
  • 重点:当一个父类实现序列化,子类自动实现序列化,不需要显式实现Serializable接口
  • 重点:强烈建议在一个可序列化类中显示的定义serialVersionUID,为它赋予明确的值
  • 重点:静态变量不管是否被transient修饰,均不能被序列化
------2019 Lin.C ------