1. contig N50 的定义
2. 脚本实现
1. N50 计算:
# 以sequence.fasta 为列,来计算N50的值 # 基因组拼接时,contig是按照长度从大到小排列的 f = open('sequence.fasta', 'r').readlines() total_length = 0 for i in f: if not i.startswith('>'): seq_length = len(i.strip()) total_length += seq_length variable_length = 0 for i in f: if not i.startswith('>'): seq_length = len(i.strip()) variable_length += seq_length if variable_length >= total_length / 2: print('N50 : %s'%seq_length) #输出N50的值 break
2. GC 含量计算
# 以sequence.fasta 为列,来计算N50的值 f = open('sequence.fasta', 'r').readlines() total_length = 0 total_seq = '' for i in f: if not i.startswith('>'): seq_length = len(i.strip()) total_length += seq_length total_seq += i.strip() GC_number = 0 for j in total_seq: if j == 'C' or j == 'G': GC_number += 1 GC_percent = GC_number / total_length # 输出为GC含量 print(f'GC percent : {
GC_percent}')
发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/219816.html原文链接:https://javaforall.net
